Uncertainty quantification (UQ) is a crucial but challenging task in many high-dimensional regression or learning problems, needed to increase confidence in a given predictor. In this talk we discuss a new data-driven approach for UQ in regression that applies both to classical regression approaches such as the LASSO and to neural networks. One of the most notable UQ techniques is the debiased LASSO, which modifies the LASSO to allow for the construction of asymptotic confidence intervals by decomposing the estimation error into a Gaussian component and an asymptotically vanishing bias component. However, in real-world problems with finite-dimensional data, the bias term is often too significant to be neglected, resulting in overly narrow confidence intervals. We address this issue and derive a data-driven adjustment that corrects the confidence intervals for a large class of predictors by estimating the means and variances of the bias terms from training data, exploiting high-dimensional concentration phenomena. This gives rise to non-asymptotic confidence intervals, which can help avoid underestimating uncertainty in critical applications such as MRI diagnosis. Importantly, this analysis extends beyond sparse regression to data-driven predictors like neural networks, enhancing the reliability of model-based deep learning. Our findings bridge the gap between established theory and the practical applicability of such debiased methods. This talk is based on joint work with Frederik Hoppe, Claudio Mayrink Verdun, Felix Krahmer and Holger Rauhut.
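For orientation, the error decomposition behind the debiased LASSO is commonly written as below. This is a sketch in the standard notation of the debiased LASSO literature, not notation taken from the talk: the linear model y = Xβ + ε with n samples, the LASSO estimate β̂, the sample covariance Σ̂ and an approximate inverse M of Σ̂ are all assumptions made for illustration.

```latex
% Assumed setting: y = X\beta + \varepsilon, \hat\Sigma = X^\top X / n,
% \hat\beta the LASSO estimate, M an approximate inverse of \hat\Sigma.
\hat\beta^{u} = \hat\beta + \frac{1}{n} M X^\top \bigl(y - X\hat\beta\bigr),
\qquad
\sqrt{n}\,\bigl(\hat\beta^{u} - \beta\bigr)
  = \underbrace{\frac{1}{\sqrt{n}}\, M X^\top \varepsilon}_{W \text{ (Gaussian for Gaussian noise)}}
  + \underbrace{\sqrt{n}\,\bigl(M\hat\Sigma - I\bigr)\bigl(\beta - \hat\beta\bigr)}_{R \text{ (bias)}} .
```

Asymptotic confidence intervals are built from W alone and treat R as negligible; the finite-sample concern raised in the abstract is that R need not be small in practice.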
Intracellular processes must be precisely organized in space and time. A paradigmatic example is the symmetric division of bacteria, which, in Escherichia coli, is spatially controlled by the ATP-driven oscillation of Min proteins between the cell poles. I will present the variety of reaction–diffusion patterns formed by such intracellular protein systems in vivo and in vitro. We will then discuss conceptual models and see how mass conservation of the protein species can be used to construct fully nonlinear patterns and predict their generic long-time dynamics independent of the specific mathematical form of the reaction term. We thereby uncover similarities of the reaction–diffusion patterns with phase-separating liquid mixtures and develop the concept of wavelength selection by interrupted coarsening.
The availability of diverse, high-quality data has led to tremendous advances in science, technology and society at large, when analysed by means of statistical and machine learning (ML) methods. However, real-world data, in many cases, cannot be made public to the research community due to privacy restrictions, obstructing progress, especially in bio-medical research. Synthetic data can substitute for the sensitive real data as long as they do not disclose private aspects, and this has proven successful in training downstream ML applications. We propose TVineSynth, a vine copula based synthetic tabular data generator, which is designed to balance privacy and utility, using the vine tree structure and its truncation to control the trade-off. Contrary to synthetic data generators that achieve differential privacy (DP) by globally adding noise, TVineSynth performs a controlled approximation of the estimated data generating distribution, so that it does not suffer from poor utility of the resulting synthetic data for downstream prediction tasks. TVineSynth introduces a targeted bias into the vine copula model that, combined with the specific tree structure of the vine, causes the model to zero out privacy-leaking dependencies while relying on those that are beneficial for utility. Privacy is measured here with membership inference attacks (MIA) and attribute inference attacks (AIA). Further, we theoretically justify how the construction of TVineSynth ensures AIA privacy under a natural privacy measure for continuous sensitive attributes. When compared to competitor models, with and without DP, on simulated and on real-world data, TVineSynth achieves a superior privacy-utility balance.
Recently developed quasi-Bayesian (QB) methods proposed a stimulating change of paradigm in Bayesian computation by directly constructing the Bayesian predictive distribution through recursion, removing the need for expensive computations involved in sampling the Bayesian posterior distribution. This has proved to be data-efficient for univariate predictions. However, existing constructions for higher-dimensional densities are only possible by relying on restrictive assumptions on the model's multivariate structure. In this talk, we discuss a wholly different approach to extend quasi-Bayesian prediction to high dimensions through the use of Sklar's theorem, by decomposing the predictive distribution into one-dimensional predictive marginals and a high-dimensional copula. We use the efficient recursive QB construction for the one-dimensional marginals and model the dependence using highly expressive vine copulas. Further, we tune hyperparameters using robust divergences (e.g. the energy score) and show that our proposed Quasi-Bayesian Vine (QB-Vine) is a fully non-parametric density estimator with an analytical form and, in some situations, a convergence rate independent of the dimension of the data. Our experiments illustrate that the QB-Vine is appropriate for high-dimensional distributions (64 dimensions), needs very few samples to train (200 samples), and outperforms state-of-the-art methods with analytical forms for density estimation and supervised tasks by a considerable margin.
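The decomposition via Sklar's theorem mentioned above can be written out as follows. This is the generic density form of the theorem for a continuous d-dimensional distribution; the symbols (joint density f, marginal distribution functions F_i with densities f_i, copula density c) are notation assumed here for illustration, not taken from the talk.

```latex
% Sklar's theorem, density form (assumed notation):
% f = joint density, F_i = marginal CDFs, f_i = marginal densities, c = copula density.
f(x_1, \dots, x_d)
  = c\bigl(F_1(x_1), \dots, F_d(x_d)\bigr) \, \prod_{i=1}^{d} f_i(x_i) .
```

In the approach described, the one-dimensional factors would come from the recursive QB construction and the copula density c from a vine copula.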
We consider the problem of testing whether a single coefficient is equal to zero in linear models when the dimension of covariates p can be up to a constant fraction of the sample size n. In this regime, an important topic is to propose tests with finite-population valid size control without requiring the noise to follow strong distributional assumptions. In this paper, we propose a new method, called the residual permutation test (RPT), which is constructed by projecting the regression residuals onto the space orthogonal to the union of the column spaces of the original and permuted design matrices. RPT can be proved to achieve finite-population size validity under fixed design with just exchangeable noise, whenever p < n/2. Moreover, RPT is shown to be asymptotically powerful for heavy-tailed noise with bounded (1+t)-th order moment when the true coefficient is at least of order n^{-t/(1+t)} for t ∈ [0, 1]. We further prove that this signal size requirement is essentially rate-optimal in the minimax sense. Numerical studies confirm that RPT performs well in a wide range of simulation settings with normal and heavy-tailed noise distributions.
In this interdisciplinary study at the interface of finance theory and quantum theory, we consider a rational agent who at time 0 enters into a financial contract for which the payout is determined by a quantum measurement at some time T > 0. The state of the quantum system is given in the Heisenberg representation by a known density matrix p. How much will the agent be willing to pay at time 0 to enter into such a contract? In the case of a finite-dimensional Hilbert space H, each such claim is represented by an observable X where the eigenvalues of X determine the amount paid if the corresponding outcome is obtained in the measurement. We use Gleason's theorem to prove, under reasonable axioms, that there exists a pricing state q which is equivalent to the physical state p such that the pricing function Π takes the linear form Π(X) = P_{0T} tr(qX) for any claim X, where P_{0T} is the one-period discount factor. By ‘equivalent’ we mean that p and q share the same null space: that is, for any |ξ⟩ ∈ H one has p|ξ⟩ = 0 if and only if q|ξ⟩ = 0. We introduce a class of optimization problems and solve for the optimal contract payout structure for a claim based on a given measurement. Then we consider the implications of the Kochen–Specker theorem in this setting and we look at the problem of forming portfolios of such contracts. This work illustrates how ideas from the theory of finance can be successfully applied in a non-Kolmogorovian setting. Based on work with Leandro Sánchez-Betancourt (Oxford). The paper can be found at J. Phys. A: Math. Theor. 57 (2024) 285302.
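As a small numerical sketch of the linear pricing rule stated above (discount factor times tr(qX)), the following evaluates the price of a claim in a two-dimensional Hilbert space. The function names, the matrices and the discount factor are all hypothetical values chosen for illustration; nothing here is taken from the paper beyond the form of the formula.

```python
"""Toy illustration of the linear pricing rule: price = discount * tr(q X)."""


def trace_of_product(a, b):
    """Return tr(a b) for two square matrices given as nested lists."""
    n = len(a)
    return sum(a[i][j] * b[j][i] for i in range(n) for j in range(n))


def contract_price(pricing_state, observable, discount):
    """Price a claim as discount * tr(pricing_state observable).

    For a Hermitian observable and a density matrix the trace is real,
    so only the real part is returned.
    """
    value = trace_of_product(pricing_state, observable)
    return discount * complex(value).real


# Hypothetical two-outcome example (all numbers are made up).
q = [[0.5, 0.0], [0.0, 0.5]]   # pricing state: unit trace, positive semidefinite
X = [[10.0, 0.0], [0.0, 0.0]]  # claim paying 10 on the first outcome, 0 on the second
discount = 0.95                # assumed one-period discount factor

price = contract_price(q, X, discount)
print(price)
```

Because the rule is linear in X, the price of a sum of claims written on the same system equals the sum of their prices.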
We explain multivariate regular variation and two tail dependence measures, the extremogram and the cross-extremogram, taking a bivariate GARCH model as an example. We show that the tails of the components of a bivariate GARCH(1,1) process may exhibit power law behavior but, depending on the choice of the parameters, the tail indices of the components may differ. Then, we derive asymptotic theory for the extremogram and cross-extremogram of a bivariate GARCH(1,1) process. Moreover, we discuss what GARCH models can and cannot do in terms of tail modeling, in comparison with stochastic volatility models. We also mention limitations of the current notion of multivariate regular variation and pose several open problems.
TBA
TBA
This paper proposes a novel estimator of the survival function under dependent random right censoring, a situation frequently encountered in survival analysis. We model the relation between the survival time T and the censoring C by using a parametric copula, whose association parameter is not supposed to be known. Moreover, the survival time distribution is left unspecified, while the censoring time distribution is modeled parametrically. We develop sufficient conditions under which our model for (T,C) is identifiable, and propose an estimation procedure for the distribution of the survival time T of interest. Our model and estimation procedure build further on the work on the copula-graphic estimator proposed by Zheng and Klein (1995) and Rivest and Wells (2001), which has the drawback of requiring the association parameter of the copula to be known, and on the recent work by Czado and Van Keilegom (2023), who suppose that both marginal distributions are parametric whereas we allow one margin to be unspecified. Our estimator is based on a pseudo-likelihood approach and maintains low computational complexity. The asymptotic normality of the proposed estimator is shown. Additionally, we discuss an extension to include a cure fraction, addressing both identifiability and estimation issues. The practical performance of our method is validated through extensive simulation studies and an application to a breast cancer data set.
We extend the scope of a recently introduced dependence coefficient between a scalar response Y and a multivariate covariate X to the case where X takes values in a general metric space. Particular attention is paid to the case where X is a curve. While this extension is straightforward on the population level, the asymptotic behavior of the estimator we consider is delicate. It crucially depends on the nearest neighbor structure of the infinite-dimensional covariate sample, where the deterministic bounds on the degrees of the nearest neighbor graphs that are available in multivariate settings no longer exist. The main contribution of this paper is to give some insight into this matter and to suggest a way to overcome the problem for our purposes. As an important application of our results, we consider an independence test.