SEMINAR IS CANCELLED DUE TO SPEAKER ILLNESS!
Quasi-Newton (QN) methods are popular iterative algorithms known for their superior practical performance compared to Gradient Descent (GD)-type methods. However, the existing theoretical results for this class of algorithms do not sufficiently justify their advantage over GD-type methods. In this talk, we discuss our recent efforts to address this issue. Specifically, in the strongly convex setting, we propose the first “globally” convergent QN method that achieves an explicit “non-asymptotic superlinear” rate. We show that the rate achieved by our method is provably faster than that of GD after at most $O(d)$ iterations, where $d$ is the problem dimension. Additionally, in the convex setting, we present an accelerated variant of our proposed method that provably outperforms the accelerated gradient method and converges at a rate of $O(\min\{1/k^2, \sqrt{d \log k}/k^{2.5}\})$, where $k$ is the number of iterations. To attain these results, we diverge from conventional approaches and construct our QN methods based on the Hybrid Proximal Extragradient (HPE) framework and its accelerated variants. Furthermore, a pivotal algorithmic concept underpinning our methodologies is an online learning framework for updating the Hessian approximation matrices. Specifically, we relate our method's convergence rate to the regret of a specific online convex optimization problem in the matrix space and choose the sequence of Hessian approximation matrices to minimize its overall regret.
Based on the following papers:
- COLT 2023: https://arxiv.org/pdf/2302.08580.pdf
- NeurIPS 2023: https://arxiv.org/pdf/2306.02212.pdf
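For readers unfamiliar with quasi-Newton updates, the sketch below shows the classical BFGS method, the textbook representative of the QN family. It is not the HPE-based, online-learning method of the talk; it only illustrates the core idea of maintaining a curvature approximation from gradient differences (here the inverse-Hessian estimate `H`, updated from the pairs `s = x_{k+1} - x_k`, `y = g_{k+1} - g_k`).

```python
import numpy as np

def bfgs(f, grad, x0, iters=100, tol=1e-8):
    """Textbook BFGS: maintain an inverse-Hessian approximation H."""
    n = x0.size
    H = np.eye(n)
    x = x0.astype(float)
    g = grad(x)
    for _ in range(iters):
        if np.linalg.norm(g) < tol:
            break
        p = -H @ g                                  # quasi-Newton direction
        t = 1.0                                     # backtracking (Armijo) line search
        while f(x + t * p) > f(x) + 1e-4 * t * (g @ p):
            t *= 0.5
        x_new = x + t * p
        g_new = grad(x_new)
        s, y = x_new - x, g_new - g
        sy = s @ y
        if sy > 1e-12:                              # curvature condition
            rho = 1.0 / sy
            V = np.eye(n) - rho * np.outer(s, y)
            H = V @ H @ V.T + rho * np.outer(s, s)  # BFGS inverse update
        x, g = x_new, g_new
    return x

# strongly convex quadratic f(x) = x'Ax/2 - b'x, minimizer A^{-1}b = (1, 2)
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([5.0, 5.0])
f = lambda x: 0.5 * x @ A @ x - b @ x
grad = lambda x: A @ x - b
x_star = bfgs(f, grad, np.zeros(2))
```

The talk's contribution can be read against this baseline: instead of the fixed BFGS update rule, the Hessian approximations are chosen by an online learner that minimizes regret.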
BIO: Aryan Mokhtari is an Assistant Professor in the Electrical and Computer Engineering (ECE) Department at the University of Texas at Austin (UT Austin), where he holds the title of Fellow of Texas Instruments/Kilby. Prior to joining UT Austin, he served as a Postdoctoral Associate in the Laboratory for Information and Decision Systems (LIDS) at MIT. Before that, he worked as a Research Fellow at the Simons Institute for the program on “Bridging Continuous and Discrete Optimization.” He received his Ph.D. in electrical and systems engineering from the University of Pennsylvania (Penn). Aryan has received the NSF CAREER Award, Army Research Office (ARO) Early Career Program Award, Google Research Scholar Award, UT Austin ECE Department’s Junior Faculty Excellence in Teaching Award, the Simons-Berkeley Research Fellowship, and Penn’s Joseph and Rosaline Wolf Award for Best Doctoral Dissertation.
A scaling law refers to the observation that the test performance of a model improves as the amount of training data increases. A fast scaling law implies that one can solve machine learning problems by simply boosting the data and model sizes. Yet, in many cases, the benefit of adding more data can be negligible. In this work, we study the rate of scaling laws of nearest neighbor classifiers. We show that a scaling law can have two phases: in the first phase, the generalization error depends polynomially on the data dimension and decreases quickly; whereas in the second phase, the error depends exponentially on the data dimension and decreases slowly. Our analysis highlights the role of the complexity of the data distribution in determining the generalization error. When the data is distributed benignly, our result suggests that the nearest neighbor classifier can achieve a generalization error that depends polynomially, rather than exponentially, on the data dimension.
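The dimension dependence can be seen in a toy experiment (not from the paper; the "benign" distribution here is an assumed example where the label depends on a single coordinate): a brute-force 1-nearest-neighbor classifier does well in low dimension, but its accuracy degrades once many irrelevant coordinates dominate the distances.

```python
import numpy as np

rng = np.random.default_rng(0)

def nn_classify(X_train, y_train, X_test):
    """Brute-force 1-nearest-neighbor prediction (squared Euclidean distance)."""
    d2 = ((X_test[:, None, :] - X_train[None, :, :]) ** 2).sum(-1)
    return y_train[d2.argmin(axis=1)]

def nn_accuracy(dim, n_train=500, n_test=200):
    # label depends on the first coordinate only; the other dim-1 are noise
    def draw(n):
        X = rng.uniform(-1, 1, size=(n, dim))
        return X, (X[:, 0] > 0).astype(int)
    X_tr, y_tr = draw(n_train)
    X_te, y_te = draw(n_test)
    return (nn_classify(X_tr, y_tr, X_te) == y_te).mean()

acc_low = nn_accuracy(2)     # few noise dimensions: high accuracy
acc_high = nn_accuracy(100)  # distances dominated by noise: accuracy drops
```

This is only a caricature of the curse of dimensionality; the talk's point is the finer two-phase structure of the error as data grows.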
BIO: Jingzhao Zhang is an Assistant Professor at Tsinghua University, IIIS. He graduated in 2022 from the MIT EECS PhD program under the supervision of Prof. Ali Jadbabaie and Prof. Suvrit Sra. His research has focused on providing theoretical analyses of practical large-scale algorithms. He now aims to develop theories that are simple and can predict experimental observations. Jingzhao Zhang is also interested in machine learning applications, specifically those involving dynamical system formulations. He received the Ernst A. Guillemin SM Thesis Award and the George M. Sprowls PhD Thesis Award.
In this seminar, we will discuss the estimation of a group action using the non-commutative Fourier transform. We'll explore an interesting relationship between the non-commutative Fourier transform and this estimation problem. Next, we'll optimize our estimation method by leveraging this connection under various conditions, including energy constraints. Finally, we'll apply the obtained results to uncertainty relations for compact groups. Throughout this discussion, we'll highlight the significance of several special functions in this topic.
We consider the fundamental problem of Linear Quadratic Gaussian control on an infinite horizon with costly information acquisition. Specifically, we consider a two-dimensional coupled system in which one of the two states is observable and the other is not. In addition to inference from the observable state, costly information is available via an additional, controlled observation process. Mathematically, the Kalman-Bucy filter is used to Markovianize the problem. Using an ansatz, the problem is then reduced to one for the control-dependent conditional variance, for which we show regularity of the value function. Using this regularity for the reduced problem together with the ansatz, we solve the problem by dynamic programming and verification and construct the unique optimal control. We analyze the optimal control, the optimally controlled state, and the value function, and compare various properties to the literature on problems with costly information acquisition. Further, we show existence and uniqueness of an equilibrium for the controlled conditional variance and study the sensitivity of the control problem at the equilibrium. Finally, we compare the problem to the case of no costly information acquisition and fully observable states. Joint work with Christoph Knochenhauer and Yufei Zhang (Imperial College London).
When attempting to construct QFTs that include Fermions using the methods of Stochastic Quantisation, one is naturally forced to consider noncommutative stochastic PDEs. I shall give an introduction to noncommutative probability and show how to formulate SPDEs driven by noncommutative noises. Furthermore, I will describe how one can renormalise the singular products appearing in such equations using a noncommutative generalisation of Regularity Structures with a particular focus on Fermionic equations. This talk will be based on joint work with Ajay Chandra and Martin Hairer.
Establishing regularity of transition probabilities is a standard focus for solutions to stochastic differential equations (SDEs). For diffusions in finite-dimensional spaces, Hörmander's "bracket generating" condition for an SDE is a standard geometric assumption that ensures smoothness of the solution. The Hörmander condition also often induces a natural geometry on the space which is tied to the analysis of the diffusion.
The situation in infinite dimensions is more complicated and less understood. We'll consider a class of infinite dimensional spaces where we propose a different but equivalent analytic formulation of the Hormander condition. Under this assumption, we discuss the related geometry and establish some regularity properties of the associated diffusion.
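In the finite-dimensional setting referenced above, the bracket generating condition can be checked concretely. The sketch below (an illustration, not part of the talk) computes Lie brackets numerically for the standard Heisenberg vector fields on R^3: the two fields span only a plane at each point, but together with their bracket they span all of R^3, so the condition holds.

```python
import numpy as np

def lie_bracket(X, Y, p, h=1e-5):
    """[X, Y](p) = DY(p) X(p) - DX(p) Y(p), Jacobians via central differences."""
    n = p.size
    def jac(F):
        J = np.empty((n, n))
        for j in range(n):
            e = np.zeros(n); e[j] = h
            J[:, j] = (F(p + e) - F(p - e)) / (2 * h)
        return J
    return jac(Y) @ X(p) - jac(X) @ Y(p)

# Heisenberg vector fields on R^3: X = d/dx - (y/2) d/dz, Y = d/dy + (x/2) d/dz
X = lambda p: np.array([1.0, 0.0, -p[1] / 2])
Y = lambda p: np.array([0.0, 1.0, p[0] / 2])

p = np.array([0.3, -1.2, 0.7])
Z = lie_bracket(X, Y, p)                         # equals (0, 0, 1) for all p
span = np.column_stack([X(p), Y(p), Z])
rank = np.linalg.matrix_rank(span, tol=1e-6)     # 3: bracket generating
```

The talk's question is what replaces this finite-dimensional rank computation when the state space is infinite dimensional.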
It is well-known that the cohomology ring has a richer structure than the homology groups. However, until recently, the use of cohomology in the persistence setting has been limited to speeding up barcode computations. Some recently introduced invariants, namely persistent cup-length, persistent cup modules and persistent Steenrod modules, fill this gap to some extent. When combined with the standard persistence barcode, they yield invariants that are strictly more discriminative than the barcode alone. In this work, we devise an O(dn^4) algorithm for computing the persistent k-cup modules for all k∈{2,…,d}, where d denotes the dimension of the filtered complex, and n denotes its size. Moreover, we note that since the persistent cup length can be obtained as a byproduct of our computations, this leads to a faster algorithm for computing it for d ≥ 3. Finally, we introduce a new stable invariant called partition modules of cup product that is more discriminative than persistent k-cup modules and devise an O(c(d)n^4) algorithm for computing it, where c(d) is subexponential in d. The talk is based on joint work with Tamal Dey.
To attend remotely please contact Alexander Rolle (https://alexanderrolle.github.io/).
Cluster synchronization plays an important role in the proper functioning of many real-world network systems. The mechanism behind clusters' emergence and transformation in response to parameter changes, especially in heterogeneous networks that lack symmetry, is crucial to a general understanding of collective behavior in realistic large network systems, and yet it has remained out of reach for existing approaches. We uncover a mechanism for cluster synchronization in heterogeneous networks by developing a heterogeneous mean-field approach, along with a self-consistent theory, to analyze the cluster structure and its stability.
When learning a causal model of a system, a key motivation is the use of that model for downstream decision-making. In this talk, I will take a decision-centric perspective on causal structure learning, focused on a simple setting that is amenable to careful statistical analysis. In particular, we study causal effect estimation via covariate adjustment, when the causal graph is unknown, all variables are discrete, and the non-descendants of treatment are given.

We propose an algorithm which searches for a data-dependent "approximate" adjustment set via conditional independence testing, and we analyze the bias-variance tradeoff entailed by this procedure. We prove matching upper and lower bounds on omitted confounding bias in terms of small violations of conditional independence. Further, we provide a finite-sample bound on the complexity of correctly selecting an "approximate" adjustment set and of estimating the resulting adjustment functional, using results from the property testing literature.

We demonstrate our algorithm on synthetic and real-world data, outperforming methods which ignore structure learning or which perform structure learning separately from causal effect estimation. I conclude with some open questions at the intersection of structure learning and causal effect estimation.
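For readers new to covariate adjustment, here is a minimal sketch of the adjustment functional the talk builds on (this is the textbook backdoor formula on a synthetic confounded example, not the paper's CI-testing search procedure):

```python
import numpy as np

def adjusted_mean(T, Z, Y, t):
    """Backdoor adjustment with a discrete adjustment set Z:
    E[Y | do(T=t)] = sum_z P(Z=z) * E[Y | T=t, Z=z]."""
    total = 0.0
    for z in np.unique(Z):
        stratum = Z == z
        cell = stratum & (T == t)
        if cell.any():                       # skip empty strata
            total += stratum.mean() * Y[cell].mean()
    return total

rng = np.random.default_rng(1)
n = 20000
Z = rng.integers(0, 2, n)                    # discrete confounder
T = np.where(rng.random(n) < 0.8, Z, 1 - Z)  # treatment depends on Z
Y = T + Z                                    # true causal effect of T on Y is 1

ate = adjusted_mean(T, Z, Y, 1) - adjusted_mean(T, Z, Y, 0)  # close to 1
naive = Y[T == 1].mean() - Y[T == 0].mean()  # confounded contrast, biased up
```

The talk's subject is what happens when the correct adjustment set is unknown and must itself be selected from data, with only approximate conditional independences holding.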
In this talk I will present some new insights on so-called Poisson representable processes, a general class of {0,1}-valued processes recently introduced by Forsström, Gantert and Steif. In particular, I will discuss a new characterisation of these processes in terms of certain mixing properties. As an application, I will argue that the upper invariant measure of the contact process on Z^d is not Poisson representable, thereby answering a question raised in the above-mentioned work. This relies on the upper invariant measure satisfying certain directional mixing properties, but not their spatial analogue. Moreover, the general approach extends to other processes with similar properties, such as the plus phase of the Ising model on Z^2 in the phase transition regime.
Twenty years ago or so, Roya Mohayaee, Yann Brenier, Uriel Frisch and co-authors designed an algorithm based on optimal transport for reconstructing the early Universe. Meanwhile, Yann Brenier proposed a modified theory of gravitation, also based on optimal transport (Monge-Ampère gravitation), and built a system of finitely many independent Brownian particles as a bridge between the early and present-day discretized Universes as a temperature parameter tends to zero.
It turns out that Brenier's model can be recast as a Schrödinger problem based on the limit of the empirical measure of Brownian particles experiencing a mean-field quantum interaction, when the number of particles tends to infinity and both the temperature (of the order of the Boltzmann constant) and the quantum coupling (of the order of Planck's constant) tend to zero. This is joint work with Roya Mohayaee (Institut d'astrophysique de Paris).
In the context of linear regression, we construct a data-driven convex loss function with respect to which empirical risk minimisation yields optimal asymptotic variance in the downstream estimation of the regression coefficients. Our semiparametric approach targets the best decreasing approximation of the derivative of the log-density of the noise distribution. At the population level, this fitting process is a nonparametric extension of score matching, corresponding to a log-concave projection of the noise distribution with respect to the Fisher divergence. The procedure is computationally efficient, and we prove that it attains the minimal asymptotic covariance among all convex M-estimators. As an example of a non-log-concave setting, for Cauchy errors, the optimal convex loss function is Huber-like, and our procedure yields an asymptotic efficiency greater than 0.87 relative to the oracle maximum likelihood estimator of the regression coefficients that uses knowledge of this error distribution; in this sense, we obtain robustness without sacrificing much efficiency.
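The Cauchy example can be illustrated with a fixed Huber loss (a simplification: the talk's procedure learns the convex loss from data, whereas here the loss and its threshold are assumed in advance). Minimising the Huber loss by gradient descent stays stable under heavy-tailed errors, unlike ordinary least squares:

```python
import numpy as np

def huber_grad(r, delta=1.0):
    """Derivative of the Huber loss: linear near 0, clipped in the tails."""
    return np.clip(r, -delta, delta)

def huber_regression(X, y, delta=1.0, lr=0.5, iters=2000):
    """Convex M-estimation of regression coefficients via gradient descent."""
    beta = np.zeros(X.shape[1])
    n = len(y)
    for _ in range(iters):
        r = X @ beta - y
        beta -= lr * X.T @ huber_grad(r, delta) / n
    return beta

rng = np.random.default_rng(0)
n, beta_true = 5000, np.array([2.0, -1.0])
X = rng.normal(size=(n, 2))
y = X @ beta_true + rng.standard_cauchy(n)       # heavy-tailed noise

beta_huber = huber_regression(X, y)              # close to (2, -1)
beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]  # unstable: Cauchy has no mean
```

The clipped influence function is what the abstract calls "Huber-like"; the talk shows how to choose the optimal such convex loss without knowing the error distribution.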
Biochemical reaction networks, e.g. of metabolic type, may comprise hundreds of species and reactions. On the other hand, reactions are typically not expressed in elementary form: their rates are consequently not of mass-action type but involve more parameters, as in, e.g., Michaelis-Menten kinetics.
In this talk, I present an approach to understanding the possible range of dynamics of such networks, which exploits the parameter richness of the reaction rates. At a fixed equilibrium, we rescale the partial derivatives appearing in the Jacobian to highlight changes of stability and bifurcations, with consequent nearby dynamics. In essence, we identify fast `leading' subnetworks that drive the dynamics into an unstable region, causing the occurrence of specific bifurcations.
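The "parameter richness" point can be made concrete with a single Michaelis-Menten reaction (an illustrative computation, not taken from the talk; the equilibrium values are assumed): with two free parameters one can hold the equilibrium flux fixed while independently tuning the Jacobian entry, which is exactly the freedom exploited when rescaling partial derivatives at a fixed equilibrium.

```python
def mm_rate(s, vmax, km):
    """Michaelis-Menten rate: not mass-action, with two tunable parameters."""
    return vmax * s / (km + s)

def mm_dds(s, vmax, km):
    """Partial derivative of the rate w.r.t. the substrate concentration s."""
    return vmax * km / (km + s) ** 2

# assumed equilibrium data: substrate level s* and flux v* through the reaction
s_star, v_star = 2.0, 1.0

def params_for_slope(g):
    """Choose (vmax, km) keeping the equilibrium flux v* at s* while setting
    the Jacobian entry dv/ds to any prescribed g with 0 < g < v*/s*.
    Solving v* = vmax s*/(km+s*) and g = vmax km/(km+s*)^2 gives
    km = g s*^2 / (v* - g s*),  vmax = v* (km + s*) / s*."""
    km = g * s_star**2 / (v_star - g * s_star)
    vmax = v_star * (km + s_star) / s_star
    return vmax, km

# same equilibrium, three different Jacobian entries
slopes = [mm_dds(s_star, *params_for_slope(g)) for g in (0.1, 0.3, 0.45)]
```

In a full network, doing this for every reaction decouples the Jacobian entries from the equilibrium, so stability changes and bifurcations can be organised by subnetwork structure alone.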
This talk is concerned with the following epidemic process: a set of nodes is partitioned into three states: susceptible, infectious, and recovered. We start with a single infectious node. Proceeding in rounds whose length is inversely proportional to the population size, a fixed number of nodes is drawn independently at random. If at least one of the selected nodes is infectious, every susceptible node in the sample becomes infected. Moreover, any infectious node recovers independently at a constant rate. If the expected number of infections caused by a single node is less than one, the epidemic dies out quickly and leaves almost the entire population untouched. If it is above one, either the infection dies out quickly or a large outbreak occurs, during which a non-vanishing fraction of the population is affected. Moreover, if enough nodes are infectious at the same time, the system's behaviour is essentially deterministic.
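The dynamics are easy to simulate. The sketch below is a simplified discrete version (a fixed per-round recovery probability stands in for the constant-rate recovery over rounds of length proportional to 1/n, and it starts from ten infectious nodes rather than one to suppress early die-out); the parameter choices are illustrative, with the expected infections per node roughly sample_size*(sample_size-1)/(n*recovery_prob).

```python
import random

def simulate(n, sample_size, recovery_prob, initial_infected=10, seed=3):
    """Round-based SIR sketch of the sampling epidemic: each round a fixed
    number of nodes is drawn uniformly at random; if any drawn node is
    infectious, all susceptible nodes in the sample become infected; each
    infectious node recovers independently with probability recovery_prob
    per round. Returns the final outbreak size."""
    rng = random.Random(seed)
    I = set(range(initial_infected))
    S = set(range(initial_infected, n))
    R = set()
    while I:
        sample = rng.sample(range(n), sample_size)
        if any(v in I for v in sample):
            newly = {v for v in sample if v in S}
            S -= newly
            I |= newly
        recovered = {v for v in I if rng.random() < recovery_prob}
        I -= recovered
        R |= recovered
    return len(R)

small = simulate(2000, 5, 0.05)    # subcritical: outbreak stays tiny
large = simulate(2000, 5, 0.002)   # supercritical: most nodes affected
```

The two runs exhibit the dichotomy from the abstract: below the threshold the outbreak stays microscopic, above it a constant fraction of the population is reached.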
We study Blaschke-Santaló diagrams for the inradius, circumradius and diameter in general Minkowski spaces. Independently of the gauge, they can be described by at most five parts of boundaries. We analyse which bodies attain these bounds and for which gauges they become redundant. Furthermore, we give a complete description of the union over all these diagrams with respect to planar symmetric gauges (solving an open problem stated by Brandenberg and Gonzáles Merino in a recent paper) by providing a new inequality that tightens Bohnenblust's bound. This union equals the union over all diagrams with respect to intersections of triangles with their origin reflections. This is joint work with René Brandenberg and Bernardo Gonzáles Merino.