Wide neural networks are often described by Gaussian process limits, which capture their typical behavior at initialization and under lazy training regimes. However, these approximations fail to describe the rare but structurally significant fluctuations that govern finite-width effects, posterior concentration, and feature learning. In this talk, we present a large deviation perspective on deep neural networks that goes beyond Gaussian limits.
We first discuss recent results establishing a functional large deviation principle for fully connected networks with Gaussian initialization and Lipschitz activations, including ReLU. This provides a probabilistic description of the entire network output as a random function on compact input sets, with a rate function characterized by a recursive variational structure across layers.
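As a rough illustration of this recursive structure (in notation of our own choosing, not taken from the papers): writing $I_\ell$ for the rate function governing the layer-$\ell$ output, viewed as a function on a compact input set, one obtains a layerwise decomposition of the schematic form

\[
I_\ell(g) \;=\; \inf_{h} \big\{\, I_{\ell-1}(h) + J_\ell(g \mid h) \,\big\},
\]

where the infimum runs over candidate layer-$(\ell-1)$ outputs $h$, the conditional cost $J_\ell(g \mid h)$ measures how expensive it is to produce $g$ from pre-activations built on $h$, and the base case $I_1$ is the quadratic (RKHS-norm) cost associated with the Gaussian first layer. The precise conditional costs, topologies, and function spaces require care and are not reproduced in this sketch.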
We then turn to Bayesian neural networks and show how large deviation theory leads to an explicit variational characterization of the posterior over predictors. In contrast with Gaussian process theory, where the kernel is fixed, the large deviation rate function involves a joint optimization over both outputs and internal kernels, yielding a natural notion of feature learning at the functional level. Numerical experiments illustrate how this framework captures non-Gaussian effects, posterior deformation, and data-dependent kernel selection in moderately wide networks.
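Schematically (again in illustrative notation rather than that of the papers), and up to the appropriate width-dependent scaling between the two terms, the picture is that the posterior over predictors concentrates around minimizers of a functional of the form

\[
f \;\longmapsto\; \inf_{K \in \mathcal{K}} I(f, K) \;+\; \sum_{i=1}^{n} \ell\big(f(x_i), y_i\big),
\]

where $I(f, K)$ is the joint rate of realizing the output $f$ together with the sequence of internal kernels $K$ across layers, $\mathcal{K}$ is the set of admissible kernel sequences, and $\ell$ is the negative log-likelihood on the training pairs $(x_i, y_i)$. The inner optimization over $K$ is precisely what is absent from fixed-kernel Gaussian process posteriors, and it is this term that encodes data-dependent kernel selection.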
These results suggest that large deviations provide a principled framework for understanding representation learning in wide neural networks, bridging probabilistic asymptotics and practical behavior beyond Gaussian approximations.
Based on arXiv:2601.18276 and arXiv:2602.22925.