07.05.2026 14:00 Andrea Agazzi:
Clustering dynamics in mean-field models of transformers
Online: attend (Passcode: 929409)
MI 02.08.020 (Boltzmannstr. 3, 85748 Garching)

We consider a family of mean-field interacting particle systems modeling the layerwise evolution of information (represented as a set of “tokens”) in transformers, a common architecture used in deep learning. In this setting, tokens are interpreted as particles on the d-dimensional sphere, and their distribution evolves according to a Vlasov-type equation, where time plays the role of network depth. Numerical experiments reveal the tendency of these particle systems to organize into clustered/synchronized states, offering a potential explanation for how meaning emerges in these models. In this talk, I will introduce both deterministic and stochastic variants of these models and provide a rigorous characterization of this phenomenon.