Many datasets for modern machine learning consist of high-dimensional observations that are generated from low-dimensional latent variables. While recent advances in deep learning allow us to sample from distributions of almost arbitrary complexity, recovering the ground-truth latent variables is still challenging even in simple settings. We study this problem through the lens of identifiability, i.e., when can we, at least theoretically, hope to recover the latent structure up to certain symmetries? We will present a general identifiability result for interventional data and a contrastive algorithm to find the latent variables. In the second part, we study the robustness of identifiability results to misspecification, which is one challenge for practical applications of representation learning. This talk is based on joint work with Goutham Rajendran, Elan Rosenfeld, Bryon Aragam, Bernhard Schölkopf, and Pradeep Ravikumar.
Bio: Simon Buchholz received his PhD in mathematics from the University of Bonn, where he was advised by Stefan Mueller. He is currently a postdoctoral researcher with Bernhard Schölkopf in the department for Empirical Inference at the Max Planck Institute for Intelligent Systems in Tübingen, where he works on problems in causal representation learning.