
Talks in the category Oberseminar Statistics and Data Science.


19.01.2022 12:30 Tobias Windisch (Robert Bosch GmbH):
Learning Bayesian networks on high-dimensional manufacturing data (online)

In our manufacturing plants, many tens of thousands of components for the automotive industry, like cameras or brake boosters, are produced each day. For many of our products, thousands of quality measurements are collected and checked individually during the assembly process. Understanding the relations and interconnections between these measurements is key to maintaining high production uptime and keeping scrap at a minimum. Graphical models such as Bayesian networks provide a rich statistical framework for investigating these relationships, not least because they represent them as a graph. However, learning their graph structure is an NP-hard problem, and most existing algorithms are designed to deal with either a small number of variables or a small number of observations. On our datasets, with many thousands of variables and many hundreds of thousands of observations, classic learning algorithms do not converge. In this talk, we show how we use an adapted version of the NOTEARS algorithm that uses mixture density neural networks to learn the structure of Bayesian networks even for very high-dimensional manufacturing data.
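
The following is a minimal sketch, not the speaker's implementation, of the acyclicity penalty at the heart of NOTEARS-style structure learning: h(W) = tr(exp(W ∘ W)) - d vanishes exactly when the weighted adjacency matrix W encodes a DAG, which turns the combinatorial acyclicity constraint into a smooth one that gradient-based learners (including the mixture-density-network variant mentioned above) can optimize.

```python
# Minimal sketch of the NOTEARS acyclicity penalty (illustration only).
import numpy as np
from scipy.linalg import expm

def notears_acyclicity(W: np.ndarray) -> float:
    """h(W) = tr(exp(W * W)) - d; equals 0 exactly when W encodes a DAG."""
    d = W.shape[0]
    return float(np.trace(expm(W * W)) - d)

# A 3-node chain (acyclic) versus a graph containing a 2-cycle.
W_dag = np.array([[0., 1., 0.],
                  [0., 0., 1.],
                  [0., 0., 0.]])
W_cyclic = np.array([[0., 1., 0.],
                     [1., 0., 0.],
                     [0., 0., 0.]])
print(notears_acyclicity(W_dag))     # approximately 0
print(notears_acyclicity(W_cyclic))  # strictly positive
```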

16.02.2022 13:15 Eliana Maria Duarte Gelvez (Centro de Matemática Universidade do Porto in Porto, Portugal):
Representation of Context-Specific Causal Models (online)

In this talk I will introduce a class of discrete statistical models that represent context-specific conditional independence relations for discrete data. These models can also be represented by sequences of context DAGs (directed acyclic graphs). We prove that two of these models are statistically equivalent if and only if their contexts are equal and the context DAGs have the same skeleton and v-structures. This is a generalization of the Verma and Pearl criterion for equivalence of DAGs. This is joint work with Liam Solus. A 3-minute video abstract for this talk is available at https://youtu.be/CccVNRFmR1I .
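
The sketch below, my own illustration rather than anything from the talk, spells out the classical Verma-Pearl criterion that the result above generalizes: two ordinary DAGs are Markov equivalent if and only if they share the same skeleton and the same v-structures (the networkx package is assumed).

```python
# Minimal sketch of the Verma-Pearl equivalence criterion for ordinary DAGs
# (illustration only; the talk generalizes this to context-specific models).
import networkx as nx

def skeleton(g: nx.DiGraph) -> set:
    """Undirected edge set of the DAG."""
    return {frozenset(e) for e in g.to_undirected().edges()}

def v_structures(g: nx.DiGraph) -> set:
    """Colliders a -> c <- b with a and b non-adjacent."""
    out = set()
    for c in g.nodes:
        parents = list(g.predecessors(c))
        for i in range(len(parents)):
            for j in range(i + 1, len(parents)):
                a, b = parents[i], parents[j]
                if not g.has_edge(a, b) and not g.has_edge(b, a):
                    out.add((frozenset((a, b)), c))
    return out

def markov_equivalent(g1: nx.DiGraph, g2: nx.DiGraph) -> bool:
    """Same skeleton and same v-structures."""
    return skeleton(g1) == skeleton(g2) and v_structures(g1) == v_structures(g2)

g1 = nx.DiGraph([("A", "B"), ("C", "B")])  # A -> B <- C: a v-structure
g2 = nx.DiGraph([("B", "A"), ("C", "B")])  # same skeleton, no v-structure
print(markov_equivalent(g1, g2))           # False
```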

23.02.2022 17:00 Kailun Zhu (TU Delft):
Regular vines with strongly chordal pattern of conditional independence (online; Meeting ID: 674 7990 5618, Passcode: 139119)

In this talk the relationship between strongly chordal graphs and m-saturated vines (regular vines with certain nodes removed or assigned the independence copula) is proved. Moreover, an algorithm to construct an m-saturated vine structure corresponding to a strongly chordal graph is provided. When the underlying data are sparse, this approach leads to improvements in the estimation process compared to current heuristic methods. Furthermore, due to the reduction in model complexity, it becomes possible to evaluate all vine structures as well as to fit non-simplified vines.

03.05.2022 13:00 Anna-Laura Sattelberger (Max Planck Institute for Mathematics in the Sciences, Leipzig):
Bayesian Integrals on Toric Varieties (BC1 2.02.11, Parkring 11, 85748 Garching)

Toric varieties have a strong combinatorial flavor: those algebraic varieties are described in terms of a fan. Based on joint work with M. Borinsky, B. Sturmfels, and S. Telen (https://arxiv.org/abs/2204.06414), I explain how to understand toric varieties as probability spaces. Bayesian integrals for discrete statistical models that are parameterized by a toric variety can be computed by a tropical sampling method. Our methods are motivated by the study of Feynman integrals and positive geometries in particle physics.

10.05.2022 12:45 Marten Wegkamp (Cornell University, Ithaca, New York):
Optimal Discriminant Analysis in High-Dimensional Latent Factor Models (online and BC1 2.01.10, Parkring 11, 85748 Garching)

In high-dimensional classification problems, a commonly used approach is to first project the high-dimensional features into a lower-dimensional space and base the classification on the resulting lower-dimensional projections. In this talk, we formulate a latent-variable model with a hidden low-dimensional structure to justify this two-step procedure and to guide which projection to choose. We propose a computationally efficient classifier that takes certain principal components (PCs) of the observed features as projections, with the number of retained PCs selected in a data-driven way. A general theory is established for analyzing such two-step classifiers based on any low-dimensional projections. We derive explicit rates of convergence of the excess risk of the proposed PC-based classifier. The obtained rates are further shown to be optimal up to logarithmic factors in the minimax sense. Our theory allows, but does not require, the lower dimension to grow with the sample size and the feature dimension to exceed the sample size. Simulations support our theoretical findings. This is joint work with Xin Bing (Department of Statistical Sciences, University of Toronto).
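
As a rough illustration of the two-step "project, then classify" procedure discussed above (not the authors' estimator, and without their data-driven choice of the number of components, which is simply fixed at 5 here), one can project onto leading principal components and run a linear classifier on the projections:

```python
# Minimal sketch of a two-step PC-based classifier (illustration only).
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
n, p, k = 200, 100, 5
# Synthetic latent-factor data: k hidden factors drive both features and labels.
Z = rng.normal(size=(n, k))
A = rng.normal(size=(k, p))
X = Z @ A + 0.5 * rng.normal(size=(n, p))
y = (Z[:, 0] > 0).astype(int)

# Step 1: project onto the leading PCs; step 2: linear discriminant analysis.
clf = make_pipeline(PCA(n_components=k), LinearDiscriminantAnalysis())
clf.fit(X, y)
print("training accuracy:", clf.score(X, y))
```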

11.05.2022 16:15 Florentina Bunea (Cornell University):
Surprises in topic model estimation and new Wasserstein document-distance calculations (online, Meeting ID: 913-2473-4411, Password: StatsCol22; and Seminarraum, Mathematisches Institut, LMU, Ludwigstrasse 33, 80333 München)

Topic models have been and continue to be an important modeling tool for an ensemble of independent multinomial samples with shared commonality. Although applications of topic models span many disciplines, the jargon used to define them stems from text analysis. In keeping with the standard terminology, one has access to a corpus of n independent documents, each utilizing words from a given dictionary of size p. One draws N words from each document and records their respective counts, thereby representing the corpus as a collection of n samples from independent, p-dimensional multinomial distributions, each having a different, document-specific, true word probability vector Π. The topic model assumption is that each Π is a mixture of K discrete distributions that are common to the corpus, with document-specific mixture weights. The corpus is assumed to cover K topics that are not directly observable, and each of the K mixture components corresponds to the conditional probabilities of words given a topic. The vector of the K mixture weights per document is viewed as a document-specific topic distribution T and is thus expected to be sparse, as most documents will only cover a few of the K topics of the corpus.

Despite the large body of work on learning topic models, the estimation of sparse topic distributions, of unknown sparsity, especially when the mixture components are not known, and are estimated from the same corpus, is not well understood and will be the focus of this talk. We provide estimators of T, with sharp theoretical guarantees, valid in many practically relevant situations, including the scenario p >> N (short documents, sparse data) and unknown K. Moreover, the results are valid when dimensions p and K are allowed to grow with the sample sizes N and n.

When the mixture components are known, we propose MLE estimation of the sparse vector T, the analysis of which has been open until now. The surprising result, and a remarkable property of the MLE in these models, is that, under appropriate conditions and without further regularization, it can be exactly sparse and contain the true zero pattern of the target. When the mixture components are not known, we exhibit computationally fast and rate-optimal estimators for them, and propose a quasi-MLE estimator of T, shown to retain the properties of the MLE. The practical implication of our sharp, finite-sample rate analyses of the MLE and quasi-MLE is that having short documents can be compensated for, in terms of estimation precision, by having a large corpus.
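
As a toy illustration of the known-components case (my own sketch, not the authors' analysis), the MLE of a single document's topic-weight vector T maximizes the multinomial log-likelihood over the probability simplex; here it is computed with a generic constrained optimizer, and the fitted weights tend to come out at exact zeros for unused topics:

```python
# Minimal sketch: MLE of a document's topic weights T with known components A.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)
p, K, N = 30, 4, 200                          # dictionary size, topics, words drawn
A = rng.dirichlet(np.ones(p), size=K).T       # p x K word-given-topic probabilities
T_true = np.array([0.7, 0.3, 0.0, 0.0])       # sparse true topic weights
pvals = A @ T_true
pvals = pvals / pvals.sum()                   # guard against floating-point drift
counts = rng.multinomial(N, pvals)            # observed word counts for one document

def neg_loglik(T: np.ndarray) -> float:
    """Negative multinomial log-likelihood of the word counts."""
    return -counts @ np.log(A @ T + 1e-12)

res = minimize(
    neg_loglik,
    x0=np.full(K, 1.0 / K),
    bounds=[(0.0, 1.0)] * K,
    constraints={"type": "eq", "fun": lambda T: T.sum() - 1.0},
    method="SLSQP",
)
print("estimated topic weights:", np.round(res.x, 3))
```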

Our main application is to the estimation of Wasserstein distances between document-generating distributions. We propose, estimate, and analyze Wasserstein distances between alternative probabilistic document representations, at the word and topic level, respectively. The effectiveness of the proposed Wasserstein distance estimates, and their contrast with the more commonly used Word Mover's Distance between empirical frequency estimates, is illustrated by an analysis of an IMDb movie reviews data set.
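
The sketch below illustrates the kind of word-level Wasserstein distance between documents described above, using the POT package and toy word embeddings as the ground cost; it is my own illustration and not the estimators proposed in the talk.

```python
# Minimal sketch: word-level Wasserstein distance between two documents (illustration only).
import numpy as np
import ot  # the POT package (pip install pot); an assumption, not used in the talk

rng = np.random.default_rng(1)
p = 50                                      # dictionary size
emb = rng.normal(size=(p, 20))              # toy word embeddings define the ground cost
M = ot.dist(emb, emb, metric="euclidean")   # p x p cost matrix between words

# Two documents represented as word-probability vectors on the simplex.
doc1 = rng.dirichlet(np.ones(p))
doc2 = rng.dirichlet(np.ones(p))

W = ot.emd2(doc1, doc2, M)                  # exact optimal-transport (Wasserstein) cost
print("word-level Wasserstein distance:", W)
```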

Brief Bio: Florentina Bunea obtained her Ph.D. in Statistics at the University of Washington, Seattle. She is now a Professor of Statistics in the Department of Statistics and Data Science, and she is affiliated with the Center for Applied Mathematics and the Department of Computer Science at Cornell University. She is a fellow of the Institute of Mathematical Statistics, and she is or has been part of numerous editorial boards, such as JRSS-B, JASA, Bernoulli, and the Annals of Statistics. Her work has been continuously funded by the US National Science Foundation. Her most recent research interests include latent space models, topic models, and optimal transport in high dimensions.

25.05.2022 12:15 Oksana Chernova (Taras Shevchenko National University of Kyiv, Ukraine):
Estimation in Cox proportional hazards model with measurement errors (online and BC1 2.01.10, Parkring 11, 85748 Garching)

The Cox proportional hazards model is a semiparametric regression model that can be used in medical research, engineering, or insurance for investigating the association between the survival time (the so-called lifetime) of an object and predictor variables. We investigate the Cox proportional hazards model for right-censored data, where the baseline hazard rate belongs to an unbounded set of nonnegative Lipschitz functions with fixed constant, the vector of regression parameters belongs to a compact parameter set, and, in addition, the time-independent covariates are subject to measurement errors. We construct a simultaneous estimator of the baseline hazard rate and the regression parameter, present asymptotic results, and discuss goodness-of-fit tests.
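
For orientation, here is a minimal sketch of fitting a standard Cox proportional hazards model, lambda(t | x) = lambda_0(t) exp(beta' x), to right-censored data with the lifelines package (an assumption, not something used in the talk); unlike the estimator discussed above, this baseline fit does not correct for measurement error in the covariates.

```python
# Minimal sketch: fitting a standard Cox model to right-censored data (illustration only;
# the lifelines package is an assumption and no measurement-error correction is applied).
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(3)
n = 500
x = rng.normal(size=n)
true_beta = 0.8
latent_time = rng.exponential(scale=np.exp(-true_beta * x))  # baseline hazard equal to 1
censor_time = rng.exponential(scale=2.0, size=n)
df = pd.DataFrame({
    "time": np.minimum(latent_time, censor_time),
    "event": (latent_time <= censor_time).astype(int),       # 1 = observed, 0 = censored
    "x": x,
})

cph = CoxPHFitter()
cph.fit(df, duration_col="time", event_col="event")
print(cph.params_)  # partial-likelihood estimate of beta (close to 0.8 here)
```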

01.06.2022 12:15 Jack Kuipers (ETH Zürich):
Efficient sampling for Bayesian networks and benchmarking their structure learning (online and BC1 2.01.10, Parkring 11, 85748 Garching)

Bayesian networks are probabilistic graphical models widely employed to understand dependencies in high-dimensional data, and even to facilitate causal discovery. Learning the underlying network structure, which is encoded as a directed acyclic graph (DAG), is highly challenging, mainly due to the vast number of possible networks in combination with the acyclicity constraint, and a plethora of algorithms have been developed for this task. Efforts have focused on two fronts: constraint-based methods that perform conditional independence tests to exclude edges, and score-and-search approaches that explore the DAG space with greedy or MCMC schemes. We synthesize these two fields in a novel hybrid method that reduces the complexity of Bayesian MCMC approaches to that of a constraint-based method. This enables full Bayesian model averaging for much larger Bayesian networks and offers significant improvements in structure learning. To facilitate the benchmarking of different methods, we further present a novel automated workflow for producing scalable, reproducible, and platform-independent benchmarks of structure learning algorithms. It is interfaced via a simple config file, which makes it accessible to all users, while the code is designed in a fully modular fashion to enable researchers to contribute additional methodologies. We demonstrate the applicability of this workflow for learning Bayesian networks in typical data scenarios.

References: doi:10.1080/10618600.2021.2020127 and arXiv:2107.03863
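
A minimal sketch of the hybrid idea described above, not the implementation behind the cited references: cheap marginal independence tests prune the edge set to a candidate skeleton, and the score-based or MCMC search is then restricted to DAGs whose edges lie inside that skeleton.

```python
# Minimal sketch of the hybrid idea: prune edges with cheap tests, then search only
# within the resulting skeleton (illustration only, not the cited implementations).
import numpy as np
from scipy import stats

def candidate_skeleton(X: np.ndarray, alpha: float = 0.05) -> np.ndarray:
    """Keep edge (i, j) if the Fisher-z test rejects zero marginal correlation at level alpha."""
    n, d = X.shape
    R = np.corrcoef(X, rowvar=False)
    z = 0.5 * np.log((1 + R) / (1 - R + 1e-12)) * np.sqrt(n - 3)
    pvals = 2 * (1 - stats.norm.cdf(np.abs(z)))
    return (pvals < alpha) & ~np.eye(d, dtype=bool)

# Independent noise: very few spurious candidate edges survive the pruning step,
# so a subsequent score-based or MCMC search has far fewer DAGs to consider.
X = np.random.default_rng(4).normal(size=(500, 8))
print(candidate_skeleton(X).sum(), "candidate directed edge slots remain")
```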

15.06.2022 12:15 Harry Joe (University of British Columbia, CAN):
Comparison of dependence graphs based on different functions of correlation matrices (online and BC1 2.01.10, Parkring 11, 85748 Garching)

A dependence graph for a set of variables has rules for which pairs of variables are connected. In the literature on dependence graphs for gene expression measurements, there have been several rules for connecting pairs of variables based on a correlation matrix: (a) the absolute correlation of the pair exceeds a threshold; (b) the absolute partial correlation of the pair given the rest exceeds a threshold; (c) the first-order conditional independence rule of Magwene and Kim (2004).

These three methods will be compared with the dependence graph from a truncated partial correlation vine with thresholding. The comparisons are made for correlation matrices that are derived from (a) factor dependence structures, (b) Markov tree structures, and (c) variables that form groups with strong within-group dependence and weaker between-group dependence. If there are latent variables, the graphs are compared with and without them. The goal is to show that more parsimonious and interpretable graphs can be obtained with the inclusion of latent variables.
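
The following sketch, my own illustration and not the speaker's code, shows rules (a) and (b) above on a toy dataset: thresholding absolute marginal correlations, and thresholding absolute partial correlations obtained from the inverse correlation matrix.

```python
# Minimal sketch of rules (a) and (b): thresholded marginal and partial correlations.
import numpy as np

def corr_graph(R: np.ndarray, tau: float) -> np.ndarray:
    """Rule (a): connect i and j if |corr(i, j)| > tau."""
    return (np.abs(R) > tau) & ~np.eye(R.shape[0], dtype=bool)

def pcorr_graph(R: np.ndarray, tau: float) -> np.ndarray:
    """Rule (b): connect i and j if the absolute partial correlation given the rest > tau."""
    P = np.linalg.inv(R)
    d = np.sqrt(np.diag(P))
    partial = -P / np.outer(d, d)
    return (np.abs(partial) > tau) & ~np.eye(R.shape[0], dtype=bool)

rng = np.random.default_rng(5)
X = rng.normal(size=(300, 6))
X[:, 1] += X[:, 0]                       # induce dependence between variables 0 and 1
R = np.corrcoef(X, rowvar=False)
print(corr_graph(R, 0.3).astype(int))
print(pcorr_graph(R, 0.3).astype(int))
```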

22.06.2022 12:15 Han Li (University of Melbourne, AUS):
Joint Extremes in Temperature and Mortality: A Bivariate POT Approach (online and BC1 2.01.10, Parkring 11, 85748 Garching)

This research project contributes to insurance risk management by modeling extreme climate risk and extreme mortality risk in an integrated manner via extreme value theory (EVT). We conduct an empirical study using monthly temperature and death data and find that the joint extremes in cold weather and old-age death counts exhibit the strongest level of dependence. Based on the estimated bivariate generalized Pareto distribution, we quantify the extremal dependence between death counts and temperature indexes. Methodologically, we employ the bivariate peaks over threshold (POT) approach, which is readily applicable to a wide range of topics in extreme risk management.
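
As background, here is a minimal sketch of the univariate peaks-over-threshold step underlying the bivariate POT approach (my own toy example with simulated monthly death counts, not the study's data or model): exceedances over a high threshold are fitted with a generalized Pareto distribution via scipy.

```python
# Minimal sketch of the univariate peaks-over-threshold step (illustration only).
import numpy as np
from scipy.stats import genpareto

rng = np.random.default_rng(6)
monthly_deaths = rng.gamma(shape=5.0, scale=100.0, size=600)  # toy monthly death counts
u = np.quantile(monthly_deaths, 0.95)                         # high threshold
exceedances = monthly_deaths[monthly_deaths > u] - u

# Fit a generalized Pareto distribution to the exceedances (location fixed at 0).
shape, loc, scale = genpareto.fit(exceedances, floc=0)
print(f"threshold u = {u:.1f}, GPD shape = {shape:.3f}, scale = {scale:.1f}")
```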

22.06.2022 13:15 Hans Manner (University of Graz, AT):
Testing the equality of changepoints (joint with Siegfried Hörmann, TU Graz) (online and BC1 2.01.10, Parkring 11, 85748 Garching)

Testing for the presence of changepoints and determining their location is a common problem in time series analysis. Applying changepoint procedures to multivariate data results in higher power and more precise location estimates, both in online and offline detection. However, this requires that all changepoints occur at the same time. We study the problem of testing the equality of changepoint locations. One approach is to treat common breaks as a common feature and test whether an appropriate linear combination of the data can cancel the breaks. We propose how to determine such a linear combination and derive the asymptotic distribution of the resulting CUSUM and MOSUM statistics. We also study the power of the test under local alternatives and provide simulation results on its finite-sample performance. Finally, we suggest a clustering algorithm to group variables into clusters that are co-breaking.
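
For intuition, the sketch below computes a basic univariate CUSUM statistic of the kind such tests build on (my own illustration, not the authors' procedure): the maximum of the normalized cumulative sums of the demeaned series is small under a constant mean and large when the mean breaks.

```python
# Minimal sketch of a univariate CUSUM statistic for a change in mean (illustration only).
import numpy as np

def cusum_statistic(x: np.ndarray) -> float:
    """Maximum of the normalized CUSUM process; large values suggest a mean change."""
    n = x.size
    s = np.cumsum(x - x.mean())
    return np.max(np.abs(s[:-1])) / (np.std(x, ddof=1) * np.sqrt(n))

rng = np.random.default_rng(7)
no_break = rng.normal(size=300)
with_break = np.concatenate([rng.normal(size=150), rng.normal(loc=1.0, size=150)])
print("no break:  ", round(cusum_statistic(no_break), 2))
print("with break:", round(cusum_statistic(with_break), 2))
```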

06.07.2022 12:15 Anastasios Panagiotelis (University of Sydney, AUS):
Anomaly detection with kernel density estimation on manifolds (online and BC1 2.01.10, Parkring 11, 85748 Garching)

Manifold learning can be used to obtain a low-dimensional representation of the underlying manifold given high-dimensional data. However, kernel density estimates of the low-dimensional embedding with a fixed bandwidth fail to account for the way manifold learning algorithms distort the geometry of the underlying Riemannian manifold. We propose a novel kernel density estimator for any manifold learning embedding by introducing the estimated Riemannian metric of the manifold as the variable bandwidth matrix for each point. The geometric information of the manifold guarantees a more accurate density estimation of the true manifold, which subsequently can be used for anomaly detection. To compare our proposed estimator with a fixed-bandwidth kernel density estimator, we run two simulations, with 2-D data mapped into a 3-D Swiss roll or twin peaks shape and a 5-D semi-hypersphere mapped into a 100-D space, and demonstrate that the proposed estimator can improve the density estimates given a good manifold learning embedding and has higher rank correlations between the true and estimated manifold densities. A Shiny app in R is also developed for various simulation scenarios. The proposed method is applied to density estimation in statistical manifolds of electricity usage with the Irish smart meter data. This demonstrates our estimator's capability to fix the distortion of the manifold geometry and to be further used for anomaly detection in high-dimensional data.
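
The sketch below contrasts a fixed-bandwidth Gaussian kernel density estimate with a sample-point (variable-bandwidth) estimate in one dimension; it is only my own illustration of the general idea that the talk refines by taking the estimated Riemannian metric as a per-point bandwidth matrix.

```python
# Minimal sketch: fixed- versus sample-point (variable) bandwidth Gaussian KDE in 1-D.
import numpy as np

def kde(query: np.ndarray, data: np.ndarray, bandwidths: np.ndarray) -> np.ndarray:
    """Gaussian KDE where each data point may carry its own bandwidth."""
    diffs = (query[:, None] - data[None, :]) / bandwidths[None, :]
    kernels = np.exp(-0.5 * diffs**2) / (np.sqrt(2 * np.pi) * bandwidths[None, :])
    return kernels.mean(axis=1)

rng = np.random.default_rng(8)
# A tight cluster and a diffuse cluster: one global bandwidth fits neither well.
data = np.concatenate([rng.normal(0.0, 0.3, 300), rng.normal(4.0, 1.5, 300)])
query = np.linspace(-2.0, 9.0, 5)

fixed = kde(query, data, np.full(data.size, 0.5))
local_scale = np.where(data < 2.0, 0.3, 1.5)     # wider kernels where the data are spread out
variable = kde(query, data, local_scale)
print(np.round(fixed, 3))
print(np.round(variable, 3))
```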

19.07.2022 12:15 Tobias Boege (Max-Planck-Institut für Mathematik in den Naturwissenschaften, Leipzig):
t.b.a. (online and BC1 2.01.10, Parkring 11, 85748 Garching)

t.b.a.

07.09.2022 12:15 Marco Scutari (Polo Universitario Lugano, Switzerland):
t.b.a. (online and Parkring 11, 85748 Garching)

t.b.a.

14.09.2022 12:15 Michaël Lalancette (University of Toronto, CAN):
t.b.a. (online and BC1 2.01.10, Parkring 11, 85748 Garching)

t.b.a.

26.10.2022 12:15 Helmut Farbmacher (TUM):
t.b.a. (online and BC1 2.01.10, Parkring 11, 85748 Garching)

t.b.a.