Bayesian networks are probabilistic graphical models widely employed to understand dependencies in high-dimensional data, and even to facilitate causal discovery. Learning the underlying network structure, which is encoded as a directed acyclic graph (DAG) is highly challenging mainly due to the vast number of possible networks in combination with the acyclicity constraint, and a wide plethora of algorithms have been developed for this task. Efforts have focused on two fronts: constraint-based methods that perform conditional independence tests to exclude edges and score and search approaches which explore the DAG space with greedy or MCMC schemes. We synthesize these two fields in a novel hybrid method which reduces the complexity of Bayesian MCMC approaches to that of a constraint-based method. This enables full Bayesian model averaging for much larger Bayesian networks, and offers significant improvements in structure learning. To facilitate the benchmarking of different methods, we further present a novel automated workflow for producing scalable, reproducible, and platform-independent benchmarks of structure learning algorithms. It is interfaced via a simple config file, which makes it accessible for all users, while the code is designed in a fully modular fashion to enable researchers to contribute additional methodologies. We demonstrate the applicability of this workflow for learning Bayesian networks in typical data scenarios.
References: doi:10.1080/10618600.2021.2020127 and arXiv:2107.03863