AutoGMM: Automatic Gaussian Mixture Modeling in Python

Tingshan Liu, Thomas L. Athey, Benjamin D. Pedigo, Joshua T. Vogelstein

Published: 2019/9/6

Abstract

The exponential growth of complex data demands fully automatic clustering. Gaussian mixture models (GMMs) provide uncertainty-aware grouping but often require expertise to specify hyperparameters such as the number of components and the covariance structure. While mclust (R) automates this choice via the Bayesian Information Criterion (BIC), Python lacks a comparable tool. We introduce AutoGMM, an open-source Python package that automates GMM fitting through two mechanisms: strategic initialization using an agglomerative Mahalanobis heuristic, and parallelized model selection by information criteria. AutoGMM is a drop-in tool that yields strong out-of-the-box performance on classic benchmarks, targeted stress tests, and two real datasets, with favorable runtime scaling. The code is available at https://github.com/neurodata/AutoGMM with tests and reproducible workflows.
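To make "model selection by information criteria" concrete, the following is a minimal sketch of BIC-based selection over component counts and covariance structures. It is not the AutoGMM API: the function names (`gmm_n_parameters`, `bic`, `select_model`) are hypothetical, the covariance-type labels are borrowed from scikit-learn's naming, the log-likelihoods in the usage example are made-up numbers rather than results from the paper, and the sign convention assumed here is "lower BIC is better".

```python
import math


def gmm_n_parameters(n_components, n_features, covariance_type):
    """Count the free parameters of a GMM (hypothetical helper).

    Covariance types use scikit-learn's labels, which is an assumption
    of this sketch rather than something stated in the abstract.
    """
    k, d = n_components, n_features
    if covariance_type == "full":         # one full covariance per component
        cov = k * d * (d + 1) // 2
    elif covariance_type == "tied":       # one full covariance shared by all
        cov = d * (d + 1) // 2
    elif covariance_type == "diag":       # one diagonal covariance per component
        cov = k * d
    elif covariance_type == "spherical":  # one variance per component
        cov = k
    else:
        raise ValueError(f"unknown covariance_type: {covariance_type!r}")
    means = k * d
    weights = k - 1  # mixing weights sum to 1, so one is determined
    return cov + means + weights


def bic(total_log_likelihood, n_params, n_samples):
    """BIC in the 'lower is better' convention: p * ln(n) - 2 * ln(L)."""
    return n_params * math.log(n_samples) - 2.0 * total_log_likelihood


def select_model(candidates, n_samples, n_features):
    """Return the candidate with the lowest BIC.

    Each candidate is (n_components, covariance_type, total_log_likelihood),
    where the log-likelihood comes from an already fitted model.
    """
    def score(candidate):
        k, cov_type, log_lik = candidate
        return bic(log_lik, gmm_n_parameters(k, n_features, cov_type), n_samples)

    return min(candidates, key=score)


# Illustrative inputs only: these log-likelihoods are invented for the example.
candidates = [
    (2, "full", -300.0),
    (3, "full", -295.0),
    (2, "spherical", -320.0),
]
best = select_model(candidates, n_samples=100, n_features=2)
print(best)
```

With these invented numbers the three-component model has the highest likelihood but loses on BIC, because its six extra parameters cost more than the likelihood gain. Since each candidate is scored independently, the search over candidates is the part that can be run in parallel.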
