Outlier-Robust Multi-Group Gaussian Mixture Modeling with Flexible Group Reassignment

Patricia Puchhammer, Ines Wilms, Peter Filzmoser

Published: 2025/4/3

Abstract

Do expert-defined or diagnostically labeled data groups align with clusters inferred through statistical modeling? If not, where do discrepancies between predefined labels and model-based groupings occur, and why? In this work, we show how to address these questions using the multi-group Gaussian mixture model (MG-GMM). This novel model incorporates prior group information while allowing observations to be reassigned to alternative groups based on data-driven evidence. We achieve this by modeling the observations of each group as arising not from a single distribution, but from a Gaussian mixture comprising all group-specific distributions. Moreover, our model is robust against cellwise outliers that may obscure or distort the underlying group structure. We propose a new penalized likelihood approach, called cellMG-GMM, that jointly estimates the mixture probabilities and the location and scale parameters of the MG-GMM, and detects outliers through a penalty term in the objective function on the number of flagged cellwise outliers. We show that our estimator has good breakdown properties in the presence of cellwise outliers. We develop a computationally efficient EM-based algorithm for cellMG-GMM and demonstrate its strong performance in identifying and diagnosing observations at the intersection of multiple groups, through simulations and diverse applications in meteorology, medicine and oenology.
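To make the reassignment idea concrete, the following is a minimal sketch, not the authors' cellMG-GMM implementation. It assumes that observations labeled as group g follow the mixture f_g(x) = sum_k pi_gk * N(x; mu_k, Sigma_k) over all N group-specific Gaussians, and that the weights pi_gk, means and covariances are already known. It computes the standard EM E-step posteriors, i.e. the probability that each labeled observation actually arises from each group-specific distribution. A labeled observation whose posterior mass falls mostly on another group's distribution is a candidate for reassignment. The function names `gaussian_logpdf` and `group_posteriors` are hypothetical, and the sketch deliberately omits the cellwise outlier flagging and the penalty term that cellMG-GMM adds.

```python
import numpy as np

def gaussian_logpdf(X, mean, cov):
    """Log-density of a multivariate normal at each row of X (plain Gaussian, no outlier handling)."""
    X = np.atleast_2d(X)
    p = X.shape[1]
    diff = X - mean
    # Cholesky factor gives a stable solve and log-determinant
    L = np.linalg.cholesky(cov)
    z = np.linalg.solve(L, diff.T)          # shape (p, n)
    maha = np.sum(z ** 2, axis=0)           # squared Mahalanobis distances
    logdet = 2.0 * np.sum(np.log(np.diag(L)))
    return -0.5 * (p * np.log(2.0 * np.pi) + logdet + maha)

def group_posteriors(X, pi_g, means, covs):
    """Hypothetical E-step for observations labeled as group g.

    X     : (n, p) observations carrying label g
    pi_g  : (N,) mixture weights of group g over all N group distributions
    means : list of N mean vectors
    covs  : list of N covariance matrices
    Returns an (n, N) matrix whose row i gives the posterior probability
    that x_i arises from each group-specific distribution.
    """
    log_w = np.column_stack([
        np.log(pi_g[k]) + gaussian_logpdf(X, means[k], covs[k])
        for k in range(len(means))
    ])
    # Subtract the row maximum before exponentiating (log-sum-exp trick)
    log_w -= log_w.max(axis=1, keepdims=True)
    w = np.exp(log_w)
    return w / w.sum(axis=1, keepdims=True)

# Illustrative toy setting (made-up numbers): two group distributions
means = [np.zeros(2), np.array([4.0, 4.0])]
covs = [np.eye(2), np.eye(2)]
pi_1 = np.array([0.9, 0.1])                 # group 1 mostly follows its own distribution
X1 = np.array([[0.1, -0.2],                 # typical group-1 observation
               [3.9, 4.2]])                 # labeled group 1, but looks like group 2
post = group_posteriors(X1, pi_1, means, covs)
```

In this toy example the second observation carries label 1, yet nearly all of its posterior mass falls on the second distribution despite the small prior weight of 0.1. This is the kind of label/model discrepancy the abstract refers to. Working on the log scale keeps the posteriors numerically stable when densities are tiny.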
