Scarce Data, Noisy Inferences, and Overfitting: The Hidden Flaws in Ecological Dynamics Modelling
Mario Castro, Rafael Vida, Javier Galeano, José A. Cuesta
Published: 2025/10/4
Abstract
Metagenomic data, analysed with ecological models, has significantly advanced microbiome research, particularly in personalised medicine. The generalised Lotka-Volterra (gLV) model is commonly used to understand microbial interactions and predict ecosystem dynamics. However, gLV models often fail to capture complex interactions, especially when data is limited or noisy. This study critically assesses the effectiveness of gLV and similar models using Bayesian inference and a model reduction method based on information theory. We find that models fitted to ecological data are often non-interpretable and overfitted because of limited information, noisy data, and parameter sloppiness. Our results highlight the need for simpler models that align with the available data, and we propose a distribution-based approach to better capture ecosystem diversity, stability, and competition. These findings challenge current bottom-up ecological modelling practices and aim to shift the focus toward a Statistical Mechanics view of ecology based on distributions of parameters.
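For readers unfamiliar with the gLV model named in the abstract, the following is a minimal sketch of its standard textbook form, dx_i/dt = x_i (r_i + Σ_j A_ij x_j), integrated with a fixed-step Runge-Kutta scheme. It is not the authors' code. The function names, the two-species example, and all parameter values (`r`, `A`, initial abundances, step size) are illustrative assumptions. Note that a system of n species has n growth rates and n² interaction coefficients, which is one reason limited or noisy data can leave such a model poorly constrained.

```python
def glv_rhs(x, r, A):
    """gLV right-hand side: dx_i/dt = x_i * (r_i + sum_j A[i][j] * x_j)."""
    n = len(x)
    return [x[i] * (r[i] + sum(A[i][j] * x[j] for j in range(n)))
            for i in range(n)]


def rk4_step(x, r, A, dt):
    """Advance the abundances x by one classical fourth-order Runge-Kutta step."""
    k1 = glv_rhs(x, r, A)
    k2 = glv_rhs([xi + 0.5 * dt * k for xi, k in zip(x, k1)], r, A)
    k3 = glv_rhs([xi + 0.5 * dt * k for xi, k in zip(x, k2)], r, A)
    k4 = glv_rhs([xi + dt * k for xi, k in zip(x, k3)], r, A)
    return [xi + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
            for xi, a, b, c, d in zip(x, k1, k2, k3, k4)]


def simulate(x0, r, A, dt=0.01, steps=5000):
    """Return the trajectory (list of abundance vectors) starting from x0."""
    x = list(x0)
    trajectory = [x]
    for _ in range(steps):
        x = rk4_step(x, r, A, dt)
        trajectory.append(x)
    return trajectory


# Hypothetical two-species competitive community (values chosen for illustration).
r = [1.0, 0.8]                 # intrinsic growth rates
A = [[-1.0, -0.5],             # A[i][j]: effect of species j on species i
     [-0.4, -1.0]]

trajectory = simulate([0.1, 0.1], r, A)
final = trajectory[-1]
print("final abundances:", final)
```

For these assumed parameters the coexistence equilibrium solves r + A x = 0, giving x = (0.75, 0.5), and the simulated trajectory settles close to that point. Even this two-species toy has six free parameters, so the parameter count quickly outgrows what short time series can constrain.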