Minimum Sample Size Calculation for Multivariable Regression of Continuous Outcomes in Chemometrics for Astrobiology and Planetary Science

M. Konstantinidis, E. A. Lalla, S. J. Gonzalez, J. Manrique, G. Lopez-Reyes, A. Barlow, E. Sawyers, B. Barrios, M. G. Daly

Published: 2025/9/17

Abstract

Over the last few decades, prediction models have become a fundamental tool in statistics, chemometrics, and related fields. However, for such models to be of value, the inferences they generate must be reliable. The internal validity of a prediction model may be threatened if it is not calibrated on a sufficiently large sample, as problems such as overfitting may occur. Such situations are highly problematic in many fields, including space science, where inferences from prediction models often inform scientific inquiry about planetary bodies such as Mars. Therefore, to better inform the development of prediction models, we applied theory-based guidance from the biomedical domain to establish the minimum sample size required under a range of conditions for continuous outcomes. This study aims to disseminate existing criteria from biomedical research to a broader audience, focusing on their potential applicability and utility within chemometrics. As such, the paper emphasizes the importance of interdisciplinarity, bridging the gap between the medical domain and chemometrics. Lastly, we provide several examples in the context of space science. This work is intended as a foundation for more evidence-based model development and to support rigorous predictive modelling in the search for life and possible habitable environments.