Bayesian Model Comparison and Significance: Widespread Errors and how to Correct Them

Daniel P. Thorngren, David K. Sing, Sagnick Mukherjee

Published: 2025/9/30

Abstract

Bayes factors have become a popular tool in exoplanet spectroscopy for testing atmosphere models against one another. We show that the commonly used method for converting these values into significance "sigmas" is invalid. The formula is neither justified nor recommended by the paper it is drawn from, and it overestimates the confidence of results. We use simple examples to demonstrate the invalidity and prior sensitivity of this approach. We review the standard Bayesian interpretation of the Bayes factor as an odds ratio and recommend its use in conjunction with the Akaike Information Criterion (AIC) or Bayesian Predictive Information Criterion Simplified (BPICS) in future analyses (Python implementations are included). As a concrete example, we refit the WASP-39 b NIRSpec transmission spectrum to test for the presence of SO$_2$. The prevalent, incorrect significance calculation gives $3.67\sigma$, whereas the standard Bayesian interpretation yields a null model probability $p(\mathcal{B}|y)=0.0044$. Surveying the exoplanet atmosphere literature, we find widespread use of the erroneous formula. To avoid overstating observational results and underestimating the observation time required, the community should return to the standard Bayesian interpretation.
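The odds-ratio reading of the Bayes factor can be illustrated with a few lines of Python. This is a minimal sketch and not the implementation distributed with the paper: the function names `null_posterior_probability` and `aic` are hypothetical, and it assumes only two candidate models with prior odds supplied by the user (equal by default). Under those assumptions, a null model probability of 0.0044 corresponds to a Bayes factor of roughly 226 in favour of the alternative model.

```python
import math


def null_posterior_probability(log_bayes_factor, prior_odds=1.0):
    """Posterior probability of the null model, given two candidate models.

    log_bayes_factor: ln(B), where B favours the alternative over the null.
    prior_odds: prior odds (alternative : null); 1.0 means equal priors.
    Both argument names are illustrative assumptions.
    """
    # Posterior odds (alternative : null) = Bayes factor * prior odds.
    log_odds = log_bayes_factor + math.log(prior_odds)
    # p(null | y) = 1 / (1 + odds), arranged to avoid overflow for large odds.
    if log_odds > 0:
        inv_odds = math.exp(-log_odds)
        return inv_odds / (1.0 + inv_odds)
    return 1.0 / (1.0 + math.exp(log_odds))


def aic(max_log_likelihood, n_params):
    """Akaike Information Criterion: 2k - 2 ln(L_max). Lower is preferred."""
    return 2.0 * n_params - 2.0 * max_log_likelihood


# Equal evidence for both models leaves the null at 50%.
p_equal = null_posterior_probability(0.0)

# A Bayes factor of about 226 with equal priors gives p(null | y) near 0.0044.
p_strong = null_posterior_probability(math.log(226.27))
```

Reporting the posterior probability directly keeps the prior odds explicit, so a reader who disagrees with the equal-prior assumption can rescale the result.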

Read Full Paper (arXiv.org)