Modelling Under-Reported Data: Pitfalls of Naïve Approaches and a New Statistical Framework for Epidemic Curve Reconstruction

Justin J. Slater, Sindi Bebeziqi

公開日: 2025/9/12

Abstract

Count-valued autoregressions are widely used to analyse time-series of reported infectious-disease cases because of their close connection with discrete-time transmission models. However, when such models are applied directly to under-reported case counts, their mechanistic interpretation can break down. We establish new theoretical results quantifying the consequences of ignoring under-reporting in these models. To address this issue, reported cases are often modelled as a binomially thinned version of an underlying count process, but such models are difficult to fit because the unobserved true counts are serially correlated and integer-valued. We develop a new statistical framework for under-reported infectious-disease data that uses a normal-normal approximation to a broad class of thinned count autoregressions and then accurately maps this continuous process back to the integers. Through simulations and applications to rotavirus incidence in a German state and Covid-19 incidence in English conurbations, we demonstrate that our approach both retains the mechanistic appeal of thinned autoregressions and substantially simplifies inference.

Modelling Under-Reported Data: Pitfalls of Naïve Approaches and a New Statistical Framework for Epidemic Curve Reconstruction | SummarXiv | SummarXiv