Applications of the Vendi score in genomic epidemiology

Bjarke Frost Nielsen, Amey P. Pasarkar, Qiqi Yang, Bryan T. Grenfell, Adji Bousso Dieng

Published: 2025/9/26

Abstract

The Vendi score (VS) is a diversity metric recently conceived in the context of machine learning, with applications in a wide range of fields. It has several advantages over the metrics commonly used in ecology: it is classification-independent, incorporates abundance information, and has a tunable sensitivity to rare versus abundant types. Using rich COVID-19 sequence data as a paradigm, we develop methods for applying the VS to time-resolved sequence data. We show that the VS allows the overall diversity of circulating viruses to be characterized and emerging variants to be discerned prior to formal identification. Furthermore, applying the VS to phylogenetic trees provides a convenient overview of within-clade diversity, which can aid viral variant detection.
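For readers unfamiliar with the metric, the following is a minimal Python sketch of how a Vendi score can be computed for a small set of aligned sequences. It uses the standard definition of the VS as the exponential of the entropy of the eigenvalues of the normalized similarity matrix K/n, with an order parameter q that sets the sensitivity to rare versus abundant types. The abstract does not say which similarity function is used, so the match-fraction kernel below is an assumption made only for illustration, as are the function names `match_fraction_kernel` and `vendi_score` and the toy sequences.

```python
import numpy as np


def match_fraction_kernel(seqs):
    """Similarity matrix for equal-length aligned sequences.

    Entry (i, j) is the fraction of positions at which sequences i and j agree.
    This kernel is an illustrative assumption, not necessarily the one used in
    the paper. It has unit diagonal and is positive semidefinite.
    """
    arr = np.array([list(s) for s in seqs])  # shape (n, L)
    return (arr[:, None, :] == arr[None, :, :]).mean(axis=2)


def vendi_score(K, q=1.0):
    """Vendi score of order q for an n x n similarity matrix K with unit diagonal."""
    K = np.asarray(K, dtype=float)
    n = K.shape[0]
    # Eigenvalues of K/n are non-negative and sum to 1 for such a K.
    lam = np.linalg.eigvalsh(K / n)
    lam = lam[lam > 1e-12]  # drop zero and numerically negative eigenvalues
    if np.isclose(q, 1.0):
        # Order 1: exponential of the Shannon entropy of the eigenvalues.
        return float(np.exp(-np.sum(lam * np.log(lam))))
    if np.isinf(q):
        # Order infinity: determined by the largest eigenvalue alone.
        return float(1.0 / lam.max())
    # General order q: exponential of the Renyi entropy of order q.
    return float(np.sum(lam ** q) ** (1.0 / (1.0 - q)))


# Toy sequences, made up for illustration.
identical = ["ACGT", "ACGT", "ACGT"]
distinct = ["AAAA", "CCCC", "GGGG", "TTTT"]
vs_identical = vendi_score(match_fraction_kernel(identical))
vs_distinct = vendi_score(match_fraction_kernel(distinct))
vs_distinct_q2 = vendi_score(match_fraction_kernel(distinct), q=2.0)
```

In this sketch a set of identical sequences yields a score of approximately 1, while n sequences that share no positions yield approximately n, so the score can be read as an effective number of distinct types. Smaller values of q give more weight to rare types and larger values give more weight to abundant ones. For time-resolved data, one plausible approach would be to evaluate the score on the sequences sampled within each time window.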