Stochastic Mutation as a Mechanism for the Emergence of SARS-CoV-2 New Variants

Liaofu Luo, Jun Lv

Published: 2025/2/13

Abstract

Predicting the future evolutionary trajectory of SARS-CoV-2 remains a critical challenge, particularly due to the pivotal role of spike protein mutations. It is therefore essential to develop evolutionary models capable of continuously integrating new experimental data. In this study, we employ a cladogram algorithm that incorporates established assumptions for mutant representation -- using both four-letter and two-letter formats -- along with an n-mer distance algorithm to construct a cladogenetic tree of SARS-CoV-2 mutations. This tree accurately captures the observed changes across macro-lineages. We introduce a stochastic method for generating new strains on this tree based on spike protein mutations. For a given set A of existing mutation sites, we define a set X comprising x randomly generated mutation sites on the spike protein. The intersection of A and X, denoted as set Y, contains y sites. Our analysis indicates that the position of a generated strain on the tree is primarily determined by x. Through large-scale stochastic sampling, we predict the emergence of new macro-lineages. As x increases, the dominance among macro-lineages shifts: lineage O surpasses N, P surpasses O, and eventually Q surpasses P. We identify threshold values of x that delineate transitions between these macro-lineages. Furthermore, we propose an algorithm for predicting the timeline of macro-lineage emergence. In conclusion, our findings demonstrate that SARS-CoV-2 evolution adheres to statistical principles: the emergence of new strains can be driven by randomly generated spike protein sites, and large-scale stochastic sampling reveals evolutionary patterns underlying the rise of distinct macro-lineages.
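The set construction described in the abstract (existing sites A, a random draw X of x sites, and their intersection Y of size y) can be illustrated with a minimal sketch. It makes assumptions the abstract does not state: sites are drawn uniformly without replacement from the 1273 residue positions of the reference spike protein, and the example set A is an illustrative placeholder, not the set used by the authors. The function names are hypothetical. Under this uniform assumption y follows a hypergeometric distribution with mean x·|A|/1273, which the simulation should approximate.

```python
import random

# Assumption: the site universe is the 1273 residues of the reference spike protein.
SPIKE_LENGTH = 1273


def sample_overlap(existing_sites, x, rng):
    """Draw x distinct random spike sites (set X) and return (X, Y), Y = A ∩ X."""
    a = set(existing_sites)
    drawn = set(rng.sample(range(1, SPIKE_LENGTH + 1), x))
    return drawn, a & drawn


def mean_overlap(existing_sites, x, trials=10000, seed=0):
    """Estimate the average y = |Y| over repeated random draws of X."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        _, y_set = sample_overlap(existing_sites, x, rng)
        total += len(y_set)
    return total / trials


# Illustrative placeholder for set A (not the paper's data).
EXAMPLE_A = [417, 484, 501, 614, 681]

if __name__ == "__main__":
    x = 100
    estimate = mean_overlap(EXAMPLE_A, x)
    expected = x * len(EXAMPLE_A) / SPIKE_LENGTH
    print(f"simulated mean y = {estimate:.3f}, hypergeometric mean = {expected:.3f}")
```

This sketch covers only the sampling step. Placing a generated strain on the cladogenetic tree would additionally require the mutant representation and n-mer distance algorithm, whose details the abstract does not give.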
