Ordered Leaf Attachment (OLA) Vectors can Identify Reticulation Events even in Multifurcated Trees
Alexey Markin, Tavis K. Anderson
Published: 2025/9/19
Abstract
Recently, a new vector encoding, Ordered Leaf Attachment (OLA), was introduced that represents $n$-leaf phylogenetic trees as $n-1$ length integer vectors by recording the placement location of each leaf. Both encoding and decoding of trees run in linear time and depend on a fixed ordering of the leaves. Here, we investigate the connection between OLA vectors and the maximum acyclic agreement forest (MAAF) problem. A MAAF represents an optimal breakdown of $k$ trees into reticulation-free subtrees, with the roots of these subtrees representing reticulation events. We introduce a corrected OLA distance index over OLA vectors of $k$ trees, which is easily computable in linear time. We prove that the corrected OLA distance corresponds to the size of a MAAF, given an optimal leaf ordering that minimizes that distance. Additionally, a MAAF can be easily reconstructed from optimal OLA vectors. We expand these results to multifurcated trees: we introduce an $O(kn \cdot m\log m)$ algorithm that optimally resolves a set of multifurcated trees given a leaf-ordering, where $m$ is the size of a largest multifurcation, and show that trees resolved via this algorithm also minimize the size of a MAAF. These results suggest a new approach to fast computation of phylogenetic networks and identification of reticulation events via random permutations of leaves. Additionally, in the case of microbial evolution, a natural ordering of leaves is often given by the sample collection date, which means that under mild assumptions, reticulation events can be identified in polynomial time on such datasets.