Fuzzy simplicial sets and their application to geometric data analysis
Lukas Silvester Barth, Hannaneh Fahimi, Parvaneh Joharinad, Jürgen Jost, Janis Keck, Thomas Jan Mikhail
Published: 2024/6/17
Abstract
In this article, we expand upon the concepts introduced by David Spivak concerning the relationship between the category $\mathbf{UM}$ of uber metric spaces and the category $\mathbf{sFuz}$ of fuzzy simplicial sets. We show that fuzzy simplicial sets can be regarded as natural combinatorial generalizations of metric relations. Taking inspiration from UMAP, we apply the theory to manifold learning, dimension reduction and data visualization, while refining some of its constructions. We generalize the adjunction between $\mathbf{UM}$ and $\mathbf{sFuz}$, derive an explicit description of colimits in $\mathbf{UM}$, and show that $\mathbf{UM}$ can be embedded into $\mathbf{sFuz}$. We prove analogous results for the category $\mathbf{EPMet}$ of extended pseudo-metric spaces. We also give rigorous definitions of functors that make it possible to recursively merge sets of fuzzy simplicial sets, and we describe the adjunctions between the category of truncated fuzzy simplicial sets and $\mathbf{sFuz}$, which we relate to persistent homology. Combining these constructions, we show a surprising connection between the well-known dimension reduction methods UMAP and Isomap and derive an alternative algorithm, which we call IsUMap, that combines some of the strengths of both methods. Additionally, we develop a new embedding method that allows one to preserve clusters detected in the original metric space constructed from the data. Visualizing the optimization process gives the user information both about the distributions within clusters in the original metric space and about the relations between clusters. We compare our new method with UMAP, Isomap and t-SNE on a series of low- and high-dimensional datasets and provide explanations for the observed differences and improvements.
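For orientation only, here is a minimal sketch of how a finite metric space can be read as a fuzzy simplicial set. It assumes the convention of the finite singular functor from the original UMAP paper, not the refined constructions of this article. Under that convention, a tuple of points forms a simplex at strength level $a \in (0,1]$ exactly when all pairwise distances are at most $-\log a$. Each simplex therefore receives membership strength $\exp(-\mathrm{diam})$. The function names `fuzzy_simplices` and `level_set` are hypothetical, and degenerate simplices are omitted for brevity.

```python
import itertools
import math


def fuzzy_simplices(points, dist, max_dim=2):
    """Map each non-degenerate simplex (tuple of point indices) to a strength.

    Assumed convention (UMAP-style finite singular functor): a simplex is
    present at level a iff every pairwise distance is <= -log(a), so its
    membership strength is exp(-diameter). Vertices get strength 1.0.
    """
    n = len(points)
    strengths = {}
    for k in range(max_dim + 1):
        for simplex in itertools.combinations(range(n), k + 1):
            # Diameter of the simplex; 0.0 for a single vertex.
            diam = max(
                (dist(points[i], points[j])
                 for i, j in itertools.combinations(simplex, 2)),
                default=0.0,
            )
            strengths[simplex] = math.exp(-diam)
    return strengths


def level_set(strengths, a):
    """Ordinary simplicial set at strength level a (0 < a <= 1)."""
    return {s for s, mu in strengths.items() if mu >= a}


# Usage: three points on the real line with the absolute-value metric.
s = fuzzy_simplices([0.0, 1.0, 3.0], lambda x, y: abs(x - y))
print(sorted(level_set(s, math.exp(-2))))
```

Lowering the level $a$ admits simplices of larger diameter, which gives a nested family of simplicial complexes. This is the kind of filtration that the abstract relates to persistent homology.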