Mean Shift for Clustering Functional Data: A Scalable Algorithm and Convergence Analysis
Ting-Li Chen, Toshinari Morimoto, Su-Yun Huang, Ruey S. Tsay
Published: 2025/7/19
Abstract
This paper extends the mean shift algorithm from vector-valued data to functional data, enabling clustering in infinite-dimensional settings. To address the computational burden of large-scale datasets, we introduce a fast stochastic variant that substantially reduces the computational cost. We provide a rigorous convergence analysis of the full functional mean shift procedure in Hilbert space. For the stochastic variant, we offer partial justification by showing that it approximates the full algorithm well when the subset size is sufficiently large. The proposed method is validated through simulation studies and real-data analyses, including hourly Taiwan PM$_{2.5}$ measurements and Argo oceanographic profiles. Our key contributions are: (1) a novel extension of the mean shift algorithm to functional data, which clusters without requiring the number of clusters to be specified; (2) a convergence analysis of the full functional mean shift algorithm in Hilbert space; (3) a scalable stochastic variant based on random partitioning, with partial theoretical justification; and (4) real-data applications demonstrating the method's scalability and practical usefulness.
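To make the idea concrete, the following is a minimal sketch of a mean shift iteration applied to functional data. It is not the authors' implementation. It assumes each curve is observed on a common uniform grid, approximates the $L^2$ inner product by a Riemann sum, and uses a Gaussian kernel with the standard (non-blurring) update; the paper's kernel, update rule, and stochastic variant may differ. The names `sq_l2_dist`, `functional_mean_shift`, `bandwidth`, and `merge_tol` are hypothetical and chosen for illustration only.

```python
import numpy as np


def sq_l2_dist(A, B, dt):
    """Pairwise squared L2 distances between sampled curves (Riemann sum)."""
    diff = A[:, None, :] - B[None, :, :]
    return (diff ** 2).sum(axis=2) * dt


def functional_mean_shift(X, dt, bandwidth, n_iter=500, tol=1e-8, merge_tol=1e-3):
    """Cluster curves (rows of X) by iterating a Gaussian-kernel mean shift.

    Assumptions: uniform grid with spacing dt, Gaussian kernel, fixed sample X.
    Returns integer labels and the distinct limit curves (modes).
    """
    X = np.asarray(X, dtype=float)
    Y = X.copy()  # each row is a moving copy of one curve
    for _ in range(n_iter):
        # Kernel weights between current positions and the fixed sample
        W = np.exp(-sq_l2_dist(Y, X, dt) / (2.0 * bandwidth ** 2))
        # Shift every curve to the weighted mean of the sample curves
        Y_new = (W @ X) / W.sum(axis=1, keepdims=True)
        max_shift = np.sqrt(((Y_new - Y) ** 2).sum(axis=1) * dt).max()
        Y = Y_new
        if max_shift < tol:
            break

    # Curves whose limits coincide (up to merge_tol) share a cluster,
    # so the number of clusters is not specified in advance.
    modes, labels = [], []
    for y in Y:
        for k, m in enumerate(modes):
            if np.sqrt(((y - m) ** 2).sum() * dt) < merge_tol:
                labels.append(k)
                break
        else:
            modes.append(y)
            labels.append(len(modes) - 1)
    return np.array(labels), np.array(modes)
```

Each iteration of this sketch costs on the order of $n^2$ distance evaluations for $n$ curves, which is the kind of cost that motivates a subset-based variant. The sketch covers only the full procedure and does not attempt to reproduce the random-partitioning scheme.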