Nonparametric Clustering Stopping Rule Based on Multivariate Median

Hend Gabr, Brian H Willis, Mohammed Baragilly

Published: 2024/10/24

Abstract

This paper introduces a novel nonparametric criterion, derived from the spatial median, for determining the appropriate number of clusters. The method is designed to reconcile two competing objectives of cluster analysis: preserving homogeneity within clusters and maximizing heterogeneity between clusters. To this end, the proposed algorithm optimizes the ratio of inter-cluster to intra-cluster variability, with adjustments for both the sample size and the number of clusters. Unlike conventional techniques, the method is distribution-free and robust to outliers. Its properties were first examined through extensive simulation studies and then evaluated empirically on three applied datasets. To assess comparative performance, the proposed procedure was benchmarked against 13 established methods for determining the number of clusters, and it performed better in 11 of these comparisons, supporting its use as a reliable and rigorous alternative for multivariate clustering applications.
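The abstract does not give the criterion's formula. As a minimal sketch, assume it behaves like a Calinski-Harabasz-style ratio where spatial medians replace cluster means and unsquared Euclidean distances replace squared ones. Under that assumption it could look as follows. The function names `spatial_median` and `median_ratio_index`, the Weiszfeld solver, and the (k - 1) and (n - k) degrees-of-freedom adjustments are illustrative assumptions, not the authors' published method.

```python
import numpy as np


def spatial_median(X, tol=1e-8, max_iter=500):
    """Weiszfeld iteration for the spatial (geometric) median of the rows of X."""
    X = np.asarray(X, dtype=float)
    m = X.mean(axis=0)  # start from the coordinate-wise mean
    for _ in range(max_iter):
        d = np.linalg.norm(X - m, axis=1)
        d = np.maximum(d, 1e-12)  # crude guard against dividing by zero at a data point
        w = 1.0 / d  # points closer to the current estimate get more weight
        m_new = (w[:, None] * X).sum(axis=0) / w.sum()
        if np.linalg.norm(m_new - m) < tol:
            return m_new
        m = m_new
    return m


def median_ratio_index(X, labels):
    """Hypothetical between/within ratio built on spatial medians.

    Assumption: between- and within-cluster variability are sums of unsquared
    Euclidean distances to spatial medians, scaled by (k - 1) and (n - k).
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    n = X.shape[0]
    clusters = np.unique(labels)
    k = len(clusters)
    if k < 2 or k >= n:
        raise ValueError("need 2 <= k < n")
    overall = spatial_median(X)
    between = 0.0
    within = 0.0
    for c in clusters:
        Xc = X[labels == c]
        mc = spatial_median(Xc)
        # Cluster-size-weighted distance from the cluster median to the overall median.
        between += len(Xc) * np.linalg.norm(mc - overall)
        # Total distance of cluster members to their own spatial median.
        within += np.linalg.norm(Xc - mc, axis=1).sum()
    return (between / (k - 1)) / (within / (n - k))
```

To choose the number of clusters under this sketch, compute the index for candidate partitions (for example, from k-means or k-medians for k = 2, ..., K) and keep the k that maximizes it. Using unsquared distances means a single distant point contributes linearly rather than quadratically, which is the source of the robustness to outliers.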