Data Complexity: A Threshold Between Classical and Quantum Machine Learning -- Part I
Christophe Pere
Published: 2025/09/19
Abstract
Quantum machine learning (QML) holds promise for accelerating pattern recognition, optimization, and data analysis, but the conditions under which it can genuinely outperform classical approaches remain unclear. Existing research often emphasizes algorithms and hardware, while the role of the data itself in determining quantum advantage has received less attention. We argue that data complexity -- the structural, statistical, algorithmic, and topological richness of datasets -- is central to defining these conditions. Beyond qubit counts or circuit depth, the real bottleneck lies in the cost of embedding, representing, and generalizing from data. In this paper (Part I of a two-part series), we review classical and quantum measures of data complexity, including entropy, correlations, and compressibility, as well as topological descriptors such as persistent homology and topological entanglement entropy. We also examine their implications for trainability, scalability, and error tolerance in QML. Part II will develop a unified framework and provide empirical benchmarks across datasets, linking these complexity measures to practical performance.