PP-STAT: An Efficient Privacy-Preserving Statistical Analysis Framework using Homomorphic Encryption
Hyunmin Choi
Published: 2025/8/16
Abstract
With the widespread adoption of cloud computing, the need for outsourcing statistical analysis to third-party platforms is growing rapidly. However, handling sensitive data such as medical records and financial information in cloud environments raises serious privacy concerns. In this paper, we present PP-STAT, a novel and efficient Homomorphic Encryption (HE)-based framework for privacy-preserving statistical analysis. HE enables computations to be performed directly on encrypted data without revealing the underlying plaintext. PP-STAT supports advanced statistical measures, including Z-score normalization, skewness, kurtosis, coefficient of variation, and Pearson correlation coefficient, all computed securely over encrypted data. To improve efficiency, PP-STAT introduces two key optimizations: (1) a Chebyshev-based approximation strategy for initializing inverse square root operations, and (2) a pre-normalization scaling technique that reduces multiplicative depth by folding constant scaling factors into mean and variance computations. These techniques significantly lower computational overhead and minimize the number of expensive bootstrapping procedures. Our evaluation on real-world datasets demonstrates that PP-STAT achieves high numerical accuracy, with mean relative error (MRE) below 2.4 × 10⁻⁴. Notably, the encrypted Pearson correlation coefficient between the smoker attribute and charges reaches 0.7873, with an MRE of 2.86 × 10⁻⁴. These results confirm the practical utility of PP-STAT for secure and precise statistical analysis in privacy-sensitive domains.
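To make the first optimization concrete, the following is a minimal plaintext sketch of how a Chebyshev polynomial might seed an inverse square root that is then used for Z-score normalization. It is not the PP-STAT implementation: it runs on ordinary NumPy arrays rather than ciphertexts, and the function names (inv_sqrt_he_style, zscore_he_style), the interval [1, 10], the polynomial degree, and the Newton refinement with four steps are all illustrative assumptions. The point it shows is that every step after the one-time polynomial fit uses only additions and multiplications, which are the operations an HE scheme evaluates natively.

```python
import numpy as np
from numpy.polynomial import Chebyshev


def inv_sqrt_he_style(v, lo=1.0, hi=10.0, degree=7, newton_steps=4):
    """Approximate 1/sqrt(v) for v in [lo, hi] using only + and *.

    Assumption: a Chebyshev interpolant supplies the initial guess and
    Newton's iteration refines it. The interval, degree, and step count
    are illustrative values, not parameters taken from the paper.
    """
    # Fitted once on public bounds; in an HE setting the coefficients
    # would be plaintext constants applied to an encrypted input.
    init = Chebyshev.interpolate(
        lambda t: 1.0 / np.sqrt(t), degree, domain=[lo, hi]
    )
    y = init(v)
    # Newton step for f(y) = 1/y^2 - v: y <- y * (3 - v * y^2) / 2.
    for _ in range(newton_steps):
        y = 0.5 * y * (3.0 - v * y * y)
    return y


def zscore_he_style(x):
    """Z-score normalization without division or square root on the data."""
    inv_n = 1.0 / x.size  # public constant, known without decrypting
    mean = x.sum() * inv_n
    centered = x - mean
    var = (centered * centered).sum() * inv_n  # population variance
    return centered * inv_sqrt_he_style(var)


if __name__ == "__main__":
    sample = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
    print(zscore_he_style(sample))
```

The sample data have mean 5 and population variance 4, so the output should agree with (x - 5) / 2 up to the residual error of the Newton refinement. A better initial guess lets the iteration converge in fewer steps, which in an encrypted setting would translate into less multiplicative depth; this sketch assumes the variance already lies inside the fitted interval, which is the kind of condition a scaling step would need to guarantee.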