Scale two-sample testing with arbitrarily missing data

Yijin Zeng, Niall M. Adams, Dean A. Bodenham

Published: 2025/9/24

Abstract

This work proposes a novel rank-based scale two-sample testing method for univariate, distinct data when a subset of the data may be missing. Our approach is based on mathematically tight bounds of the Ansari-Bradley test statistic in the presence of missing data, and rejects the null hypothesis if the test statistic is significant regardless of the values of the missing data. For location-scale testing, the proposed scale testing method is combined with the location testing method of Zeng et al. (2024) using the Holm-Bonferroni correction. We show that our methods control the Type I error regardless of the values of the missing data. Simulation results demonstrate that our methods have good statistical power, typically when less than 10% of the data are missing, while other missing data methods, such as case deletion or imputation, fail to control the Type I error when the data are missing not at random. We illustrate the proposed location-scale testing method on a hepatitis C virus dataset in which a subset of values is unobserved.
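As a rough illustration of the combination step only, the following sketch applies the standard Holm-Bonferroni procedure to two p-values, one from a location test and one from a scale test. The function names `holm_bonferroni` and `location_scale_decision` and the example p-values are hypothetical and do not come from the paper. The sketch assumes the two p-values have already been computed, for instance as values that remain valid whatever the missing data are, and it does not implement the bounds on the Ansari-Bradley statistic.

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Standard Holm-Bonferroni step-down procedure.

    Returns a list of booleans, in the original order of p_values,
    marking which hypotheses are rejected at family-wise level alpha.
    """
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for step, idx in enumerate(order):
        # Compare the (step+1)-th smallest p-value with alpha / (m - step).
        if p_values[idx] <= alpha / (m - step):
            reject[idx] = True
        else:
            # Stop at the first non-rejection; all larger p-values are retained.
            break
    return reject


def location_scale_decision(p_location, p_scale, alpha=0.05):
    """Hypothetical helper: combine a location and a scale p-value.

    The inputs are assumed to be precomputed p-values; how they are
    obtained under missing data is outside the scope of this sketch.
    """
    reject_location, reject_scale = holm_bonferroni([p_location, p_scale], alpha)
    return {
        "location": reject_location,
        "scale": reject_scale,
        "any": reject_location or reject_scale,
    }


if __name__ == "__main__":
    # Made-up p-values, purely for illustration.
    print(location_scale_decision(p_location=0.01, p_scale=0.20))
```

With two tests, the smaller p-value is compared with alpha/2 and, only if that test rejects, the larger one is compared with alpha, so the family-wise error rate stays at or below alpha.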
