Intrinsic Signal Models Defined by the High-Dimensional, Small-Sample Limit

Yoh-ichi Mototake, Y-h. Taguchi

Published: 2023/4/13

Abstract

Detecting signal variables among many variables, most of which are noise, is often approached as a variable selection problem under a given objective variable. This amounts to building a supervised model of the signal by specifying the signal as the objective variable. However, such a supervised model does not work effectively under high-dimensional and small-sample-size conditions, because the estimation of model parameters becomes indeterminate. We propose an ``intrinsic signal model'' that enables signal detection under high-dimensional and small-sample-size conditions without external signal definitions. The proposed intrinsic signal model is based on the assumption that real-world datasets are generated from some dynamical system, and that variables generated from dynamical systems with small correlation lengths are regarded as noise variables. That is, the variables that maintain the data structure generated from a dynamical system under high-dimensional and small-sample-size conditions, corresponding to the limit of a sample size of 0, are always modeled as signal variables. In this study, we show that, under such a signal model, the Taguchi method provides an effective way of detecting signals. The proposed signal model was validated on a dataset generated with a globally coupled map system, which is a high-dimensional dynamical system. Furthermore, we validated the model with gene expression data, which are not explicitly generated from a dynamical system; as a result, we observed a signal structure consistent with that of the proposed signal model. The results suggest that the proposed signal model is valid for a wide range of datasets.
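
To make the validation setting concrete, the following is a minimal sketch of how a high-dimensional, small-sample dataset could be generated from a globally coupled map. It assumes a commonly used form of the map, x_{n+1}(i) = (1 - eps) f(x_n(i)) + (eps / N) sum_j f(x_n(j)) with f(x) = 1 - a x^2. The function names, the parameter values (a = 1.8, eps = 0.1), and the choice to treat each time step as one sample are illustrative assumptions, not settings taken from this study.

```python
import random


def logistic_map(x, a):
    """Local map f(x) = 1 - a * x**2, which keeps [-1, 1] invariant for 0 <= a <= 2."""
    return 1.0 - a * x * x


def simulate_gcm(n_units, n_steps, a=1.8, eps=0.1, seed=0):
    """Simulate a globally coupled map (hypothetical helper, assumed parameters).

    Returns a list of n_steps + 1 rows; each row holds the state of all
    n_units variables at one time step.
    """
    rng = random.Random(seed)
    # Random initial condition inside the invariant interval [-1, 1].
    state = [rng.uniform(-1.0, 1.0) for _ in range(n_units)]
    trajectory = [list(state)]
    for _ in range(n_steps):
        mapped = [logistic_map(x, a) for x in state]
        # Mean field: every unit is coupled to the average of all units.
        mean_field = sum(mapped) / n_units
        state = [(1.0 - eps) * fx + eps * mean_field for fx in mapped]
        trajectory.append(list(state))
    return trajectory


if __name__ == "__main__":
    # Many variables (columns), few samples (rows): the regime discussed above.
    data = simulate_gcm(n_units=1000, n_steps=9)
    print(len(data), "samples x", len(data[0]), "variables")
```

In this sketch the number of variables (1000) far exceeds the number of samples (10), which mimics the high-dimensional, small-sample-size regime; how the study itself constructs signal and noise variables from the map is not specified here.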