On the role of the design phase in a linear regression

Junho Choi

Published: 2025/9/2

Abstract

The "design phase" refers to a stage in observational studies in which a researcher constructs a subsample that achieves better balance in covariate distributions between treated and untreated units. In this paper, we study the role of this preliminary phase in the context of linear regression and offer a justification for its utility. To that end, we first formalize the design phase as a process of estimand adjustment through subsample selection. We then show that the covariate balance of a subsample is indeed a justifiable criterion for guiding the selection: it is informative about the maximum degree of model misspecification that a subsample can tolerate when a researcher wishes to keep the bias of the estimand for the parameter of interest within a target level of precision. In this sense, the pursuit of a balanced subsample in the design phase can be interpreted as identifying an estimand that is less susceptible to bias in the presence of model misspecification. We also demonstrate that covariate imbalance can serve as a sensitivity measure in regression analysis, and illustrate how it can structure communication between a researcher and the readers of her report.
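To make the notion of a more balanced subsample concrete, the following is a minimal sketch rather than the procedure studied in the paper. It assumes a single covariate, uses the absolute standardized mean difference as one common balance measure, and selects the subsample with a hypothetical common-support trimming rule. The function names `standardized_mean_difference` and `common_support_mask` and the toy data are illustrative assumptions.

```python
import numpy as np


def standardized_mean_difference(x, d):
    """Absolute standardized mean difference of covariate x between
    treated (d == 1) and untreated (d == 0) units.

    Assumption: this is one common balance measure, not necessarily
    the one used in the paper.
    """
    x = np.asarray(x, dtype=float)
    d = np.asarray(d, dtype=bool)
    x1, x0 = x[d], x[~d]
    # Pool the two sample variances with equal weights.
    pooled_sd = np.sqrt((x1.var(ddof=1) + x0.var(ddof=1)) / 2.0)
    return abs(x1.mean() - x0.mean()) / pooled_sd


def common_support_mask(x, d):
    """Hypothetical selection rule: keep units whose covariate value lies
    in the range where treated and untreated values overlap."""
    x = np.asarray(x, dtype=float)
    d = np.asarray(d, dtype=bool)
    lo = max(x[d].min(), x[~d].min())
    hi = min(x[d].max(), x[~d].max())
    return (x >= lo) & (x <= hi)


# Toy data (made up for illustration): treated units have larger
# covariate values than untreated units on average.
x = np.array([1, 2, 3, 4, 5, 6, -2, -1, 0, 1, 2, 3], dtype=float)
d = np.array([1] * 6 + [0] * 6)

keep = common_support_mask(x, d)
smd_full = standardized_mean_difference(x, d)
smd_sub = standardized_mean_difference(x[keep], d[keep])

print(f"units kept: {keep.sum()} of {len(x)}")
print(f"imbalance in full sample: {smd_full:.3f}")
print(f"imbalance in subsample:   {smd_sub:.3f}")
```

In this toy example, trimming to the overlap region discards half of the units and lowers the measured imbalance. A regression run on the retained units targets a different estimand than one run on the full sample, which is the sense in which the abstract describes the design phase as estimand adjustment.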