The Identification Power of Combining Experimental and Observational Data for Distributional Treatment Effect Parameters

Shosei Sakaguchi

Published: 2025/8/17

Abstract

This paper investigates the identification power gained for distributional treatment effect (DTE) parameters by combining experimental data, in which treatment is randomized, with observational data, in which treatment is self-selected. While experimental data identify average treatment effects, many DTE parameters, such as the distribution of individual treatment effects, are only partially identified. We examine whether and how combining these two data sources tightens the identified set for such parameters. For broad classes of DTE parameters, we derive sharp bounds under the combined data and clarify the mechanism through which data combination improves identification relative to using experimental data alone. Our analysis highlights that the self-selection in observational data is a key source of identification power. We establish necessary and sufficient conditions under which the combined data shrink the identified set, showing that such shrinkage generally occurs unless selection-on-observables holds in the observational data. We also propose a linear programming approach to compute sharp bounds that can incorporate additional structural restrictions, such as positive dependence between potential outcomes and the generalized Roy model. An empirical application using data on negative campaign advertisements in the U.S. presidential election illustrates the practical relevance of the proposed approach.
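
To make the linear programming idea concrete, the following is a minimal toy sketch, not the paper's implementation. It assumes binary potential outcomes, no covariates, and that the experimental and observational samples are drawn from the same population. The function name `bounds_fraction_benefiting`, the helper `idx`, and all numbers are hypothetical and chosen for illustration only. The unknown is the joint distribution of (Y0, Y1, D), where D is the self-selected treatment in the observational data, and the target is the fraction of units that benefit, P(Y1 = 1, Y0 = 0).

```python
import numpy as np
from scipy.optimize import linprog


def idx(y0, y1, d):
    """Flat index of the cell P(Y0=y0, Y1=y1, D=d) among the 8 unknowns."""
    return 4 * y0 + 2 * y1 + d


def bounds_fraction_benefiting(p1, p0, obs=None):
    """Bounds on P(Y1=1, Y0=0) for binary outcomes (illustrative toy model).

    p1, p0 : experimental estimates of P(Y1=1) and P(Y0=1).
    obs    : optional dict {(y, d): P(Y=y, D=d)} from observational data.
             If None, only the experimental marginals are imposed.
    """
    cells = [(a, b, d) for a in (0, 1) for b in (0, 1) for d in (0, 1)]
    rows, rhs = [], []

    def add(selected, value):
        # One linear equality: the selected cells sum to `value`.
        row = np.zeros(8)
        for c in selected:
            row[idx(*c)] = 1.0
        rows.append(row)
        rhs.append(value)

    add(cells, 1.0)                            # probabilities sum to one
    add([c for c in cells if c[1] == 1], p1)   # experiment: P(Y1=1)
    add([c for c in cells if c[0] == 1], p0)   # experiment: P(Y0=1)

    if obs is not None:
        for y in (0, 1):
            # Self-selected treated units reveal Y1; untreated reveal Y0.
            add([(y0, y, 1) for y0 in (0, 1)], obs[(y, 1)])
            add([(y, y1, 0) for y1 in (0, 1)], obs[(y, 0)])

    # Objective: P(Y0=0, Y1=1), summed over both values of D.
    c = np.zeros(8)
    c[idx(0, 1, 0)] = c[idx(0, 1, 1)] = 1.0

    A, b = np.array(rows), np.array(rhs)
    lower = linprog(c, A_eq=A, b_eq=b, bounds=(0, 1))
    upper = linprog(-c, A_eq=A, b_eq=b, bounds=(0, 1))
    return lower.fun, -upper.fun


# Hypothetical inputs: treated units in the observational data do better.
obs_selected = {(1, 1): 0.4, (0, 1): 0.1, (1, 0): 0.1, (0, 0): 0.4}
exp_only = bounds_fraction_benefiting(0.6, 0.4)
combined = bounds_fraction_benefiting(0.6, 0.4, obs_selected)
```

With these hypothetical inputs, the experimental marginals alone give bounds of roughly [0.2, 0.6], while adding the observational constraints gives roughly [0.2, 0.4]. If the observational cells are instead set so that D is independent of the potential outcomes (for example 0.3, 0.2, 0.2, 0.3 in the same key order), the combined bounds revert to the experimental-only bounds, which is consistent with the abstract's statement that shrinkage comes from self-selection.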

Read Full Paper (arXiv.org)