Modeling Insider Filing Delays in Financial Markets with an Interpretable XGBoost Framework
Cheng Huang, Yao Ma, Fan Gao, Yutong Liu, Yadi Liu, Xiaoli Ma, Ye Aung Moe, Yuhan Zhang, Weizheng Xie, Zeyu Han, Xiangxiang Wang, Hao Wang, Yongbin Yu
Published: 2025/7/27
Abstract
Timely disclosure of insider transactions is a cornerstone of market transparency, yet delays in filing remain widespread and challenging to monitor at scale. This study introduces a comprehensive insider filing delay dataset spanning more than four million Form 4 transactions from 2002 to 2025, enriched with annotations on insider roles, governance attributes, and firm-level indicators. Building on these data, we present a hybrid framework that integrates a state-space encoder with an XGBoost classifier to capture temporal trading patterns while retaining interpretability essential for regulatory auditing. The framework consistently outperforms statistical models, deep sequence learners, and large language model baselines, achieving balanced gains in precision, recall, and F1-score. Feature ablation analyses highlight the predictive importance of insider history, spatiotemporal factors, and governance signals, shedding light on the behavioral drivers of both minor oversights and systematic violations. Beyond accuracy, the dataset and framework establish a reproducible benchmark for studying disclosure compliance, offering regulators and researchers transparent tools to strengthen market integrity.
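To make the hybrid design concrete, the following is a minimal sketch of how a state-space encoder might feed an XGBoost classifier. The abstract does not specify the encoder, so everything here is an assumption made for illustration: the diagonal linear recurrence, the `decay` parameter, the helper names `ssm_encode`, `build_features`, and `train_classifier`, and the hyperparameter values. It should not be read as the authors' implementation.

```python
from typing import List, Sequence


def ssm_encode(seq: Sequence[Sequence[float]], decay: float = 0.8) -> List[float]:
    """Summarize a transaction history with a diagonal linear state-space
    recurrence, h_t = decay * h_{t-1} + (1 - decay) * x_t, starting from h_0 = 0.

    ASSUMPTION: this simple recurrence stands in for the paper's encoder,
    which the abstract does not describe.
    """
    if not seq:
        raise ValueError("history must contain at least one time step")
    h = [0.0] * len(seq[0])
    for x in seq:
        h = [decay * hi + (1.0 - decay) * xi for hi, xi in zip(h, x)]
    return h


def build_features(
    history: Sequence[Sequence[float]],
    static: Sequence[float],
    decay: float = 0.8,
) -> List[float]:
    """Concatenate the encoded history with static features
    (hypothetical examples: insider role flags, firm-level indicators)."""
    return ssm_encode(history, decay) + list(static)


def train_classifier(rows: Sequence[Sequence[float]], labels: Sequence[int]):
    """Fit a gradient-boosted tree classifier on the combined features.

    numpy and xgboost are imported lazily so the encoder above can be used
    without them. Hyperparameters are illustrative, not taken from the paper.
    """
    import numpy as np
    import xgboost as xgb

    model = xgb.XGBClassifier(n_estimators=100, max_depth=4, learning_rate=0.1)
    model.fit(np.asarray(rows, dtype=float), np.asarray(labels, dtype=int))
    return model
```

Because the encoder output is a fixed-length vector, each dimension can be inspected after training through the fitted model's `feature_importances_` attribute, which is one way a tree-based classifier keeps the pipeline auditable.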