A gradient boosting and broadband approach to finding Lyman-α emitting galaxies beyond narrowband surveys
A. Vale, A. Paulino-Afonso, A. Humphrey, P. A. C. Cunha, B. Ribeiro, B. Cerqueira, R. Carvajal, J. Fonseca
公開日: 2025/9/26
Abstract
In this work, we test whether gradient-boosting algorithms, trained on broadband photometric data from traditional Lyman-$\alpha$ emitting (LAE) surveys, can efficiently and accurately identify LAE candidates from typical star-forming galaxies at similar redshifts and brightness levels. Using galaxy samples at $z \in [2,6]$ derived from the COSMOS2020 and SC4K catalogs, we trained gradient-boosting machine-learning algorithms (LGBM, XGBoost, and CatBoost) using optical and near-infrared broadband photometry. To ensure balanced performance, the models were trained on carefully selected datasets with similar redshift and i-band magnitude distributions. Additionally, the models were tested for robustness by perturbing the photometric data using the associated observational uncertainties. Our classification models achieved F1-scores of $\sim 87\%$ and successfully identified about $7,000$ objects with an unanimous agreement across all models. This more than doubles the number of LAEs identified in the COSMOS field compared with the SC4K dataset. We managed to spectroscopically confirm 60 of these LAE candidates using the publicly available catalogs in the COSMOS field. These results highlight the potential of machine learning in efficiently identifying LAEs candidates. This lays the foundations for applications to larger photometric surveys, such as Euclid and LSST. By complementing traditional approaches and providing robust preselection capabilities, our models facilitate the analysis of these objects. This is crucial to increase our knowledge of the overall LAE population.