Co-SOM: Co-training for photometric redshift estimation using Self-Organizing Maps

Alvaro Callejas-Tavera, Erik Molino-Minero-Re, Octavio Valenzuela

Published: 2025/9/29

Abstract

Upcoming large-scale galaxy surveys, such as the Vera C. Rubin Observatory's Legacy Survey of Space and Time (LSST), will produce photometry for billions of galaxies. Interpreting large-scale weak-lensing maps and measuring galaxy clustering both require reliable, high-precision redshifts derived from multi-band photometry. However, obtaining spectroscopy for billions of galaxies is impractical and complex, so assembling a spectroscopic sample large enough to train supervised algorithms for accurate redshift estimation remains a significant challenge and an open research area. We propose Co-SOM, a novel methodology based on Co-training and Self-Organizing Maps (SOM) that integrates labeled data (sources with spectroscopic redshifts) and unlabeled data (sources with photometric observations only) during training. Unlabeled sources are incorporated through a selection method based on map topology (the connectivity structure of the SOM lattice), making the most of the limited spectroscopy available for photo-z estimation. We used magnitudes and colors from the Sloan Digital Sky Survey Data Release 18 (SDSS-DR18) to evaluate performance while varying the proportion of labeled data and adjusting the training parameters. For a labeled fraction of 1% ($\approx 20{,}000$ galaxies), we achieved a bias of $\Delta z = 0.00007 \pm 0.00022$, a precision of $\sigma_{zp} = 0.00063 \pm 0.00032$, and an outlier fraction of $f_{\mathrm{out}} = 0.02083 \pm 0.00027$. In additional experiments varying the volume of labeled data, the bias remained below $10^{-3}$ regardless of the size of the spectroscopic or photometric sample. These low-redshift results demonstrate the potential of semi-supervised learning to address spectroscopic limitations in future photometric surveys.
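The abstract does not spell out the Co-SOM algorithm, so the following is only a minimal, hypothetical sketch of how co-training with SOMs and a topology-based selection step could fit together. It is not the authors' implementation. Several details are assumptions: magnitudes and colors are treated as the two co-training "views"; each view gets its own 8×8 SOM trained on all galaxies; a SOM cell's photo-z is the mean redshift of its labeled members; and, as a stand-in for "selection based on map topology", an unlabeled galaxy receives a pseudo-label only if its cell's estimate agrees within `tol` with every calibrated 8-connected neighbour cell on the lattice. Pseudo-labels go into a single shared labeled pool, as in classic co-training, so labels found through one view can calibrate cells of the other view. All function names (`train_som`, `co_som`, etc.) and the synthetic data are illustrative.

```python
import numpy as np


def train_som(X, rows=8, cols=8, n_iter=2000, lr0=0.5, sigma0=3.0, seed=0):
    """Online SOM training on a rows x cols rectangular lattice."""
    rng = np.random.default_rng(seed)
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    W = X[rng.choice(len(X), rows * cols, replace=False)].astype(float)  # init from random samples
    for t in range(n_iter):
        frac = t / n_iter
        lr = lr0 * (1.0 - frac)                  # linearly decaying learning rate
        sigma = sigma0 * (1.0 - frac) + 0.5      # shrinking neighbourhood radius
        x = X[rng.integers(len(X))]
        bmu = np.argmin(((W - x) ** 2).sum(axis=1))
        d2 = ((grid - grid[bmu]) ** 2).sum(axis=1)   # squared distance on the lattice
        h = np.exp(-d2 / (2.0 * sigma ** 2))          # Gaussian neighbourhood function
        W += lr * h[:, None] * (x - W)
    return W, grid


def bmus(W, X):
    """Index of the best-matching unit (SOM cell) for each row of X."""
    return np.argmin(((X[:, None, :] - W[None, :, :]) ** 2).sum(axis=2), axis=1)


def cell_means(cells, z, n_cells):
    """Mean labeled redshift per cell; NaN for cells with no labeled members."""
    sums = np.bincount(cells, weights=z, minlength=n_cells)
    counts = np.bincount(cells, minlength=n_cells)
    est = np.full(n_cells, np.nan)
    est[counts > 0] = sums[counts > 0] / counts[counts > 0]
    return est


def confident_cells(est, grid, tol):
    """Assumed topology check: a calibrated cell is trusted only if every
    calibrated 8-connected lattice neighbour agrees with it within tol."""
    ok = np.zeros(len(est), dtype=bool)
    for i in range(len(est)):
        if np.isnan(est[i]):
            continue
        nb = (np.abs(grid - grid[i]).max(axis=1) == 1) & ~np.isnan(est)
        ok[i] = nb.any() and bool(np.all(np.abs(est[nb] - est[i]) < tol))
    return ok


def co_som(XA, XB, z, labeled, n_rounds=3, tol=0.1, seed=0):
    """Hypothetical co-training with two SOM views sharing one labeled pool.
    z is only read where labeled is True."""
    labeled = labeled.copy()
    z = np.where(labeled, z, np.nan)
    views = []
    for k, X in enumerate((XA, XB)):
        W, grid = train_som(X, seed=seed + k)   # unsupervised: uses all galaxies
        views.append((bmus(W, X), grid, len(W)))
    for _ in range(n_rounds):
        added = 0
        for cells, grid, n_cells in views:
            est = cell_means(cells[labeled], z[labeled], n_cells)
            pick = ~labeled & confident_cells(est, grid, tol)[cells]
            z[pick] = est[cells[pick]]           # pseudo-labels now also calibrate the other view
            labeled |= pick
            added += int(pick.sum())
        if added == 0:
            break
    # Final photo-z: average of the two views' cell means where available.
    P = np.vstack([cell_means(c[labeled], z[labeled], n)[c] for c, _, n in views])
    n_ok = (~np.isnan(P)).sum(axis=0)
    z_phot = np.where(n_ok > 0, np.nansum(P, axis=0) / np.maximum(n_ok, 1), np.nan)
    return z_phot, labeled


# Synthetic demo (not SDSS data): three bands whose magnitudes scale with z.
rng = np.random.default_rng(1)
n = 600
z_true = rng.uniform(0.0, 0.5, n)
mags = np.column_stack([17.0 + s * z_true + rng.normal(0.0, 0.05, n) for s in (2.0, 4.0, 6.0)])
colors = np.diff(mags, axis=1)              # view B: adjacent-band colors
labeled = rng.random(n) < 0.1               # ~10% "spectroscopic" sample
z_phot, final_labeled = co_som(mags, colors, z_true, labeled)
print(labeled.sum(), "->", final_labeled.sum(), "labeled galaxies")
```

The SOMs are trained on every galaxy because SOM training needs no labels. Labels only enter when cells are calibrated, which is where the limited spectroscopy is stretched. The lattice size, `tol`, and `n_rounds` are arbitrary illustrative values, not parameters reported in the abstract.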