Semi-Supervised Radiomics for Glioblastoma IDH Mutation: Limited Labels, Data Sensitivity, and SHAP Interpretation

Amir Hossein Pouria, Sajad Jabarzadeh Ghandilu, Shahram Taeb, Somayeh Sadat Mehrnia, Mehrdad Oveisi, Arman Rahmim, Mohammad R. Salmanpour

Published: 2025/9/30

Abstract

Glioblastoma (GBM) is an aggressive brain tumor in which isocitrate dehydrogenase (IDH) mutation status is a key prognostic biomarker. Conventional testing, however, requires invasive biopsy, which motivates non-invasive alternatives. In this multi-center study, we analyzed four MRI sequences (T1-weighted, T2-weighted, contrast-enhanced T1-weighted [T1CE], and FLAIR) from 1,329 patients across eight centers, of whom 1,061 had IDH labels. For each case, 1,223 radiomic features were extracted with PyRadiomics, including Laplacian of Gaussian and wavelet-filtered features. Both supervised learning (SL) and semi-supervised learning (SSL) frameworks were evaluated, combining 38 feature selection/attribute extraction strategies with 24 classifiers. Five-fold cross-validation was performed on the UCSF-PDGM and UPENN datasets, external validation on IvyGAP, TCGA-LGG, and TCGA-GBM, and SHAP analysis was used to interpret feature contributions. Multimodal MRI fusion (T1+T2+T1CE+FLAIR) consistently outperformed single-sequence models. The best SSL model (recursive feature elimination [RFE] + support vector machine [SVM]) achieved a cross-validation accuracy of 0.93 and an external accuracy of 0.75, while the best SL model (RFE + Complement Naive Bayes) reached 0.90 and 0.80, respectively. SSL was also more robust to limited sample sizes, maintaining more stable performance than SL. SHAP analysis highlighted the amplified role of the first-order Root Mean Square feature from T1CE and of wavelet-based features, strengthening biomarker interpretability. These findings indicate that SSL improves accuracy, stability, and interpretability in MRI-based IDH prediction, with multimodal MRI fusion providing the most scalable and reliable strategy for non-invasive biomarker discovery in GBM.
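The abstract names the best SSL pipeline (RFE + SVM) but not the semi-supervised scheme itself. The sketch below assumes self-training (scikit-learn's `SelfTrainingClassifier`) as the SSL wrapper, synthetic data in place of PyRadiomics features, simple column concatenation as "multimodal fusion", and a budget of 10 selected features; none of these settings come from the paper. It shows how unlabeled cases (label `-1`) can join every training fold while accuracy is measured only on held-out labeled cases. The helpers `make_model` and `ssl_cross_val_accuracy` are hypothetical names.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.model_selection import StratifiedKFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.semi_supervised import SelfTrainingClassifier
from sklearn.svm import SVC


def make_model(n_features_to_select=10):
    # Hypothetical helper: scale, keep a fixed number of features via RFE
    # driven by a linear SVM, then classify with an SVM exposing
    # predict_proba (self-training needs probabilities to pick pseudo-labels).
    base = make_pipeline(
        StandardScaler(),
        RFE(SVC(kernel="linear"), n_features_to_select=n_features_to_select, step=5),
        SVC(kernel="linear", probability=True, random_state=0),
    )
    # Rows with y == -1 are unlabeled; they receive a pseudo-label once the
    # predicted class probability exceeds `threshold`.
    return SelfTrainingClassifier(base, threshold=0.9)


def ssl_cross_val_accuracy(X, y, n_splits=5, seed=0):
    # Hypothetical helper: folds are drawn over labeled cases only; all
    # unlabeled cases are added to every training fold, never to a test fold.
    labeled = np.flatnonzero(y != -1)
    unlabeled = np.flatnonzero(y == -1)
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    scores = []
    for train_idx, test_idx in skf.split(X[labeled], y[labeled]):
        train_rows = np.concatenate([labeled[train_idx], unlabeled])
        test_rows = labeled[test_idx]
        model = make_model().fit(X[train_rows], y[train_rows])
        scores.append(model.score(X[test_rows], y[test_rows]))
    return float(np.mean(scores))


# Synthetic stand-in: 40 columns imagined as 4 sequences x 10 features,
# concatenated side by side (an assumed form of multimodal fusion).
X, y = make_classification(n_samples=400, n_features=40, n_informative=8, random_state=0)
rng = np.random.default_rng(0)
y_ssl = y.copy()
y_ssl[rng.random(len(y)) < 0.2] = -1  # hide roughly 20% of labels, as IDH-unknown cases
acc = ssl_cross_val_accuracy(X, y_ssl)
print(f"mean 5-fold accuracy on labeled test folds: {acc:.2f}")
```

Keeping unlabeled cases out of the test folds matters because they have no ground truth: pseudo-labels are used only for training, so the reported accuracy reflects real labels alone.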

Read the full text (arXiv.org)