Large language models surpass domain-specific architectures for antepartum electronic fetal monitoring analysis

Sheng Wong, Ravi Shankar, Beth Albert, Gabriel Davis Jones

Published: 2025/9/9

Abstract

Foundation models (FMs) and large language models (LLMs) show remarkable capabilities across diverse domains through training on massive datasets. They have performed strongly in healthcare applications, yet their potential for electronic fetal monitoring (EFM)/cardiotocography (CTG) analysis, a critical technology for evaluating fetal well-being, remains underexplored. Antepartum CTG interpretation is challenging because fetal heart rate (FHR) patterns and uterine activity are complex, and analysing them requires sophisticated processing of long time-series data. CTG assessment relies heavily on subjective clinical interpretation, which often leads to variability in diagnostic accuracy and to deviations from timely pregnancy care. This study presents the first comprehensive comparison of state-of-the-art AI approaches for automated antepartum CTG analysis. We systematically compare time-series FMs and LLMs against established CTG-specific architectures. Our evaluation covers over 500 CTG recordings of varying durations, reflecting real-world clinical practice, and provides robust performance benchmarks across different modelling paradigms. Our results show that fine-tuned LLMs outperform both time-series FMs and domain-specific approaches, offering a promising alternative pathway for clinical CTG interpretation. These findings provide critical insights into the relative strengths of different AI methodologies for fetal monitoring and establish a foundation for future clinical AI development in prenatal care.

Read Full Paper (arXiv.org)