The super learner for time-to-event outcomes: A tutorial

Ruth H. Keogh, Karla Diaz-Ordaz, Nan van Geloven, Jon Michael Gran, Kamaryn T. Tanner

Published: 3 September 2025

Abstract

Estimating risks or survival probabilities conditional on individual characteristics from censored time-to-event data is a common task. This may be for the purpose of developing a prediction model, or it may be part of a wider estimation procedure, such as in causal inference. A challenge is that it is impossible to know at the outset which of a set of candidate models will provide the best predictions. The super learner is a powerful approach for finding the best model or combination of models ('ensemble') among a pre-specified set of candidate models or 'learners', which can include parametric and machine learning models. Super learners for time-to-event outcomes have been developed, but the literature is technical, and a reader may find it challenging to gather the full details of how these methods work and how they can be implemented. In this paper we provide a practical tutorial on super learner methods for time-to-event outcomes. We give an overview of the general steps involved in the super learner, followed by details of three specific implementations for time-to-event outcomes. We cover discrete-time and continuous-time versions of the super learner, as described by Polley and van der Laan (2011), Westling et al. (2023) and Munch and Gerds (2024). We compare the properties of the methods and provide information on how they can be implemented in R. The methods are illustrated using an open access data set, and R code is provided.