From Next Token Prediction to (STRIPS) World Models -- Preliminary Results

Carlos Núñez-Molina, Vicenç Gómez, Hector Geffner

Published: 2025/9/16

Abstract

We consider the problem of learning propositional STRIPS world models from action traces alone, using a deep learning architecture (transformers) and gradient descent. The task is cast as a supervised next token prediction problem where the tokens are the actions, and an action $a$ may follow an action sequence if the hidden effects of the previous actions do not make any precondition of $a$ false. We show that a suitable transformer architecture can faithfully represent propositional STRIPS world models, and that the models can be learned from sets of random valid (positive) and invalid (negative) action sequences alone. A number of experiments are reported.
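
The validity criterion in the abstract can be illustrated with a minimal sketch of standard propositional STRIPS semantics. This is not the paper's code: the `Action` class, the helper functions, and the toy door domain are hypothetical, and the explicit initial state `INIT` is an assumption made for illustration, since the abstract does not say how the initial state is handled. The learner in the paper sees only action names; the preconditions and effects below are the hidden part.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    """A propositional STRIPS action (illustrative, not from the paper)."""
    name: str
    pre: frozenset      # atoms that must be true before the action
    add: frozenset      # atoms the action makes true
    delete: frozenset   # atoms the action makes false


def is_valid_trace(actions, init, trace):
    """Return True if each action in `trace` is applicable when it occurs."""
    state = set(init)
    for name in trace:
        act = actions[name]
        if not act.pre <= state:      # some precondition is false
            return False
        state = (state - act.delete) | act.add
    return True


def valid_next_actions(actions, init, prefix):
    """Names of actions that may follow a valid `prefix` (the legal next tokens)."""
    if not is_valid_trace(actions, init, prefix):
        raise ValueError("prefix is not a valid action sequence")
    state = set(init)
    for name in prefix:
        act = actions[name]
        state = (state - act.delete) | act.add
    return sorted(n for n, a in actions.items() if a.pre <= state)


# Hypothetical toy domain: a door that must be opened before walking through.
ACTIONS = {
    "open": Action("open", frozenset({"closed"}),
                   frozenset({"opened"}), frozenset({"closed"})),
    "close": Action("close", frozenset({"opened"}),
                    frozenset({"closed"}), frozenset({"opened"})),
    "walk": Action("walk", frozenset({"opened"}), frozenset(), frozenset()),
}
INIT = {"closed"}  # assumed initial state

positive = ["open", "walk", "close"]   # a valid (positive) sequence
negative = ["open", "close", "walk"]   # invalid: "close" deletes "opened"
```

Here `is_valid_trace(ACTIONS, INIT, positive)` holds while `is_valid_trace(ACTIONS, INIT, negative)` does not, and `valid_next_actions(ACTIONS, INIT, ["open"])` lists the actions a next-token predictor should accept after `open`.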
