Extracting Moore Machines from Transformers using Queries and Counterexamples

Rik Adriaensen, Jaron Maene

Published: 2024/10/8

Abstract

Fuelled by the popularity of the transformer architecture in deep learning, several works have investigated what formal languages a transformer can learn from data. Nonetheless, existing results remain hard to compare due to methodological differences. To address this, we construct finite state automata as high-level abstractions of transformers trained on regular languages using queries and counterexamples. Concretely, we extract Moore machines, as many training tasks used in literature can be mapped onto them. We demonstrate the usefulness of this approach by studying positive-only learning and the sequence accuracy measure in detail.

Read Full Paper (arXiv.org)