The Transformer Cookbook
Andy Yang, Christopher Watson, Anton Xue, Satwik Bhattamishra, Jose Llarena, William Merrill, Emile Dos Santos Ferreira, Anej Svete, David Chiang
公開日: 2025/10/1
Abstract
We present the transformer cookbook: a collection of techniques for directly encoding algorithms into a transformer's parameters. This work addresses the steep learning curve of such endeavors, a problem exacerbated by a fragmented literature where key results are scattered across numerous papers. In particular, we synthesize this disparate body of findings into a curated set of recipes that demonstrate how to implement everything from basic arithmetic in feed-forward layers to complex data routing via self-attention. Our mise en place of formulations is for both newcomers seeking an accessible entry point and experts in need of a systematic reference. This unified presentation of transformer constructions provides a foundation for future work spanning theoretical research in computational complexity to empirical investigations in architecture design and interpretability.