The Chonkers Algorithm: Content-Defined Chunking with Strict Guarantees on Size and Locality

Benjamin Berger

Published: 2025/9/14

Abstract

This paper presents the Chonkers algorithm, a novel content-defined chunking method that provides strict guarantees on both chunk size and edit locality at the same time. Unlike existing algorithms such as Rabin fingerprinting and anchor-based methods, Chonkers bounds how far an edit can propagate and gives precise control over chunk sizes. I describe the algorithm's layered structure, theoretical guarantees, and implementation considerations, and introduce the Yarn datatype, a deduplicated, merge-tree-based string representation that benefits from Chonkers' strict guarantees.
