TreeGPT: A Novel Hybrid Architecture for Abstract Syntax Tree Processing with Global Parent-Child Aggregation
Zixi Li
公開日: 2025/9/6
Abstract
We introduce TreeGPT, a novel neural architecture that combines transformer-based attention mechanisms with global parent-child aggregation for processing Abstract Syntax Trees (ASTs) in neural program synthesis tasks. Unlike traditional approaches that rely solely on sequential processing or graph neural networks, TreeGPT employs a hybrid design that leverages both self-attention for capturing local dependencies and a specialized Tree Feed-Forward Network (TreeFFN) for modeling hierarchical tree structures through iterative message passing. The core innovation lies in our Global Parent-Child Aggregation mechanism, formalized as: $$h_i^{(t+1)} = \sigma \Big( h_i^{(0)} + W_{pc} \sum_{(p,c) \in E_i} f(h_p^{(t)}, h_c^{(t)}) + b \Big)$$ where $h_i^{(t)}$ represents the hidden state of node $i$ at iteration $t$, $E_i$ denotes all parent-child edges involving node $i$, and $f(h_p, h_c)$ is an edge aggregation function. This formulation enables each node to progressively aggregate information from the entire tree structure through $T$ iterations. Our architecture integrates optional enhancements including gated aggregation with learnable edge weights, residual connections for gradient stability, and bidirectional propagation for capturing both bottom-up and top-down dependencies. We evaluate TreeGPT on the ARC Prize 2025 dataset, a challenging visual reasoning benchmark requiring abstract pattern recognition and rule inference. Experimental results demonstrate that TreeGPT achieves 96\% accuracy, significantly outperforming transformer baselines (1.3\%), large-scale models like Grok-4 (15.9\%), and specialized program synthesis methods like SOAR (52\%) while using only 1.5M parameters. Our comprehensive ablation study reveals that edge projection is the most critical component, with the combination of edge projection and gating achieving optimal performance.