From Pattern Recognition to Reasoning: A Survey for Transformer Length Generalization in Algorithmic Tasks
byD. Verșebeniuc
supervised byA. Härmä
Abstract
This work surveys recent advances in enhancing the generalization capabilities of Transformer models in algorithmic tasks, marking a shift from pattern recognition to algorithmic reasoning. We introduce a comprehensive taxonomy that differentiates between length generalization, the ability to extend fixed computational procedures to longer inputs, and compositional generalization, where models dynamically assemble learned subroutines to solve more complex problems. Our review covers key techniques that improve Transformer performance on arithmetic tasks, such as integer addition, by leveraging innovative data formatting strategies (including reversed formats, index hints, random space augmentation, and zero-padding) alongside advanced positional encoding methods (e.g., absolute, additive relative, position coupling, and randomized encodings). We provide a comparative analysis showing length-extension factors ranging from 1x to 10x across these methods on the integer addition task, and identify positional encoding choice, together with task-aligned data formatting, as the dominant factor governing extrapolation beyond the training range. At the end, this work outlines the present challenges and highlights possible future directions for research and development.