Transformer-Encoder Trees for Efficient Multilingual Machine Translation and Speech Translation

Yiwen Guan, Jacob Whitehill

Published: 2025/9/22

Abstract

Multilingual translation faces challenges of computational redundancy and limited accuracy for low-resource languages, especially in speech translation. To address this, we propose a novel hierarchical Transformer Encoder Tree (TET), combined with non-autoregressive encoder-only models trained with Connectionist Temporal Classification (CTC), for multilingual translation. By sharing intermediate representations among linguistically similar target languages, TET can improve accuracy on low-resource languages, reduce computational redundancy, and generate all target languages in a single forward pass, eliminating sequential decoding bottlenecks and improving parallelism. For speech translation, combining TET with a non-autoregressive speech recognition backbone (wav2vec2) yields promising translation quality relative to autoregressive systems while running 7-14 times faster.
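To make the architecture concrete, below is a minimal PyTorch sketch of the tree idea as the abstract describes it: a shared root encoder feeds one branch encoder per group of linguistically similar target languages, and each leaf is a per-language CTC projection head, so a single forward pass yields outputs for every target. The class names, layer sizes, and language groupings here are illustrative assumptions, not the authors' implementation; a real system would also need, for example, length upsampling so the CTC output is at least as long as each target sequence.

```python
import torch
import torch.nn as nn


class EncoderStack(nn.Module):
    """A small stack of Transformer encoder layers (sizes are placeholders)."""

    def __init__(self, d_model=256, nhead=4, num_layers=2):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)

    def forward(self, x):
        return self.encoder(x)


class TransformerEncoderTree(nn.Module):
    """Sketch of the TET idea: a shared root encoder, one branch encoder
    per group of similar target languages, and per-language CTC heads.
    All targets are produced in the same forward pass."""

    def __init__(self, vocab_sizes, groups, d_model=256):
        # groups: {group_name: [language codes]}; vocab_sizes: {lang: size}
        super().__init__()
        self.groups = groups
        self.root = EncoderStack(d_model)
        self.branches = nn.ModuleDict({g: EncoderStack(d_model) for g in groups})
        self.heads = nn.ModuleDict({
            lang: nn.Linear(d_model, vocab_sizes[lang] + 1)  # +1 = CTC blank
            for langs in groups.values() for lang in langs
        })

    def forward(self, feats):
        # feats: (batch, time, d_model) source-side features, e.g. from a
        # text embedding layer or a wav2vec2 speech encoder.
        h = self.root(feats)
        logits = {}
        for g, langs in self.groups.items():
            hb = self.branches[g](h)  # representation shared by the group
            for lang in langs:
                logits[lang] = self.heads[lang](hb).log_softmax(-1)
        return logits  # per-language CTC log-probs, one pass for all targets


# Toy usage (hypothetical grouping): four targets in two branches.
groups = {"romance": ["es", "fr"], "germanic": ["de", "nl"]}
vocabs = {lang: 8000 for langs in groups.values() for lang in langs}
model = TransformerEncoderTree(vocabs, groups)
out = model(torch.randn(2, 50, 256))  # one forward pass yields all targets
print({lang: tuple(t.shape) for lang, t in out.items()})
```

Under these assumptions, each head would be trained with nn.CTCLoss against its target-language token sequence, and decoding is a simple greedy or beam collapse over the per-frame CTC outputs, which is what removes the sequential autoregressive decoding bottleneck.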
