TMT: A Simple Way to Translate Topic Models Using Dictionaries
Felix Engl, Andreas Henrich
公開日: 2025/8/31
Abstract
The training of topic models for a multilingual environment is a challenging task, requiring the use of sophisticated algorithms, topic-aligned corpora, and manual evaluation. These difficulties are further exacerbated when the developer lacks knowledge of the target language or is working in an environment with limited data, where only small or unusable multilingual corpora are available. Considering these challenges, we introduce Topic Model Translation (TMT), a novel, robust and transparent technique designed to transfer topic models (e.g., Latent Dirichlet Allocation (LDA) based topic models) from one language to another, without the need for metadata, embeddings, or aligned corpora. TMT enables the reuse of topic models across languages, making it especially suitable for scenarios where large corpora in the target language are unavailable or manual translation is infeasible. Furthermore, we evaluate TMT extensively using both quantitative and qualitative methods, demonstrating that it produces semantically coherent and consistent topic translations.