PerQ: Efficient Evaluation of Multilingual Text Personalization Quality

Dominik Macko, Andrew Pulver

Published: 2025/9/30

Abstract

Since no metrics are available to evaluate specific aspects of a text, such as its personalization quality, researchers often rely solely on large language models to meta-evaluate such texts. Because individual language models have internal biases, it is recommended to combine several of them for evaluation, which directly increases the cost of such meta-evaluation. This paper introduces PerQ, a computationally efficient method for evaluating the personalization quality of a given text (generated by a language model). A case study comparing the generation capabilities of large and small language models demonstrates the usability of the proposed metric in research, effectively reducing the waste of resources.
