Designing Psychometric Bias Measures for ChatBots: An Application to Racial Bias Measurement

Mouhacine Benosman

Published: 2025/8/17

Abstract

Artificial intelligence (AI), particularly in the form of large language models (LLMs) or chatbots, has become increasingly integrated into our daily lives. In the past five years, several LLMs have been introduced, including ChatGPT by OpenAI, Claude by Anthropic, and Llama by Meta, among others. These models have the potential to be employed across a wide range of human-machine interaction applications, such as chatbots for information retrieval, assistance in corporate hiring decisions, college admissions, financial loan approvals, parole determinations, and even in medical fields such as psychotherapy delivered through chatbots. The key question is whether these chatbots will interact with humans in a bias-free manner or whether they will further reinforce the pathological biases already present in human-to-human interactions. If the latter is true, how can these biases be rigorously measured? We aim to address this challenge by proposing a principled framework for designing psychometric measures to evaluate chatbot biases.