MindVote: When AI Meets the Wild West of Social Media Opinion

Xutao Mao, Ezra Xuanru Tao, Leyao Wang

Published: 2025/5/20

Abstract

Large Language Models (LLMs) are increasingly used as scalable tools for pilot testing, predicting public opinion distributions before deploying costly surveys. To serve as effective pilot testing tools, the performance of these LLMs is typically benchmarked against their ability to reproduce the outcomes of past structured surveys. This evaluation paradigm, however, is misaligned with the dynamic, context-rich social media environments where public opinion is increasingly formed and expressed. By design, surveys strip away the social, cultural, and temporal context that shapes public opinion, and LLM benchmarks built on this paradigm inherit these critical limitations. To bridge this gap, we introduce MindVote, the first benchmark for public opinion distribution prediction grounded in authentic social media discourse. MindVote is constructed from 3,918 naturalistic polls sourced from Reddit and Weibo, spanning 23 topics and enriched with detailed annotations for platform, topical, and temporal context. Using this benchmark, we conduct a comprehensive evaluation of 15 LLMs. MindVote provides a robust, ecologically valid framework to move beyond survey-based evaluations and advance the development of more socially intelligent AI systems.

Read Full Paper (arXiv.org)