Synthetic generation of online social networks through homophily
Alejandro Buitrago López, Javier Pastor-Galindo, José A. Ruipérez-Valiente
Published: 2025/9/2
Abstract
Online social networks (OSNs) have become increasingly relevant for studying social behavior and information diffusion. Nevertheless, such studies are limited by restricted access to real OSN data due to privacy, legal, and platform-related constraints. In response, synthetic social networks serve as a viable approach to support controlled experimentation, but current generators reproduce only topology and overlook attribute-driven homophily and semantic realism. This work proposes a homophily-based algorithm that produces synthetic microblogging social networks such as X. The model creates a social graph for a given number of users, integrating semantic affinity among user attributes, stochastic variation in link formation, triadic closure to foster clustering, and long-range connections to ensure global reachability. A systematic grid search calibrates five hyperparameters (affinity strength, noise, closure probability, distant link probability, and candidate pool size) to match five structural values observed in real social networks (density, clustering coefficient, proportion of nodes in the largest connected component (LCC), normalized shortest path, and modularity). The framework is validated by generating synthetic OSNs at four scales (10^3 to 10^6 nodes) and benchmarking them against a real-world Bluesky network comprising 4 million users. Comparative results show that the framework reliably reproduces the structural properties of the real network. Overall, the framework outperforms leading importance-sampling techniques applied to the same baseline. The generated graphs capture topological realism and yield attribute-driven communities that align with sociological expectations, providing a realistic, scalable testbed that frees social researchers from reliance on live digital platforms.
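To make the generation process described above more concrete, the following is a minimal illustrative sketch, not the authors' implementation. The abstract names the five hyperparameters but not how they are combined, so everything else here is an assumption: the function names (`generate_network`, `affinity`, and the metric helpers), the categorical attribute encoding, the affinity measure (share of matching attributes), the way noise enters the link weights, the per-user link budget `links_per_user`, and the use of an undirected graph. The sketch also computes three of the five structural targets (density, average clustering coefficient, and LCC proportion) using only the Python standard library.

```python
import random
from itertools import combinations


def affinity(a, b):
    """Share of attribute positions on which two users agree (assumed measure)."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def generate_network(n, n_attrs=3, n_values=4, links_per_user=3,
                     affinity_strength=2.0, noise=0.1, closure_prob=0.3,
                     distant_prob=0.05, pool_size=20, seed=0):
    """Return (attrs, neighbors) for an undirected graph of n users (hypothetical scheme)."""
    rng = random.Random(seed)
    # Assumed attribute model: each user gets n_attrs categorical attributes.
    attrs = [tuple(rng.randrange(n_values) for _ in range(n_attrs)) for _ in range(n)]
    neighbors = [set() for _ in range(n)]

    def connect(u, v):
        if u != v:  # no self-loops
            neighbors[u].add(v)
            neighbors[v].add(u)

    for u in range(n):
        for _ in range(links_per_user):
            r = rng.random()
            if r < distant_prob:
                # Long-range link: uniformly random target, ignoring affinity.
                connect(u, rng.randrange(n))
            elif r < distant_prob + closure_prob and neighbors[u]:
                # Triadic closure: link to a neighbor of a current neighbor.
                w = rng.choice(sorted(neighbors[u]))
                friends_of_friend = sorted(neighbors[w] - {u})
                if friends_of_friend:
                    connect(u, rng.choice(friends_of_friend))
            else:
                # Homophily: sample a candidate pool, weight by affinity plus noise.
                pool = rng.sample(range(n), min(pool_size, n))
                weights = [
                    0.0 if v == u
                    else affinity(attrs[u], attrs[v]) ** affinity_strength
                    + noise * rng.random()
                    for v in pool
                ]
                if sum(weights) > 0:
                    connect(u, rng.choices(pool, weights=weights)[0])
    return attrs, neighbors


def density(neighbors):
    """Edges present divided by edges possible in an undirected simple graph."""
    n = len(neighbors)
    m = sum(len(s) for s in neighbors) / 2
    return 2 * m / (n * (n - 1))


def average_clustering(neighbors):
    """Mean local clustering coefficient; nodes with degree < 2 count as 0."""
    total = 0.0
    for nbrs in neighbors:
        k = len(nbrs)
        if k < 2:
            continue
        links = sum(1 for a, b in combinations(nbrs, 2) if b in neighbors[a])
        total += 2 * links / (k * (k - 1))
    return total / len(neighbors)


def lcc_proportion(neighbors):
    """Fraction of nodes in the largest connected component (iterative DFS)."""
    seen, best = set(), 0
    for start in range(len(neighbors)):
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:
            u = stack.pop()
            size += 1
            for v in neighbors[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        best = max(best, size)
    return best / len(neighbors)


if __name__ == "__main__":
    attrs, graph = generate_network(1000)
    print(density(graph), average_clustering(graph), lcc_proportion(graph))
```

Under these assumptions, a calibration step like the grid search mentioned in the abstract would amount to calling `generate_network` over a grid of the five hyperparameters and keeping the combination whose metrics lie closest to the target values; normalized shortest path and modularity are omitted here for brevity.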