JOINT: Join Optimization and Inference via Network Traversal

Szu-Yun Ko, Ethan Chen, Bo-Cian Chang, Alan Shu-Luen Chang

Published: 2025/9/8

Abstract

Traditional relational databases require users to specify join keys manually and assume exact matches between column names and values. In practice, this limits joinability across fragmented or inconsistently named tables. We propose a fuzzy join framework that automatically identifies joinable column pairs and traverses indirect (multi-hop) join paths across multiple databases. Our method combines column name similarity with row-level fuzzy value overlap, computes edge weights as negative log-transformed Jaccard scores, and discovers join paths via graph traversal. Experiments on synthetic healthcare-style databases show that the system recovers valid joins despite fuzzified column names and partial value mismatches. The approach has direct applications in data integration.
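To make the pipeline described in the abstract concrete, here is a minimal Python sketch under stated assumptions; it is not the authors' implementation. The toy tables, the helpers `normalize`, `name_similarity`, `jaccard`, `build_graph`, and `best_join_path`, and the thresholds `name_threshold` and `min_jaccard` are all hypothetical. The sketch approximates fuzzy value overlap by normalizing values (lowercasing, stripping non-alphanumerics) before computing Jaccard similarity, uses `difflib.SequenceMatcher` as a stand-in for column-name similarity, and runs Dijkstra's algorithm as one plausible choice of traversal, since the abstract does not name one. Because each edge weight is -log(J), a minimum-weight path is the one that maximizes the product of Jaccard scores along it.

```python
import heapq
import math
import re
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical toy schema (not the paper's data): table -> column -> values.
TABLES = {
    "patients": {"patient_id": ["P001", "P002", "P003"], "zip": ["10001", "10002", "10003"]},
    "visits": {"PatientID": ["p001", "p002", "p004"], "clinic_code": ["C1", "C2", "C1"]},
    "clinics": {"clinicCode": ["c1", "c2", "c3"], "city": ["NYC", "LA", "SF"]},
}

def normalize(text):
    """Lowercase and drop non-alphanumerics so 'P-001' and 'p001' compare equal."""
    return re.sub(r"[^a-z0-9]", "", str(text).lower())

def name_similarity(a, b):
    """Assumed column-name similarity: SequenceMatcher ratio on normalized names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()

def jaccard(xs, ys):
    """Jaccard similarity of the normalized value sets of two columns."""
    sx, sy = {normalize(x) for x in xs}, {normalize(y) for y in ys}
    union = sx | sy
    return len(sx & sy) / len(union) if union else 0.0

def build_graph(tables, name_threshold=0.6, min_jaccard=0.1):
    """Undirected table graph; each edge keeps the best-overlapping column pair."""
    graph = {t: [] for t in tables}
    for t1, t2 in combinations(tables, 2):
        best = None
        for c1, v1 in tables[t1].items():
            for c2, v2 in tables[t2].items():
                # Hypothetical rule: names must look alike before values are compared.
                if name_similarity(c1, c2) < name_threshold:
                    continue
                j = jaccard(v1, v2)
                if j >= min_jaccard and (best is None or j > best[0]):
                    best = (j, c1, c2)
        if best:
            j, c1, c2 = best
            w = -math.log(j)  # edge weight: negative log Jaccard
            graph[t1].append((t2, w, c1, c2))
            graph[t2].append((t1, w, c2, c1))
    return graph

def best_join_path(graph, source, target):
    """Dijkstra: minimizing the sum of -log J maximizes the product of J."""
    heap = [(0.0, source, [])]
    seen = set()
    while heap:
        cost, node, path = heapq.heappop(heap)
        if node == target:
            return math.exp(-cost), path
        if node in seen:
            continue
        seen.add(node)
        for nxt, w, c1, c2 in graph[node]:
            if nxt not in seen:
                heapq.heappush(heap, (cost + w, nxt, path + [(node, c1, nxt, c2)]))
    return 0.0, None  # no join path found

graph = build_graph(TABLES)
score, path = best_join_path(graph, "patients", "clinics")
print(round(score, 4), path)
```

With these toy tables no column pair links patients and clinics directly, so the sketch recovers the two-hop path patients → visits → clinics, with a combined score of 0.5 × 2/3 ≈ 0.33. The thresholds trade recall for precision: lowering `name_threshold` admits more candidate column pairs, which then have to survive the value-overlap check.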
