An Agentic Toolkit for Adaptive Information Extraction from Regulatory Documents

Gaye Colakoglu, Gürkan Solmaz, Jonathan Fürst

Published: 2025/9/15

Abstract

Declaration of Performance (DoP) documents, mandated by EU regulation, certify the performance of construction products. While some of their content is standardized, DoPs vary widely in layout, language, schema, and format, posing challenges for automated key-value pair (KVP) extraction and question answering (QA). Existing static or LLM-only information extraction (IE) pipelines often hallucinate and fail to adapt to this structural diversity. Our domain-specific, stateful agentic system addresses these challenges through a planner-executor-responder architecture. The system infers user intent, detects document modality, and orchestrates tools dynamically for robust, traceable reasoning while avoiding tool misuse or execution loops. Evaluation on a curated DoP dataset demonstrates improved robustness across formats and languages, offering a scalable solution for structured data extraction in regulated workflows.
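
To make the planner-executor-responder idea concrete, the following is a minimal Python sketch of such a control loop. It is not the authors' implementation. Every name in it (AgentState, TOOLS, plan, execute, respond, run_agent, MAX_STEPS) is hypothetical, the two tools are toy stand-ins for real modality detection and extraction, and the sample document text is invented for illustration. It only shows how shared state, a tool registry, a step budget, and a repeated-call check could together yield a traceable run that cannot loop.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

MAX_STEPS = 5  # assumed budget that bounds the planner-executor loop


@dataclass
class AgentState:
    """Shared state threaded through planner, executor, and responder."""
    query: str
    document: str
    intent: Optional[str] = None      # "kvp" or "qa"
    modality: Optional[str] = None    # "text" or "scanned"
    fields: Dict[str, str] = field(default_factory=dict)
    trace: List[str] = field(default_factory=list)  # tools run, in order


def detect_modality(state: AgentState) -> None:
    # Toy heuristic: an empty text layer stands in for a scanned document.
    state.modality = "text" if state.document.strip() else "scanned"


def parse_fields(state: AgentState) -> None:
    # Toy extractor: treat every "key: value" line as one key-value pair.
    for line in state.document.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            state.fields[key.strip().lower()] = value.strip()


# Hypothetical tool registry; a real system would register OCR, parsers, etc.
TOOLS: Dict[str, Callable[[AgentState], None]] = {
    "detect_modality": detect_modality,
    "parse_fields": parse_fields,
}


def plan(state: AgentState) -> Optional[str]:
    """Planner: infer intent once, then pick the next tool or stop (None)."""
    if state.intent is None:
        state.intent = "qa" if state.query.rstrip().endswith("?") else "kvp"
    if state.modality is None:
        return "detect_modality"
    if state.modality == "text" and "parse_fields" not in state.trace:
        return "parse_fields"
    return None


def execute(state: AgentState, tool_name: str) -> None:
    """Executor: run a registered tool and record it in the trace."""
    if tool_name not in TOOLS:
        raise ValueError(f"unknown tool: {tool_name}")
    if tool_name in state.trace:
        raise RuntimeError(f"tool already ran: {tool_name}")  # loop guard
    TOOLS[tool_name](state)
    state.trace.append(tool_name)


def respond(state: AgentState) -> str:
    """Responder: answer only from extracted fields, never from guesses."""
    if state.modality != "text":
        return "No text layer found; an OCR tool would be needed."
    if state.intent == "kvp":
        return "; ".join(f"{k}={v}" for k, v in state.fields.items())
    query = state.query.lower()
    hits = [v for k, v in state.fields.items() if k in query]
    return hits[0] if hits else "Not found in document."


def run_agent(query: str, document: str) -> Tuple[str, List[str]]:
    """Alternate planning and execution until the planner stops or the budget ends."""
    state = AgentState(query=query, document=document)
    for _ in range(MAX_STEPS):
        tool_name = plan(state)
        if tool_name is None:
            break
        execute(state, tool_name)
    return respond(state), state.trace
```

Calling run_agent with a question and a short "key: value" document returns both the answer and the ordered list of tools that produced it, which is the kind of trace the abstract refers to. In this sketch the responder reads only from state.fields, so a missing value produces "Not found in document." instead of a guess.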
