Enhancing Intent Understanding for Ambiguous prompt: A Human-Machine Co-Adaption Strategy

Yangfan He, Jianhui Wang, Yijin Wang, Yan Zhong, Xinyuan Song, Junjiang Lin, Xinhang Yuan, Jingqun Tang, Yi Xin, Hao Zhang, Yuchen Li, Zijian Zhang, Hongyang He, Tianxiang Xu, Miao Zhang, Kuan Lu, Menghao Huo, Keqin Li, Jiaqi Chen, Tianyu Shi, Jianyuan Ni

Published: 2025/1/25

Abstract

Current image generation systems produce high-quality images but struggle with ambiguous user prompts, which makes the user's actual intent difficult to interpret. Many users must revise their prompts several times before the generated images meet their expectations. While some methods focus on enhancing prompts so that the generated images fit user needs, models still struggle to understand what users really want, especially when the users are non-experts. In this research, we aim to improve the visual parameter-tuning process, making the model more user-friendly for individuals without specialized knowledge and better able to understand user needs. We propose a human-machine co-adaption strategy that uses the mutual information between the user's prompts and the images under modification as the optimization target, so that the system adapts better to user needs. We find that the improved model reduces the need for multiple rounds of adjustment. We also collect multi-round dialogue datasets containing prompt-image pairs and user intent. Various experiments demonstrate the effectiveness of the proposed method on our proposed dataset. Our dataset and annotation tools will be made available.
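
As a rough illustration of the quantity named above, the sketch below computes the mutual information I(X; Y) of two discrete variables from a joint probability table. It is not the authors' implementation: the abstract does not say how mutual information between prompts and images is estimated, and the function name `mutual_information` and the toy tables are assumptions made only for this example. Here X could stand for a discretized prompt intent and Y for a discretized image attribute.

```python
import math


def mutual_information(joint):
    """Return I(X; Y) in nats for a joint table where joint[i][j] = P(X=i, Y=j).

    Hypothetical helper for illustration; entries are assumed to be
    non-negative and to sum to 1.
    """
    # Marginals: P(X=i) is a row sum, P(Y=j) is a column sum.
    px = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]

    mi = 0.0
    for i, row in enumerate(joint):
        for j, p in enumerate(row):
            if p > 0:  # zero-probability cells contribute nothing
                mi += p * math.log(p / (px[i] * py[j]))
    return mi


# Toy case 1: the image attribute is independent of the prompt intent.
independent = [[0.25, 0.25],
               [0.25, 0.25]]

# Toy case 2: the image attribute is fully determined by the prompt intent.
aligned = [[0.5, 0.0],
           [0.0, 0.5]]

print(mutual_information(independent))
print(mutual_information(aligned))
```

In this toy setting the independent table gives zero mutual information, while the fully aligned table gives log 2 nats, the maximum for two binary variables. A higher value means the generated image carries more information about what the prompt asked for, which is the intuition behind using it as an optimization target.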
