AVR: Active Vision-Driven Precise Robot Manipulation with Viewpoint and Focal Length Optimization

Yushan Liu, Shilong Mu, Xintao Chao, Zizhen Li, Yao Mu, Tianxing Chen, Shoujie Li, Chuqiao Lyu, Xiao-Ping Zhang, Wenbo Ding

公開日: 2025/3/3

Abstract

Robotic manipulation in complex scenes demands precise perception of task-relevant details, yet fixed or suboptimal viewpoints often impair fine-grained perception and induce occlusions, constraining imitation-learned policies. We present AVR (Active Vision-driven Robotics), a bimanual teleoperation and learning framework that unifies head-tracked viewpoint control (HMD-to-2-DoF gimbal) with motorized optical zoom to keep targets centered at an appropriate scale during data collection and deployment. In simulation, an AVR plugin augments RoboTwin demonstrations by emulating active vision (ROI-conditioned viewpoint change, aspect-ratio-preserving crops with explicit zoom ratios, and super-resolution), yielding 5-17% gains in task success across diverse manipulations. On our real-world platform, AVR improves success on most tasks, with over 25% gains compared to the static-view baseline, and extended studies further demonstrate robustness under occlusion, clutter, and lighting disturbances, as well as generalization to unseen environments and objects. These results pave the way for future robotic precision manipulation methods in the pursuit of human-level dexterity and precision.

AVR: Active Vision-Driven Precise Robot Manipulation with Viewpoint and Focal Length Optimization | SummarXiv | SummarXiv