Label-Efficient Grasp Joint Prediction with Point-JEPA

Jed Guzelkabaagac, Boris Petrović

Published: 2025/9/13

Abstract

We investigate whether 3D self-supervised pretraining with a Joint-Embedding Predictive Architecture (Point-JEPA) enables label-efficient grasp joint-angle prediction. Using point clouds tokenized from meshes and a ShapeNet-pretrained Point-JEPA encoder, we train a lightweight multi-hypothesis head with winner-takes-all and evaluate by top-logit selection. On DLR-Hand II with object-level splits, Point-JEPA reduces RMSE by up to 26% in low-label regimes and reaches parity with full supervision. These results suggest JEPA-style pretraining is a practical approach for data-efficient grasp learning.
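The multi-hypothesis head with winner-takes-all training and top-logit evaluation can be illustrated with a minimal pure-Python sketch. The abstract does not specify the loss, so the form here is an assumption: mean squared error on the closest hypothesis plus a cross-entropy term that trains the logits to identify that hypothesis. The function names (`wta_loss`, `select_top_logit`, `rmse`) and the toy values are hypothetical and are not taken from the paper.

```python
import math


def wta_loss(hypotheses, logits, target):
    """Winner-takes-all loss for a single sample (assumed form).

    hypotheses: K candidate joint-angle vectors (lists of floats)
    logits:     K unnormalized scores, one per hypothesis
    target:     ground-truth joint-angle vector
    """
    # Mean squared error of every hypothesis against the target.
    errors = [
        sum((h - t) ** 2 for h, t in zip(hyp, target)) / len(target)
        for hyp in hypotheses
    ]
    # The "winner" is the hypothesis closest to the target.
    winner = min(range(len(errors)), key=errors.__getitem__)
    # Only the winner contributes to the regression term.
    regression = errors[winner]
    # Assumed selection term: cross-entropy with the winner as the class,
    # computed with a numerically stable log-sum-exp.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    selection = log_z - logits[winner]
    return regression + selection, winner


def select_top_logit(hypotheses, logits):
    """At evaluation time, return the hypothesis with the highest logit."""
    best = max(range(len(logits)), key=logits.__getitem__)
    return hypotheses[best]


def rmse(pred, target):
    """Root-mean-square error between two joint-angle vectors."""
    return math.sqrt(
        sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)
    )


# Toy example: two hypotheses for a two-joint target.
hyps = [[0.0, 0.0], [1.0, 1.0]]
scores = [0.0, 2.0]
truth = [1.0, 1.0]
loss, winner = wta_loss(hyps, scores, truth)
prediction = select_top_logit(hyps, scores)
error = rmse(prediction, truth)
```

In this sketch, training updates only the closest hypothesis, which lets the head represent several plausible grasps for the same object, while evaluation commits to a single prediction by taking the highest-scoring hypothesis.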
