Persistent-DPO: A novel loss function and hybrid learning for generative quantum eigensolver
Junya Nakamura, Shinichiro Sanji
Published: 2025/9/10
Abstract
We study the generative quantum eigensolver (GQE)~\cite{nakaji2024generative}, which trains a classical generative model to produce quantum circuits with desired properties, such as describing molecular ground states. We introduce two methods to improve GQE. First, we identify a limitation of direct preference optimization (DPO) when used as the loss function in GQE, and propose Persistent-DPO (P-DPO) as a solution. Second, to improve online learning during the training phase of GQE, we introduce a hybrid approach that combines online and offline learning. Using a transformer-decoder implementation of GQE, we evaluate our methods through ground state search experiments on the $\mathrm{BeH_2}$ molecule and observe that P-DPO achieves lower energies than DPO. The hybrid approach further improves convergence and final energy values, particularly with P-DPO.
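For context, a minimal sketch of the standard DPO objective that P-DPO modifies (this is the generic formulation, not the paper's variant), assuming a trainable policy $\pi_\theta$, a frozen reference policy $\pi_{\mathrm{ref}}$, an inverse-temperature hyperparameter $\beta$, and preference pairs in which $y_w$ is the preferred sample (here, plausibly the lower-energy circuit sequence) and $y_l$ the dispreferred one:
\begin{equation*}
\mathcal{L}_{\mathrm{DPO}}(\theta) \;=\; -\,\mathbb{E}_{(y_w,\,y_l)}\!\left[\log \sigma\!\left(\beta \log \frac{\pi_\theta(y_w)}{\pi_{\mathrm{ref}}(y_w)} \;-\; \beta \log \frac{\pi_\theta(y_l)}{\pi_{\mathrm{ref}}(y_l)}\right)\right],
\end{equation*}
where $\sigma$ is the logistic sigmoid. How the pairs are constructed from circuit energies and how P-DPO departs from this form are specified in the paper itself.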