Pontryagin-Guided Deep Policy Learning for Constrained Dynamic Portfolio Choice

Jeonggyu Huh, Jaegi Jeon, Hyeng Keun Koo, Byung Hwa Lim

Published: 2025/1/22

Abstract

We present a Pontryagin-Guided Direct Policy Optimization (PG-DPO) framework for \emph{constrained} continuous-time portfolio--consumption problems that scales to hundreds of assets. The method couples neural policies with Pontryagin's Maximum Principle and enforces feasibility via a lightweight log-barrier stagewise solve; a \emph{manifold-projection} variant (P--PGDPO) projects controls onto the PMP/KKT manifold using stabilized adjoints. We prove a barrier--KKT correspondence with $O(\epsilon)$ policy error and $O(\epsilon^2)$ instantaneous Hamiltonian gap, and extend the BPTT--PMP match to constrained settings. Under short-sale constraints (nonnegativity, floating cash) and wealth-proportional consumption caps, P--PGDPO reduces risky-weight errors by orders of magnitude versus baseline PG-DPO, while the one-dimensional consumption control shows smaller but consistent gains near binding caps. The approach remains effective when closed-form benchmarks are unavailable, and is readily extensible to transaction costs and interacting limits -- promising even greater benefits under time-varying investment opportunities where classical solutions are scarce.
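For intuition, the listing below is a minimal sketch (not the authors' implementation) of the stagewise log-barrier idea for the short-sale (nonnegativity) constraint, assuming a Merton-style quadratic Hamiltonian in the risky weights. The function name barrier_weights, the solver choice (projected gradient ascent), and all parameter values are illustrative assumptions rather than details taken from the paper.

import numpy as np

def barrier_weights(mu, r, Sigma, gamma, eps=1e-3, lr=0.2, iters=1000):
    """Maximize pi'(mu - r) - (gamma/2) pi'Sigma pi + eps*sum(log pi)
    over pi > 0, a log-barrier relaxation of the no-short-sale constraint."""
    n = len(mu)
    pi = np.full(n, 1.0 / n)                  # strictly positive initial guess
    excess = np.asarray(mu) - r
    for _ in range(iters):
        grad = excess - gamma * Sigma @ pi + eps / pi   # gradient of the concave objective
        pi = np.maximum(pi + lr * grad, 1e-8)           # keep iterates in the interior
    return pi

# Toy example: the third asset has a negative excess return, so its
# unconstrained Merton weight would be negative; the barrier keeps it small
# and positive, approaching the KKT solution (weight pinned at 0) as eps -> 0.
mu = np.array([0.08, 0.05, 0.01])
Sigma = np.diag([0.04, 0.03, 0.02])
print(barrier_weights(mu, r=0.02, Sigma=Sigma, gamma=3.0))

In this toy setting, shrinking eps drives the barrier solution toward the constrained KKT point, which mirrors the abstract's $O(\epsilon)$ policy-error statement; in the full framework the analogous stagewise solve is coupled with neural policies and Pontryagin adjoints rather than fixed model parameters as above.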
