Risk-Sensitive Option Market Making with Arbitrage-Free eSSVI Surfaces: A Constrained RL and Stochastic Control Bridge
Jian'an Zhang
Published: 2025/10/6
Abstract
We formulate option market making as a constrained, risk-sensitive control problem that unifies execution, hedging, and arbitrage-free implied-volatility surfaces inside a single learning loop. A fully differentiable eSSVI layer enforces static no-arbitrage conditions (butterfly and calendar) while the policy controls half-spreads, hedge intensity, and structured surface deformations (state-dependent rho-shift and psi-scale). Executions are intensity-driven and respond monotonically to spreads and relative mispricing; tail risk is shaped with a differentiable CVaR objective via the Rockafellar-Uryasev program. We provide theory for (i) grid-consistency and rates for butterfly/calendar surrogates, (ii) a primal-dual grounding of a learnable dual action acting as a state-dependent Lagrange multiplier, (iii) differentiable CVaR estimators with mixed pathwise and likelihood-ratio gradients and epi-convergence to the nonsmooth objective, (iv) an eSSVI wing-growth bound aligned with Lee's moment constraints, and (v) policy-gradient validity under smooth surrogates. In simulation (Heston fallback; ABIDES-ready), the agent attains positive adjusted P&L on most intraday segments while keeping calendar violations at numerical zero and butterfly violations at the numerical floor; ex-post tails remain realistic and can be tuned through the CVaR weight. The five control heads admit clear economic semantics and analytic sensitivities, yielding a white-box learner that unifies pricing consistency and execution control in a reproducible pipeline.
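To make the CVaR objective concrete, the following is a minimal sketch of the Rockafellar-Uryasev program on an empirical loss sample, CVaR_alpha(L) = min_t { t + E[(L - t)^+] / (1 - alpha) }, together with one possible smooth surrogate. The softplus smoothing, the sharpness parameter `beta`, the function names, and the sample losses are assumptions made for illustration; they are not taken from the paper's implementation, which may use a different smoothing and optimizes the threshold jointly with the policy.

```python
import math


def ru_objective(losses, t, alpha, beta=None):
    """Rockafellar-Uryasev objective t + mean((L - t)^+) / (1 - alpha).

    If beta is given, (x)^+ is replaced by softplus(x; beta) = log(1 + exp(beta*x)) / beta,
    an illustrative smooth surrogate (assumption, not the paper's stated choice).
    """
    def plus(x):
        if beta is None:
            return max(x, 0.0)
        z = beta * x
        # Numerically stable softplus: max(z, 0) + log1p(exp(-|z|)), rescaled by 1/beta.
        return (max(z, 0.0) + math.log1p(math.exp(-abs(z)))) / beta

    mean_excess = sum(plus(loss - t) for loss in losses) / len(losses)
    return t + mean_excess / (1.0 - alpha)


def cvar(losses, alpha, beta=None):
    """Minimize the objective over thresholds t taken from the sample itself.

    For the hard objective this is exact: it is convex and piecewise linear in t,
    so a minimizer lies at a sample point. For the smooth objective the grid
    search is only an approximation of the true minimum.
    """
    return min(ru_objective(losses, t, alpha, beta) for t in losses)


# Hypothetical per-episode losses (negative P&L); values are made up.
losses = [-2.0, -1.0, 0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 5.0, 10.0]
alpha = 0.8

hard = cvar(losses, alpha)                 # nonsmooth empirical CVaR
smooth = cvar(losses, alpha, beta=50.0)    # smooth surrogate, upper-bounds the hard value

print(f"hard CVaR   = {hard:.4f}")
print(f"smooth CVaR = {smooth:.4f}")
```

With ten samples and alpha = 0.8, the hard estimate is the mean of the two largest losses, 7.5 up to floating-point rounding. Because softplus dominates the positive part by at most log(2)/beta, the smooth value lies between the hard value and the hard value plus log(2) / (beta * (1 - alpha)), and it approaches the nonsmooth objective as beta grows, which is the behavior the epi-convergence claim in item (iii) formalizes.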