Can Large Language Models Correctly Interpret Equations with Errors?

Lachlan McGinness, Peter Baumgartner

Published: 2025/5/16

Abstract

This paper explores the potential of Large Language Models (LLMs) to accurately extract equations from typed student responses and translate them into a standard format. This task is useful because standardized equations can be graded reliably by a Computer Algebra System or a Satisfiability Modulo Theories solver, so physics instructors interested in automated grading would not need to rely on the mathematical reasoning capabilities of LLMs. We used two novel frameworks to improve the translations. The first is consensus, where a pair of models verifies the correctness of the translations. The second is a neuro-symbolic LLM-modulo approach, where models receive feedback from an automated reasoning tool. We performed experiments using responses to the Australian Physics Olympiad exam. We report our results, finding that no open-source model was able to translate the student responses at the desired level of accuracy. Future work could involve breaking the task into smaller components before parsing to improve performance, or generalizing the experiments to translate handwritten responses.
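To make the grading premise concrete, the following is a minimal sketch of how standardized equations could be checked for equivalence with a Computer Algebra System, here assuming SymPy. The function names and the equivalence criterion (residuals differing only by a nonzero constant factor) are illustrative assumptions for this sketch, not the paper's implementation.

```python
# Sketch: CAS-based grading of a standardized equation, assuming SymPy.
# Names like `residual` and `equations_equivalent` are illustrative.
from sympy import simplify
from sympy.parsing.sympy_parser import parse_expr

def residual(equation: str):
    """Turn an equation string 'lhs = rhs' into the expression lhs - rhs."""
    lhs, rhs = equation.split("=")
    return parse_expr(lhs) - parse_expr(rhs)

def equations_equivalent(student: str, reference: str) -> bool:
    """Treat two equations as equivalent if their residuals differ
    only by a nonzero constant factor (a simplifying assumption)."""
    ratio = simplify(residual(student) / residual(reference))
    return ratio.free_symbols == set() and ratio != 0

# A rearranged form of v = u + a*t should still be accepted:
print(equations_equivalent("v - u = a*t", "v = u + a*t"))  # True
print(equations_equivalent("v = u - a*t", "v = u + a*t"))  # False
```

Because the check operates on symbolic residuals rather than string forms, it accepts algebraically rearranged answers without requiring the grader, or an LLM, to reason about the mathematics itself.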
