More Rigorous Software Engineering Would Improve Reproducibility in Machine Learning Research

Moritz Wolter, Lokesh Veeramacheneni, Charles Tapley Hoyt

公開日: 2025/2/2

Abstract

While experimental reproduction remains a pillar of the scientific method, we observe that the software best practices supporting the reproduction of machine learning ( ML ) research are often undervalued or overlooked, leading both to poor reproducibility and damage to trust in the ML community. We quantify these concerns by surveying the usage of software best practices in software repositories associated with publications at major ML conferences and journals such as NeurIPS, ICML, ICLR, TMLR, and MLOSS within the last decade. We report the results of this survey that identify areas where software best practices are lacking and areas with potential for growth in the ML community. Finally, we discuss the implications and present concrete recommendations on how we, as a community, can improve reproducibility in ML research.

全文を読む (arXiv.org)