Descriptor and Graph-based Molecular Representations in Prediction of Copolymer Properties Using Machine Learning
Elaheh Kazemi-Khasragh, Rocío Mercado, Carlos Gonzalez, Maciej Haranczyk
Published: 2025/9/15
Abstract
Copolymers are highly versatile materials with a vast range of possible chemical compositions. Computational property prediction can accelerate copolymer design by prioritizing candidates with favorable properties. In this study, we used two distinct representations of molecular ensembles to predict seven physical properties of copolymers with machine learning: a random forest (RF) model that predicts the properties from molecular descriptors, and a graph neural network (GNN) that predicts the same properties from 2D polymer graphs in both single- and multi-task settings. To train and evaluate the models, we constructed a data set from molecular dynamics simulations of 140 binary copolymers with varying monomer compositions and configurations. Our results show that descriptor-based RFs excel at predicting density and the specific heat capacities at constant pressure (Cp) and constant volume (Cv), because these properties are strongly tied to specific molecular features captured by molecular descriptors. In contrast, graph representations better predict the expansion coefficients (γ, α) and the bulk modulus (K), which depend more on complex structural interactions that graph-based models capture more effectively. This study underscores the importance of choosing an appropriate representation when predicting molecular properties. Our findings demonstrate how machine learning models can expedite copolymer discovery through learnable structure-property relationships, streamlining polymer design and advancing the development of high-performance materials for diverse applications.
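To make the descriptor-based representation concrete, the sketch below shows one simple way a binary copolymer could be turned into a fixed-length feature vector for a model such as an RF. It is an illustration only: the abstract does not state how monomer descriptors are combined, so the mole-fraction weighting, the function name `copolymer_descriptors`, the descriptor names, and all numeric values are assumptions, not the authors' method or data.

```python
from typing import Dict


def copolymer_descriptors(
    desc_a: Dict[str, float],
    desc_b: Dict[str, float],
    frac_a: float,
) -> Dict[str, float]:
    """Combine two monomer descriptor sets into one copolymer feature set.

    Assumption: each copolymer descriptor is the mole-fraction-weighted
    average of the corresponding monomer descriptors. This is a common
    simple choice, not necessarily the scheme used in the study.
    """
    if not 0.0 <= frac_a <= 1.0:
        raise ValueError("frac_a must lie in [0, 1]")
    if desc_a.keys() != desc_b.keys():
        raise ValueError("both monomers need the same descriptor names")
    frac_b = 1.0 - frac_a
    return {name: frac_a * desc_a[name] + frac_b * desc_b[name] for name in desc_a}


# Hypothetical placeholder descriptors; values are not taken from the study.
monomer_a = {"mol_weight": 100.0, "n_rotatable_bonds": 2.0}
monomer_b = {"mol_weight": 60.0, "n_rotatable_bonds": 4.0}

# A copolymer with 25% monomer A and 75% monomer B.
features = copolymer_descriptors(monomer_a, monomer_b, frac_a=0.25)
print(features)
```

A composition-weighted vector of this kind encodes how much of each monomer is present but not how the monomers are arranged along the chain, so two copolymers with the same composition and different configurations would receive identical features. A graph representation can, in principle, distinguish such cases.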