A Methodological Study on Data Representation for Machine Learning Modelling of Thermal Conductivity of Rare-Earth Oxides
Amiya Chowdhury, Acacio Rincón Romero, Eduardo Aguilar-Bejarano, Halar Memon, Grazziela Figueredo, Tanvir Hussain
Published: 2025/9/23
Abstract
Quantitative structure-activity relationship (QSAR) modelling is widely employed in materials science to predict properties of interest and extract useful descriptors for measured properties. In thermal barrier coatings (TBC), QSAR can significantly shorten the experimental discovery cycle, which can take years. Although machine learning methods are commonly employed for QSAR, their performance depends on the data quality and how instances are represented. Traditional, hand-crafted descriptors based on known material properties can only represent materials that share the same basic crystal structure, which limits the size of the dataset. By contrast, graph neural networks offer a more expressive representation, encoding atomic positions and bonds in the crystal lattice. In this study, we compare Random Forest (RF) and Gaussian Process (GP) models trained on hand-crafted descriptors from the literature against the Crystal Graph Convolutional Neural Network (CGCNN), which uses graph-based representations, for high-entropy, rare-earth pyrochlore oxides. Two types of augmentation methods are also explored to compensate for the limited data size, one of which is only applicable to graph-based representations. Our findings show that the CGCNN model substantially outperforms the RF and GP models, underscoring the potential of graph-based representations for enhanced QSAR modelling in TBC research.
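To illustrate what encoding atomic positions and bonds as a graph can look like, the following is a minimal sketch and not the CGCNN implementation or the authors' pipeline. It treats atoms as nodes and connects any two atoms, including periodic images, that lie within a cutoff distance. The function name `build_crystal_graph`, the cutoff value, and the toy two-atom cubic cell are assumptions chosen for illustration; the cell is not one of the pyrochlore oxides studied here.

```python
import itertools
import math


def build_crystal_graph(lattice, frac_coords, species, cutoff):
    """Return (nodes, edges) for a periodic crystal (hypothetical helper).

    lattice:     three lattice vectors as rows, in angstroms
    frac_coords: fractional coordinates of each atom in the unit cell
    species:     element symbol of each atom, used as the node label
    cutoff:      atoms closer than this distance are joined by an edge
    """
    def to_cartesian(frac):
        # Linear combination of the lattice vectors.
        return tuple(
            sum(frac[i] * lattice[i][k] for i in range(3)) for k in range(3)
        )

    cart = [to_cartesian(f) for f in frac_coords]
    edges = []
    # Search the home cell and its 26 neighbouring periodic images.
    # Assumption: the cutoff is smaller than the cell dimensions, so
    # images further away cannot fall inside the cutoff.
    for i, j in itertools.product(range(len(cart)), repeat=2):
        for shift in itertools.product((-1, 0, 1), repeat=3):
            if i == j and shift == (0, 0, 0):
                continue  # skip the atom paired with itself
            offset = to_cartesian(shift)
            image = tuple(cart[j][k] + offset[k] for k in range(3))
            dist = math.dist(cart[i], image)
            if dist <= cutoff:
                # Directed edge i -> j, with the distance as edge feature.
                edges.append((i, j, round(dist, 6)))
    return list(species), edges


# Toy example: a cubic cell (a = 4 angstroms) with one atom at the corner
# and one at the body centre.
lattice = [(4.0, 0.0, 0.0), (0.0, 4.0, 0.0), (0.0, 0.0, 4.0)]
frac_coords = [(0.0, 0.0, 0.0), (0.5, 0.5, 0.5)]
nodes, edges = build_crystal_graph(lattice, frac_coords, ["Cs", "Cl"], cutoff=3.5)
print(nodes, len(edges))
```

In this toy cell each atom has eight nearest neighbours of the other species, so the graph carries sixteen directed edges. Because the graph is built from positions and a cutoff rather than from a fixed descriptor list, the same construction applies to crystals with different structures, which is the flexibility the abstract attributes to graph-based representations.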