bioSBM: a random graph model to integrate epigenomic data in chromatin structure prediction
Alex Chen Yi Zhang, Angelo Rosa, Guido Sanguinetti
Published: 2024/9/22
Abstract
The spatial organization of chromatin within the nucleus plays a crucial role in gene expression and genome function. However, the quantitative relationship between this organization and nuclear biochemical processes remains under debate. In this study, we present a graph-based generative model, bioSBM, designed to capture long-range chromatin interaction patterns from Hi-C data and, importantly, to link these patterns to biochemical features at the same time. Applying bioSBM to Hi-C maps of the GM12878 lymphoblastoid cell line, we identify a latent structure of chromatin interactions, revealing 7 distinct communities that strongly align with known biological annotations. Additionally, we infer a linear transformation that maps biochemical observables, such as histone marks, to the parameters of the generative graph model, enabling accurate genome-wide predictions of chromatin contact maps on out-of-sample data, both within the same cell line and on the completely unseen HCT116 cell line under RAD21 depletion. These findings highlight bioSBM's potential as a powerful tool for elucidating the relationship between biochemistry and chromatin architecture and for predicting long-range genome organization from independent biochemical data.
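To make the idea of a linear map from biochemical observables to graph-model parameters concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes a mixed-membership stochastic block model in which each genomic bin's community memberships are a softmax of a linear function of its epigenomic features. The abstract does not specify these details, so the softmax link, the Bernoulli contact model, the random toy data, and all names (`expected_contacts`, `X`, `Gamma`, `B`) are assumptions made for illustration; only the number of communities (7) is taken from the text.

```python
import numpy as np


def softmax(z, axis=-1):
    """Numerically stable softmax along the given axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def expected_contacts(X, Gamma, B):
    """Contact probabilities under an assumed mixed-membership block model.

    X     : (n_bins, n_features) epigenomic signals per genomic bin (hypothetical)
    Gamma : (n_features, K) linear map from features to community scores
    B     : (K, K) symmetric community-community affinities in [0, 1]
    """
    theta = softmax(X @ Gamma, axis=1)  # (n_bins, K), each row sums to 1
    return theta @ B @ theta.T          # (n_bins, n_bins), values in [0, 1]


# Toy data only: random numbers stand in for histone-mark tracks.
rng = np.random.default_rng(0)
n_bins, n_features, K = 50, 5, 7        # K = 7 communities, as in the abstract
X = rng.normal(size=(n_bins, n_features))
Gamma = rng.normal(size=(n_features, K))
B_raw = rng.uniform(size=(K, K))
B = (B_raw + B_raw.T) / 2               # symmetrize the affinity matrix

P = expected_contacts(X, Gamma, B)

# Sample a symmetric binary contact graph with no self-contacts.
upper = np.triu(rng.uniform(size=P.shape) < P, k=1)
A = (upper | upper.T).astype(int)
```

In this sketch, predicting contacts for an unseen cell line would amount to calling `expected_contacts` on that cell line's feature matrix while reusing a previously fitted `Gamma` and `B`; the fitting procedure itself is outside the scope of the example.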