Memshare: Memory Sharing for Multicore Computation in R with an Application to Feature Selection by Mutual Information using PDE

Michael C. Thrun, Julian Märte

公開日: 2025/9/10

Abstract

We present memshare\footnote{The Software package is published as a CRAN package under https://CRAN.R-project.org/package=memshare, a package that enables shared memory multicore computation in R by allocating buffers in C++ shared memory and exposing them to R through ALTREP views. We compare memshare to SharedObject (Bioconductor) discuss semantics and safety, and report a 2x speedup over SharedObject with no additional resident memory in a column wise apply benchmark. Finally, we illustrate a downstream analytics use case: feature selection by mutual information in which densities are estimated per feature via Pareto Density Estimation (PDE). The analytical use-case is an RNA seq dataset consisting of N=10,446 cases and d=19,637 gene expressions requiring roughly n_threads * 10GB of memory in the case of using parallel R sessions. Such and larger use-cases are common in big data analytics and make R feel limiting sometimes which is mitigated by the addition of the library presented in this work.