Outer Channel of DNA-Based Data Storage: Capacity and Efficient Coding Schemes

Xuan He, Yi Ding, Kui Cai, Guanghui Song, Bin Dai, Xiaohu Tang

Published: 2023/12/19

Abstract

In this paper, we consider the outer channel for DNA-based data storage. When transmitting over the outer channel, each DNA string is treated as a unit/symbol that is either correctly received, erased, or corrupted by uniformly distributed random symbol substitution errors, and all strings are randomly shuffled. We first derive the capacity of the outer channel, which implies that uniformly distributed random symbol substitution errors are only as harmful as erasure errors (for infinite-length non-binary random linear codes with near maximum likelihood decoding). Next, we propose practically efficient coding schemes that encode the bits at the same position of different strings into a codeword. We compute the soft/hard information of each bit, which allows us to decode the bits within a codeword independently, leading to an independent decoding scheme. To improve the decoding performance, we measure the reliability of each string based on the independent decoding result and perform a further decoding step over the most reliable strings, leading to a joint decoding scheme. Simulations with low-density parity-check codes confirm that the joint decoding scheme can reduce the frame error rate by more than 3 orders of magnitude compared to the independent decoding scheme, and that it can outperform the state-of-the-art decoding scheme in the literature across a wide range of parameter regions.
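The channel model described in the abstract can be illustrated with a small simulation. The following is a minimal sketch and not the authors' implementation: the function name `outer_channel`, the parameters `p_erase` and `p_sub`, and the use of bit tuples to stand in for DNA strings are all assumptions made for illustration. It also assumes that a substituted string is drawn uniformly from all strings of the same length, so it may coincide with the original by chance.

```python
import random


def outer_channel(strings, p_erase, p_sub, rng):
    """Pass equal-length bit strings through a toy outer channel.

    Each string is treated as one symbol that is independently
    erased with probability p_erase, replaced by a uniformly random
    string with probability p_sub, or received correctly otherwise.
    The surviving strings are then randomly shuffled.
    (Hypothetical parameter names; not taken from the paper.)
    """
    received = []
    for s in strings:
        u = rng.random()
        if u < p_erase:
            # Erased: the string is never read back.
            continue
        if u < p_erase + p_sub:
            # Substituted: assumed uniform over all strings of the same length.
            s = tuple(rng.randrange(2) for _ in s)
        received.append(tuple(s))
    # The receiver does not know the original order of the strings.
    rng.shuffle(received)
    return received


# Example usage with illustrative parameter values.
rng = random.Random(0)
sent = [tuple(rng.randrange(2) for _ in range(16)) for _ in range(8)]
got = outer_channel(sent, p_erase=0.2, p_sub=0.1, rng=rng)
```

In this sketch the receiver sees an unordered collection containing fewer strings than were sent whenever erasures occur, and it cannot tell directly which of the received strings were substituted. This is the situation in which the abstract's reliability measure for each string becomes useful.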
