A World in Print: Introducing a Danish-Norwegian corpus of historical newspapers
Johan Heinsen, Camilla Bøgeskov
Published: 2025/9/2
Abstract
This Data Descriptor introduces the dataset Enevaeldens Nyheder Online (News during Absolutism Online), abbreviated ENO. The dataset provides a reconstruction of the contents of major newspapers in Denmark and Norway during the period of Absolutism (1660-1849). It contains approximately 474 million words, extracted using neural networks designed to process digitised microfilm versions of Danish newspapers, as well as a smaller selection of Norwegian publications, all of which were hitherto unreadable by computers. The contribution details this process and its results, including a way to derive standalone texts from the editions, and the accompanying BERT model trained on a beta version of the dataset.