Simplified News in Many Languages (SNIML) is a collection of news articles that have been published by six simplified news portals. As we describe in our paper, we have prepared the collected articles as an XML corpus.

The corpus can be downloaded from this page. It contains simplified news articles in Finnish, French, Italian, Swedish, English and German, published between November 2003 and March 2023. All articles in the corpus are available under open licenses that permit academic research use.
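As a rough illustration of working with an XML corpus of this kind, the sketch below counts articles per language using Python's standard library. The element and attribute names (`corpus`, `article`, `lang`, `date`, `title`, `text`) and the sample content are assumptions made for this example only; the actual SNIML schema may differ, so check the downloaded files and adjust the names accordingly.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical sample data: the tag and attribute names below are
# illustrative assumptions, not the documented SNIML schema.
SAMPLE_XML = """<corpus>
  <article lang="fi" date="2003-11-05"><title>Esimerkki</title><text>Lyhyt teksti.</text></article>
  <article lang="sv-SE" date="2020-01-15"><title>Exempel</title><text>En kort text.</text></article>
  <article lang="fi" date="2023-03-01"><title>Toinen</title><text>Toinen teksti.</text></article>
</corpus>"""


def count_articles_by_language(xml_text):
    """Return a Counter mapping each language tag to its number of articles."""
    root = ET.fromstring(xml_text)
    # iter() walks the whole tree, so nesting depth of <article> does not matter.
    return Counter(article.get("lang") for article in root.iter("article"))


counts = count_articles_by_language(SAMPLE_XML)
for lang, n in sorted(counts.items()):
    print(lang, n)
```

For a real corpus file, `ET.parse(path).getroot()` could replace `ET.fromstring`, again assuming the file is a single well-formed XML document.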

This corpus was created as part of a project of the Department of Computational Linguistics at the University of Zurich.

Overview of Sources

News Portal                 Language  License
The Times in Plain English  en-US     “may be distributed and reproduced by all”
Informazione Facile         it-IT     CC BY-SA 4.0
Journal Essentiel           fr-BE     CC BY-SA 4.0
Infoeasy                    fr-BE     CC BY-NC-ND 4.0
Selkosanomat                fi        CC BY-NC-ND 4.0
Lätta Bladet                sv-SE     CC BY-NC-ND 4.0

We would like to thank the editors of these news portals for making their articles available.


