On this website you can read news articles in simplified language. The articles were originally written in different languages. We have collected them and translated them into German and English using machine translation.
The articles were originally published by other news sites. The original source of each article is stated below its title. The newest articles always appear at the top of the site, under the heading "This week".
We collect the articles from the websites listed in the table below.
This application automatically collects news articles published by six simplified-language news portals. We prepare the collected articles as an XML corpus. The Simplified News in Many Languages (SNIML) corpus can be downloaded from this website. It contains over 13,000 simplified news articles in Finnish, French, Italian, Swedish, English and German. All articles in the corpus are available under an open license that permits academic research use.
In addition, sub-corpora containing only the articles published in one language can be downloaded. Furthermore, a sub-corpus for each month is provided. We plan to publish a new version of the corpus containing the latest articles on a monthly basis.
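As a rough illustration of how a downloaded XML corpus file might be inspected, the following sketch counts articles per language. The element name `article`, the attribute `lang`, and the sample document are assumptions made for this example only. The actual SNIML schema may differ, so check a downloaded file before adapting the code.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical sample document: the element and attribute names below are
# assumptions for illustration, not the documented SNIML schema.
SAMPLE = """<corpus>
  <article lang="fi"><title>Example 1</title><text>...</text></article>
  <article lang="fi"><title>Example 2</title><text>...</text></article>
  <article lang="de"><title>Example 3</title><text>...</text></article>
</corpus>"""


def count_by_language(root, tag="article", lang_attr="lang"):
    """Count elements named `tag` under `root`, grouped by `lang_attr`."""
    return Counter(el.get(lang_attr, "unknown") for el in root.iter(tag))


root = ET.fromstring(SAMPLE)
counts = count_by_language(root)

# Print one line per language, most frequent first.
for lang, n in counts.most_common():
    print(f"{lang}\t{n}")
```

For a file on disk, `ET.parse(path).getroot()` would replace `ET.fromstring(SAMPLE)`, where `path` points to the downloaded corpus or sub-corpus.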
This corpus was created as part of a project of the Department of Computational Linguistics at the University of Zurich.
News Portal | URL | Language | License | Number of Articles |
---|---|---|---|---|
The Times in Plain English | https://www.thetimesinplainenglish.com/ | en-US | "may be distributed and reproduced by all" | 1,897 |
Informazione Facile | https://informazionefacile.it/ | it-IT | CC BY-SA 4.0 | 2,686 |
Journal Essentiel | https://journalessentiel.be/ | fr-BE | CC BY-SA 4.0 | 2,723 |
Infoeasy | https://infoeasy-news.ch/ | de-CH | CC BY-NC-ND 4.0 | 147 |
Selkosanomat | https://selkosanomat.fi/ | fi | CC BY-NC-ND 4.0 | 3,379 |
Lätta Bladet | https://lattabladet.fi/ | sv-SE | CC BY-NC-ND 4.0 | 2,559 |
We would like to thank the editors of the news portals used for making their articles available.
@inproceedings{hauser-etal-2022-multilingual,
    title = "A Multilingual Simplified Language News Corpus",
    author = "Hauser, Renate and Vamvas, Jannis and Ebling, Sarah and Volk, Martin",
    booktitle = "Proceedings of the 2nd Workshop on Tools and Resources to Empower People with REAding DIfficulties (READI) within the 13th Language Resources and Evaluation Conference",
    month = jun,
    year = "2022",
    address = "Marseille, France",
    publisher = "European Language Resources Association",
    url = "https://aclanthology.org/2022.readi-1.4",
    pages = "25--30",
    abstract = "Simplified language news articles are being offered by specialized web portals in several countries. The thousands of articles that have been published over the years are a valuable resource for natural language processing, especially for efforts towards automatic text simplification. In this paper, we present SNIML, a large multilingual corpus of news in simplified language. The corpus contains 13k simplified news articles written in one of six languages: Finnish, French, Italian, Swedish, English, and German. All articles are shared under open licenses that permit academic use. The level of text simplification varies depending on the news portal. We believe that even though SNIML is not a parallel corpus, it can be useful as a complement to the more homogeneous but often smaller corpora of news in the simplified variety of one language that are currently in use.",
}