What can I do on this website?

Reading simple news articles

On this website you can read news articles in simplified language. The articles were originally written in different languages. We have collected them and translated them into German and English using machine translation.

The articles were originally published by other news sites. The original source of each article is stated below its title. The newest articles always appear at the top of the page under the heading "This week".

We collect the articles from the websites listed under "Sources" below.

Downloading archives of simplified news articles

This application automatically collects news articles published by six simplified-language news portals. We prepare the collected articles as an XML corpus. The Simplified News in Many Languages (SNIML) corpus can be downloaded from this website. It contains over 13,400 simplified news articles in Finnish, French, Italian, Swedish, English and German. All articles in the corpus are available under an open license that permits academic research use.

In addition, sub-corpora containing only the articles published in one language can be downloaded. Furthermore, a sub-corpus for each month is provided. We plan to publish a new version of the corpus containing the latest articles on a monthly basis.
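
As a minimal sketch of how a downloaded corpus file might be processed, the following Python snippet counts articles per language. The XML schema is not described on this page, so the element name "article", the attribute "lang" and the sample content are assumptions made for illustration only. Check the downloaded file for the actual element and attribute names.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical sample: the real SNIML files may use different
# element names, attribute names and nesting.
SAMPLE_XML = """<corpus>
  <article lang="fi"><title>Example title A</title><text>Example text A.</text></article>
  <article lang="fi"><title>Example title B</title><text>Example text B.</text></article>
  <article lang="sv"><title>Example title C</title><text>Example text C.</text></article>
</corpus>"""


def count_articles_by_language(xml_text, article_tag="article", lang_attr="lang"):
    """Count article elements per language attribute value.

    article_tag and lang_attr are assumed names; adjust them to the real schema.
    """
    root = ET.fromstring(xml_text)
    # Articles without the language attribute are grouped under "unknown".
    return Counter(
        article.get(lang_attr, "unknown") for article in root.iter(article_tag)
    )


counts = count_articles_by_language(SAMPLE_XML)
for lang, number in sorted(counts.items()):
    print(lang, number)
```

For a downloaded file, the same function could be applied to the file's contents after reading it from disk. For very large corpus files, incremental parsing with ET.iterparse may be preferable to loading the whole document at once.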

This corpus was created as part of a project of the Department of Computational Linguistics at the University of Zurich.

Sources

News Portal | URL | Language | License | Number of Articles
The Times in Plain English | https://www.thetimesinplainenglish.com/ | en-US | "may be distributed and reproduced by all" | 1,897
Informazione Facile | https://informazionefacile.it/ | it-IT | CC BY-SA 4.0 | 2,686
Journal Essentiel | https://journalessentiel.be/ | fr-BE | CC BY-SA 4.0 | 2,723
Infoeasy | https://infoeasy-news.ch/ | fr-BE | CC BY-NC-ND 4.0 | 147
Selkosanomat | https://selkosanomat.fi/ | fi | CC BY-NC-ND 4.0 | 3,379
Lätta Bladet | https://lattabladet.fi/ | sv-SE | CC BY-NC-ND 4.0 | 2,559

We would like to thank the editors of these news portals for making their articles available.

Citation

@inproceedings{hauser-etal-2022-multilingual,
    title = "A Multilingual Simplified Language News Corpus",
    author = "Hauser, Renate  and
      Vamvas, Jannis  and
      Ebling, Sarah  and
      Volk, Martin",
    booktitle = "Proceedings of the 2nd Workshop on Tools and Resources to Empower People with REAding DIfficulties (READI) within the 13th Language Resources and Evaluation Conference",
    month = jun,
    year = "2022",
    address = "Marseille, France",
    publisher = "European Language Resources Association",
    url = "https://aclanthology.org/2022.readi-1.4",
    pages = "25--30",
    abstract = "Simplified language news articles are being offered by specialized web portals in several countries. The thousands of articles that have been published over the years are a valuable resource for natural language processing, especially for efforts towards automatic text simplification. In this paper, we present SNIML, a large multilingual corpus of news in simplified language. The corpus contains 13k simplified news articles written in one of six languages: Finnish, French, Italian, Swedish, English, and German. All articles are shared under open licenses that permit academic use. The level of text simplification varies depending on the news portal. We believe that even though SNIML is not a parallel corpus, it can be useful as a complement to the more homogeneous but often smaller corpora of news in the simplified variety of one language that are currently in use.",
}

Imprint

University of Zurich
Department of Computational Linguistics
Andreasstrasse 15
8050 Zurich
Switzerland
webmaster@cl.uzh.ch