The Horizonte corpus is built from the magazine of the same name, published by the Swiss National Science Foundation (SNSF).
It consists of magazine articles in German, French and English on popular science and research projects in and around Switzerland.
The Horizonte Online corpus consists of articles from the Horizons magazine website, collected in 2018. The articles span four years, from 2014 to 2018.
Language | Tokens | Types | Lemmas | Sentences | Texts |
---|---|---|---|---|---|
de | 114084 | 19318 | 10609 | 8584 | 158 |
en | 131146 | 13324 | 8404 | 8035 | 157 |
fr | 126333 | 15010 | 7315 | 7583 | 158 |
Total | 371563 | 47652 | 26328 | 24202 | 473 |
The Online corpus is aligned at the document level.
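
The counts in the table above can be turned into simple derived measures. The following is a minimal sketch, not part of the corpus tooling: the dictionary `ONLINE` and the helper `derived_stats` are hypothetical names introduced here, and the figures are copied by hand from the table. It computes the mean sentence length and the type/token ratio per language.

```python
# Figures copied by hand from the Horizonte Online table above.
ONLINE = {
    "de": {"tokens": 114084, "types": 19318, "sents": 8584},
    "en": {"tokens": 131146, "types": 13324, "sents": 8035},
    "fr": {"tokens": 126333, "types": 15010, "sents": 7583},
}


def derived_stats(row):
    """Return mean sentence length and type/token ratio for one table row."""
    return {
        "tokens_per_sent": row["tokens"] / row["sents"],
        "type_token_ratio": row["types"] / row["tokens"],
    }


for lang, row in ONLINE.items():
    stats = derived_stats(row)
    print(
        f"{lang}: {stats['tokens_per_sent']:.2f} tokens/sentence, "
        f"type/token ratio {stats['type_token_ratio']:.3f}"
    )
```

Type/token ratios depend on corpus size, so they are best compared only between sub-corpora of similar size, as is roughly the case for the three languages here.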
The Horizonte PDF corpus consists of articles extracted from PDF issues of the Horizonte magazine in its online archive. The articles span 12 years, from 2005 to 2017.
Language | Tokens | Types | Lemmas | Sentences | Texts |
---|---|---|---|---|---|
de | 1025245 | 85221 | 35577 | 75014 | 1237 |
en | 392975 | 24793 | 14209 | 23865 | 395 |
fr | 1193874 | 51562 | 17557 | 71995 | 1237 |
Total | 2612094 | 161576 | 67343 | 170874 | 2869 |
The PDF corpus is aligned at the document level.
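
Because the number of texts differs between languages, the text counts bound how many documents can be aligned across a given set of languages. The sketch below illustrates this with the PDF figures. The names `PDF_TEXTS` and `max_aligned_groups` are hypothetical, and the calculation assumes that each text belongs to at most one aligned document group, which the description above does not state explicitly.

```python
# Text counts copied by hand from the Horizonte PDF table above.
PDF_TEXTS = {"de": 1237, "en": 395, "fr": 1237}


def max_aligned_groups(text_counts, langs):
    """Upper bound on aligned document groups covering all given languages.

    Assumes each text belongs to at most one aligned group (an assumption
    made for this sketch, not a documented property of the corpus).
    """
    return min(text_counts[lang] for lang in langs)


# Upper bound for German-French document pairs.
print(max_aligned_groups(PDF_TEXTS, ["de", "fr"]))
# Upper bound for documents available in all three languages.
print(max_aligned_groups(PDF_TEXTS, ["de", "en", "fr"]))
```

Under that assumption, at most 395 documents in the PDF corpus can have counterparts in all three languages. The actual number of aligned groups may be lower.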