The Credit Suisse corpus is built on the world's oldest banking magazine, the Credit Suisse Bulletin, which has been in print since 1895.
The corpus consists of three main subcorpora: the Credit Suisse News Corpus, the Credit Suisse PDF Bulletin Corpus, and the Credit Suisse Bulletin In Print Corpus.
The Credit Suisse News Corpus is a collection of news articles from the Credit Suisse website in four languages (English, French, German, Italian). The articles date from 2001 to 2017.
Language | Tokens | Types | Lemmas | Sentences | Texts |
---|---|---|---|---|---|
de | 1908735 | 105560 | 58166 | 115196 | 1797 |
en | 2078198 | 53534 | 26839 | 110483 | 1821 |
fr | 2027287 | 56333 | 20009 | 99444 | 1596 |
it | 1869238 | 64029 | 21447 | 94439 | 1542 |
Total | 7883458 | 279456 | 126461 | 419562 | 6756 |
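Each Total row is the column-wise sum of the per-language rows. For types and lemmas this means a form that occurs in more than one language is counted once per language, not once for the whole subcorpus. A minimal Python sketch that re-derives the News Corpus totals from the figures in the table; the names `NEWS`, `NEWS_TOTAL` and `column_sums` are illustrative only and are not part of any corpus distribution:

```python
# Per-language figures copied from the Credit Suisse News Corpus table.
# Column order: tokens, types, lemmas, sentences, texts.
COLUMNS = ("tokens", "types", "lemmas", "sentences", "texts")

NEWS = {
    "de": (1908735, 105560, 58166, 115196, 1797),
    "en": (2078198, 53534, 26839, 110483, 1821),
    "fr": (2027287, 56333, 20009, 99444, 1596),
    "it": (1869238, 64029, 21447, 94439, 1542),
}
NEWS_TOTAL = (7883458, 279456, 126461, 419562, 6756)


def column_sums(rows):
    """Sum each column across the per-language rows (a plain column total)."""
    return tuple(sum(values) for values in zip(*rows.values()))


if __name__ == "__main__":
    sums = column_sums(NEWS)
    for name, computed, reported in zip(COLUMNS, sums, NEWS_TOTAL):
        print(f"{name}: computed {computed}, reported {reported}")
    # The reported Total row matches the column sums exactly.
    assert sums == NEWS_TOTAL
```

The same check can be run against the PDF Bulletin and Bulletin In Print tables by swapping in their rows.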
The Credit Suisse PDF Bulletin Corpus is a collection of magazine articles from the Credit Suisse Bulletin in four languages (English, French, German, Italian). The articles date from 1998 to 2017.
Language | Tokens | Types | Lemmas | Sentences | Texts |
---|---|---|---|---|---|
de | 3610493 | 204526 | 111217 | 269416 | 2713 |
en | 2225123 | 78753 | 35245 | 137688 | 1405 |
fr | 4012143 | 114011 | 30538 | 255677 | 2613 |
it | 3393228 | 117638 | 32723 | 215317 | 2319 |
Total | 13240987 | 514928 | 209723 | 878098 | 9050 |
The Credit Suisse Bulletin In Print Corpus is a collection of magazine articles from the Credit Suisse Bulletin in five languages (English, French, German, Italian, Spanish). The articles date from 1895 to 1997.
Language | Tokens | Types | Lemmas | Sentences | Texts |
---|---|---|---|---|---|
de | 14239553 | 467354 | 172506 | 1204989 | 787 |
en | 4632880 | 112646 | 35349 | 423903 | 101 |
es | 909556 | 46421 | 11662 | 87721 | 19 |
fr | 16106141 | 262097 | 35479 | 1179453 | 627 |
it | 4644146 | 129632 | 27993 | 389859 | 99 |
Total | 40532276 | 1018150 | 282989 | 3285925 | 1633 |
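Token counts, unlike type or lemma counts, can be added across subcorpora, assuming the subcorpora share no texts. A minimal Python sketch that combines the token figures from the three tables into per-language totals for the whole Credit Suisse corpus; the dictionary layout and the `tokens_by_language` helper are hypothetical, chosen only for illustration:

```python
from collections import Counter

# Token counts per language, copied from the three subcorpus tables.
# Spanish (es) occurs only in the Bulletin In Print Corpus.
TOKENS = {
    "News": {"de": 1908735, "en": 2078198, "fr": 2027287, "it": 1869238},
    "PDF Bulletin": {"de": 3610493, "en": 2225123, "fr": 4012143, "it": 3393228},
    "Bulletin In Print": {
        "de": 14239553, "en": 4632880, "es": 909556,
        "fr": 16106141, "it": 4644146,
    },
}


def tokens_by_language(subcorpora):
    """Add up the token counts for each language across all subcorpora."""
    combined = Counter()
    for per_language in subcorpora.values():
        combined.update(per_language)  # Counter.update adds counts per key
    return dict(combined)


if __name__ == "__main__":
    combined = tokens_by_language(TOKENS)
    for lang in sorted(combined):
        print(f"{lang}: {combined[lang]:,}")
    print(f"all languages: {sum(combined.values()):,}")
```

Summing the three Total rows gives the same overall figure of 61,656,721 tokens. Types and lemmas are deliberately left out, since a vocabulary shared between subcorpora would be counted more than once.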