The Credit Suisse corpus is built on the world's oldest banking magazine, the Credit Suisse Bulletin, which has been in print since 1895.
The corpus consists of three main subcorpora: the Credit Suisse News Corpus, the Credit Suisse PDF Bulletin Corpus, and the Credit Suisse Bulletin In Print Corpus.
The Credit Suisse News Corpus is a collection of news articles from the Credit Suisse website in four languages (English, French, German, Italian). The articles date from 2001 to 2017.
Language | Tokens | Types | Lemmas | Sentences | Texts |
---|---|---|---|---|---|
de | 1908735 | 105560 | 58166 | 115196 | 1797 |
en | 2078198 | 53534 | 26839 | 110483 | 1821 |
fr | 2027287 | 56333 | 20009 | 99444 | 1596 |
it | 1869238 | 64029 | 21447 | 94439 | 1542 |
Total | 7883458 | 279456 | 126461 | 419562 | 6756 |
This subcorpus has been aligned at the document and sentence levels.
The Credit Suisse PDF Bulletin Corpus is a collection of magazine articles from the Credit Suisse Bulletin in four languages (English, French, German, Italian). The articles date from 1998 to 2017.
Language | Tokens | Types | Lemmas | Sentences | Texts |
---|---|---|---|---|---|
de | 3610493 | 204526 | 111217 | 269416 | 2713 |
en | 2225123 | 78753 | 35245 | 137688 | 1405 |
fr | 4012143 | 114011 | 30538 | 255677 | 2613 |
it | 3393228 | 117638 | 32723 | 215317 | 2319 |
Total | 13240987 | 514928 | 209723 | 878098 | 9050 |
This subcorpus has been aligned at the document and sentence levels.
The Credit Suisse Bulletin In Print Corpus is a collection of magazine articles from the Credit Suisse Bulletin in five languages (English, French, German, Italian, Spanish). The articles date from 1895 to 1997.
Language | Tokens | Types | Lemmas | Sentences | Texts |
---|---|---|---|---|---|
de | 14239553 | 467354 | 172506 | 1204989 | 787 |
en | 4632880 | 112646 | 35349 | 423903 | 101 |
es | 909556 | 46421 | 11662 | 87721 | 19 |
fr | 16106141 | 262097 | 35479 | 1179453 | 627 |
it | 4644146 | 129632 | 27993 | 389859 | 99 |
Total | 40532276 | 1018150 | 282989 | 3285925 | 1633 |
This subcorpus has not been aligned yet.
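
The Total rows in the three tables match plain column sums over the languages, so the totals for types and lemmas count a form once per language rather than once overall. The following is a minimal Python sketch that reproduces those sums and derives the average sentence length from the published figures. The dictionary layout and the function names `column_totals` and `tokens_per_sentence` are illustrative assumptions, not part of any corpus distribution.

```python
# Per-language counts copied from the three tables above.
# Tuple order: (tokens, types, lemmas, sentences, texts).
SUBCORPORA = {
    "News": {
        "de": (1908735, 105560, 58166, 115196, 1797),
        "en": (2078198, 53534, 26839, 110483, 1821),
        "fr": (2027287, 56333, 20009, 99444, 1596),
        "it": (1869238, 64029, 21447, 94439, 1542),
    },
    "PDF Bulletin": {
        "de": (3610493, 204526, 111217, 269416, 2713),
        "en": (2225123, 78753, 35245, 137688, 1405),
        "fr": (4012143, 114011, 30538, 255677, 2613),
        "it": (3393228, 117638, 32723, 215317, 2319),
    },
    "Bulletin In Print": {
        "de": (14239553, 467354, 172506, 1204989, 787),
        "en": (4632880, 112646, 35349, 423903, 101),
        "es": (909556, 46421, 11662, 87721, 19),
        "fr": (16106141, 262097, 35479, 1179453, 627),
        "it": (4644146, 129632, 27993, 389859, 99),
    },
}


def column_totals(rows):
    """Sum each column over all languages of one subcorpus."""
    return tuple(sum(column) for column in zip(*rows.values()))


def tokens_per_sentence(row):
    """Average sentence length in tokens for one table row."""
    tokens, _types, _lemmas, sentences, _texts = row
    return tokens / sentences


for name, rows in SUBCORPORA.items():
    totals = column_totals(rows)
    print(f"{name}: totals={totals}, "
          f"tokens/sentence={tokens_per_sentence(totals):.2f}")
```

Because types and lemmas are summed per language, these totals should not be read as the number of distinct word forms in a subcorpus; counting those would require the underlying texts.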