

Credit Suisse

The Credit Suisse corpus is built from the Credit Suisse Bulletin, the world's oldest banking magazine, which has been in print since 1895.

The corpus consists of three main subcorpora: the Credit Suisse News corpus, the Credit Suisse PDF Bulletin corpus, and the Credit Suisse Bulletin In Print corpus.

Credit Suisse News corpus

The Credit Suisse News Corpus is a collection of news articles from the Credit Suisse web page in four languages (English, French, German, Italian). The articles date from 2001 to 2017. They were collected by students in the course “Introduction to Multilingual Text Analysis” (fall semesters 2014, 2015, 2016, and 2017).

Language  Tokens    Types   Lemmas  Sentences  Texts
de        1908735   105560  58166   115196     1797
en        2078198   53534   26839   110483     1821
fr        2027287   56333   20009   99444      1596
it        1869238   64029   21447   94439      1542
Total     7883458   279456  126461  419562     6756
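The column headers are not defined on this page. As a toy illustration only, the Python sketch below shows what each column plausibly counts: running word tokens, distinct word forms (types), distinct lemmas, sentences, and texts. It assumes whitespace tokenization, pre-split sentences, and a hypothetical hand-written lemma table (`LEMMAS`); the tokenizer, lemmatizer, and sentence splitter actually used for the corpus are not stated here.

```python
# Toy illustration of the statistics columns, assuming whitespace
# tokenization and a hypothetical lemma lookup table (LEMMAS).
# The corpus itself was processed with its own (unstated) pipeline.

# Each text is a list of sentences; each sentence is a list of tokens.
texts = [
    [["The", "bank", "opened", "."], ["Banks", "open", "early", "."]],
    [["The", "bulletin", "appeared", "."]],
]

# Hypothetical lemma table; tokens not listed are their own lemma.
LEMMAS = {"opened": "open", "Banks": "bank", "The": "the"}

def corpus_stats(texts):
    """Count tokens, types, lemmas, sentences and texts."""
    tokens = [tok for text in texts for sent in text for tok in sent]
    return {
        "tokens": len(tokens),                                  # running words
        "types": len(set(tokens)),                              # distinct forms
        "lemmas": len({LEMMAS.get(tok, tok) for tok in tokens}),  # distinct lemmas
        "sents": sum(len(text) for text in texts),
        "texts": len(texts),
    }

print(corpus_stats(texts))
# {'tokens': 12, 'types': 9, 'lemmas': 7, 'sents': 3, 'texts': 2}
```

Here "opened" and "open" are two types but one lemma. This is why the Lemmas column is always smaller than the Types column in the tables.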

Credit Suisse PDF Bulletin corpus

The Credit Suisse PDF Bulletin Corpus is a collection of magazine articles from the Credit Suisse Bulletin in four languages (English, French, German, Italian). The articles date from 1998 to 2017.

Language  Tokens    Types   Lemmas  Sentences  Texts
de        3610493   204526  111217  269416     2713
en        2225123   78753   35245   137688     1405
fr        4012143   114011  30538   255677     2613
it        3393228   117638  32723   215317     2319
Total     13240987  514928  209723  878098     9050

Credit Suisse Bulletin In Print corpus

The Credit Suisse Bulletin In Print Corpus is a collection of magazine articles from the Credit Suisse Bulletin in five languages (English, French, German, Italian, Spanish). The articles date from 1895 to 1997.

Language  Tokens    Types    Lemmas  Sentences  Texts
de        14239553  467354   172506  1204989    787
en        4632880   112646   35349   423903     101
es        909556    46421    11662   87721      19
fr        16106141  262097   35479   1179453    627
it        4644146   129632   27993   389859     99
Total     40532276  1018150  282989  3285925    1633
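The Total rows in the three tables can be checked against the per-language rows. The Python sketch below transcribes the tables and confirms that each Total is the plain column sum. This also means that the Total types and lemmas add up per-language counts rather than counting distinct forms across all languages. The sketch then derives figures for the corpus as a whole. The variable and function names (`SUBCORPORA`, `TOTALS`, `column_sums`) are illustrative only.

```python
# Per-language rows transcribed from the three tables,
# in column order: (tokens, types, lemmas, sentences, texts).
SUBCORPORA = {
    "News": {
        "de": (1908735, 105560, 58166, 115196, 1797),
        "en": (2078198, 53534, 26839, 110483, 1821),
        "fr": (2027287, 56333, 20009, 99444, 1596),
        "it": (1869238, 64029, 21447, 94439, 1542),
    },
    "PDF Bulletin": {
        "de": (3610493, 204526, 111217, 269416, 2713),
        "en": (2225123, 78753, 35245, 137688, 1405),
        "fr": (4012143, 114011, 30538, 255677, 2613),
        "it": (3393228, 117638, 32723, 215317, 2319),
    },
    "Bulletin In Print": {
        "de": (14239553, 467354, 172506, 1204989, 787),
        "en": (4632880, 112646, 35349, 423903, 101),
        "es": (909556, 46421, 11662, 87721, 19),
        "fr": (16106141, 262097, 35479, 1179453, 627),
        "it": (4644146, 129632, 27993, 389859, 99),
    },
}

# The "Total" rows as printed in the tables.
TOTALS = {
    "News": (7883458, 279456, 126461, 419562, 6756),
    "PDF Bulletin": (13240987, 514928, 209723, 878098, 9050),
    "Bulletin In Print": (40532276, 1018150, 282989, 3285925, 1633),
}

def column_sums(rows):
    """Sum each column over all language rows of one subcorpus."""
    return tuple(sum(column) for column in zip(*rows.values()))

# Each printed Total equals the plain column sum of its language rows.
for name, rows in SUBCORPORA.items():
    assert column_sums(rows) == TOTALS[name], name

# Figures for the whole corpus (all three subcorpora together).
total_tokens = sum(t[0] for t in TOTALS.values())
total_texts = sum(t[4] for t in TOTALS.values())
print(total_tokens, total_texts)  # 61656721 17439

# Average text length in tokens, per subcorpus.
for name, t in TOTALS.items():
    print(name, round(t[0] / t[4]), "tokens per text")
```

The per-subcorpus averages show that a Bulletin In Print "text" is far longer than a News or PDF Bulletin text. When comparing subcorpora, token counts are therefore a more even-handed size measure than text counts.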

Publications



Institute of Computational Linguistics – University of Zurich