This shows you the differences between two versions of the page.
public:pacoco:credit_suisse [2019-07-17 22:39] – [Credit Suisse News corpus] tkew | public:pacoco:credit_suisse [2019-10-16 10:04] – Johannes Graën | ||
---|---|---|---|
Line 1: | Line 1: | ||
+ | ~~NOTOC~~ | ||
====== Credit Suisse ====== | ====== Credit Suisse ====== | ||
Line 6: | Line 7: | ||
The corpus consists of three main subcorpora: Credit Suisse News corpus, Credit Suisse PDF Bulletin corpus and Credit Suisse Bulletin In Print corpus. | The corpus consists of three main subcorpora: Credit Suisse News corpus, Credit Suisse PDF Bulletin corpus and Credit Suisse Bulletin In Print corpus. | ||
- | ==== Credit Suisse News corpus ==== | + | |
+ | ===== Credit Suisse News corpus ===== | ||
The Credit Suisse News Corpus is a collection of news articles from the Credit Suisse web page in four languages (English, French, German, Italian). They range from 2001 to 2017. | The Credit Suisse News Corpus is a collection of news articles from the Credit Suisse web page in four languages (English, French, German, Italian). They range from 2001 to 2017. | ||
- | ^ lang ^ tokens ^ types ^ lemmas ^ sents ^ texts ^ | + | ^ lang |
- | |de | 1908735 | 105560 | 58166 | 115196 | 1797 | | + | ^ de |
- | |en | 2078198 | 53534 | 26839 | 110483 | 1821 | | + | ^ en |
- | |fr | 2027287 | 56333 | 20009 | 99444 | 1596 | | + | ^ fr |
- | |it | 1869238 | 64029 | 21447 | 94439 | 1542 | | + | ^ it |
- | ^Total ^ 7883458 ^ 279456 ^ 126461 ^ 419562 ^ 6756 ^ | + | ^ Total ^ 7883458 ^ 279456 ^ 126461 ^ 419562 ^ |
+ | ==== Alignment ==== | ||
+ | The corpus has been aligned at the document and sentence levels. | ||
- | ==== Credit Suisse PDF Bulletin corpus ==== | + | |
+ | ===== Credit Suisse PDF Bulletin corpus ===== | ||
The Credit Suisse PDF Bulletin Corpus is a collection of magazine articles from the Credit Suisse Bulletin in four languages (English, French, German, Italian). They range from 1998 to 2017. | The Credit Suisse PDF Bulletin Corpus is a collection of magazine articles from the Credit Suisse Bulletin in four languages (English, French, German, Italian). They range from 1998 to 2017. | ||
- | ^ lang ^ tokens ^ types ^ lemmas ^ sents ^ texts ^ | + | ^ lang |
- | |de | 3610493 | 204526 | 111217 | 269416 | 2713 | | + | ^ de |
- | |en | 2225123 | 78753 | 35245 | 137688 | 1405 | | + | ^ en |
- | |fr | 4012143 | 114011 | 30538 | 255677 | 2613 | | + | ^ fr |
- | |it | 3393228 | 117638 | 32723 | 215317 | 2319 | | + | ^ it |
- | ^ Total ^ 13240987 ^ 514928 ^ 209723 ^ 878098 ^ 9050 ^ | + | ^ Total ^ 13240987 ^ 514928 ^ 209723 ^ 878098 ^ |
- | ==== Credit Suisse Bulletin In Print corpus ==== | + | ==== Alignment ==== |
+ | The corpus has been aligned at the document and sentence levels. | ||
+ | |||
+ | |||
+ | ===== Credit Suisse Bulletin In Print corpus ===== | ||
The Credit Suisse Bulletin In Print Corpus is a collection of magazine articles from the Credit Suisse Bulletin in five languages (English, French, German, Italian, Spanish). They range from 1895 to 1997. | The Credit Suisse Bulletin In Print Corpus is a collection of magazine articles from the Credit Suisse Bulletin in five languages (English, French, German, Italian, Spanish). They range from 1895 to 1997. | ||
- | ^ lang ^ tokens ^ types ^ lemmas ^ sents ^ texts ^ | + | ^ lang |
- | |de | 14239553 | 467354 | 172506 | 1204989 | 787 | | + | ^ de |
- | |en | 4632880 | 112646 | 35349 | 423903 | 101 | | + | ^ en |
- | |es | 909556 | 46421 | 11662 | 87721 | 19 | | + | ^ es |
- | |fr | 16106141 | 262097 | 35479 | 1179453 | 627 | | + | ^ fr |
- | |it | 4644146 | 129632 | 27993 | 389859 | 99 | | + | ^ it |
- | ^Total ^ 40532276 ^ 1018150 ^ 282989 ^ 3285925 ^ 1633 ^ | + | ^ Total ^ 40532276 ^ 1018150 ^ 282989 ^ 3285925 ^ |
- | --------- | + | ==== Alignment ==== |
+ | The corpus has not been aligned yet. | ||
- | === Relevant links === | ||
+ | ===== Publications ===== | ||
+ | |||
+ | * Building a Parallel Corpus on the World' | ||
+ | |||
+ | |||
+ | ===== Relevant links ===== | ||
+ | * Multilingwis example ‹rentrer chez soi›: [[mlw> | ||
*[[https:// | *[[https:// | ||
*[[https:// | *[[https:// | ||
+ | *[[https:// | ||
+ | |||
- | === Publications === | ||
- | *[[https:// |