This shows you the differences between two versions of the page.
public:pacoco:credit_suisse [2019-07-18 22:22] – [Table] Johannes Graën | public:pacoco:credit_suisse [2023-09-15 20:33] (current) – external edit 127.0.0.1 | ||
---|---|---|---|
Line 1: | Line 1: | ||
+ | ~~NOTOC~~ | ||
====== Credit Suisse ====== | ====== Credit Suisse ====== | ||
Line 6: | Line 7: | ||
The corpus consists of three main subcorpora: Credit Suisse News corpus, Credit Suisse PDF Bulletin corpus and Credit Suisse Bulletin In Print corpus. | The corpus consists of three main subcorpora: Credit Suisse News corpus, Credit Suisse PDF Bulletin corpus and Credit Suisse Bulletin In Print corpus. | ||
- | ==== Credit Suisse News corpus ==== | + | |
+ | ===== Credit Suisse News corpus ===== | ||
The Credit Suisse News Corpus is a collection of news articles from the Credit Suisse website in four languages (English, French, German, Italian). The articles date from 2001 to 2017. | The Credit Suisse News Corpus is a collection of news articles from the Credit Suisse website in four languages (English, French, German, Italian). The articles date from 2001 to 2017. | ||
^ lang ^ tokens | ^ lang ^ tokens | ||
- | | de | + | ^ de |
- | | en | + | ^ en |
- | | fr | + | ^ fr |
- | | it | + | ^ it |
^ Total ^ 7883458 ^ 279456 ^ 126461 ^ 419562 ^ 6756 ^ | ^ Total ^ 7883458 ^ 279456 ^ 126461 ^ 419562 ^ 6756 ^ | ||
+ | ==== Alignment ==== | ||
+ | The corpus has been aligned on the document and sentence level. | ||
- | ==== Credit Suisse PDF Bulletin corpus ==== | + | |
+ | ===== Credit Suisse PDF Bulletin corpus ===== | ||
The Credit Suisse PDF Bulletin Corpus is a collection of magazine articles from the Credit Suisse Bulletin in four languages (English, French, German, Italian). The articles date from 1998 to 2017. | The Credit Suisse PDF Bulletin Corpus is a collection of magazine articles from the Credit Suisse Bulletin in four languages (English, French, German, Italian). The articles date from 1998 to 2017. | ||
- | ^ lang ^ tokens ^ types ^ lemmas ^ sents ^ texts ^ | + | ^ lang |
- | |de | 3610493 | 204526 | 111217 | 269416 | 2713 | | + | ^ de |
- | |en | 2225123 | 78753 | 35245 | 137688 | 1405 | | + | ^ en |
- | |fr | 4012143 | 114011 | 30538 | 255677 | 2613 | | + | ^ fr |
- | |it | 3393228 | 117638 | 32723 | 215317 | 2319 | | + | ^ it |
- | ^ Total ^ 13240987 ^ 514928 ^ 209723 ^ 878098 ^ 9050 ^ | + | ^ Total ^ 13240987 ^ 514928 ^ 209723 ^ 878098 ^ |
- | ==== Credit Suisse Bulletin In Print corpus ==== | + | ==== Alignment ==== |
+ | The corpus has been aligned on the document and sentence level. | ||
+ | |||
+ | |||
+ | ===== Credit Suisse Bulletin In Print corpus ===== | ||
The Credit Suisse Bulletin In Print Corpus is a collection of magazine articles from the Credit Suisse Bulletin in five languages (English, French, German, Italian, Spanish). The articles date from 1895 to 1997. | The Credit Suisse Bulletin In Print Corpus is a collection of magazine articles from the Credit Suisse Bulletin in five languages (English, French, German, Italian, Spanish). The articles date from 1895 to 1997. | ||
- | ^ lang ^ tokens ^ types ^ lemmas ^ sents ^ texts ^ | + | ^ lang |
- | |de | 14239553 | 467354 | 172506 | 1204989 | 787 | | + | ^ de |
- | |en | 4632880 | 112646 | 35349 | 423903 | 101 | | + | ^ en |
- | |es | 909556 | 46421 | 11662 | 87721 | 19 | | + | ^ es |
- | |fr | 16106141 | 262097 | 35479 | 1179453 | 627 | | + | ^ fr |
- | |it | 4644146 | 129632 | 27993 | 389859 | 99 | | + | ^ it |
- | ^Total ^ 40532276 ^ 1018150 ^ 282989 ^ 3285925 ^ 1633 ^ | + | ^ Total ^ 40532276 ^ 1018150 ^ 282989 ^ 3285925 ^ |
- | --------- | + | ==== Alignment ==== |
+ | The corpus has not been aligned yet. | ||
- | === Relevant links === | ||
+ | ===== Publications ===== | ||
+ | |||
+ | * Building a Parallel Corpus on the World' | ||
+ | |||
+ | |||
+ | ===== Relevant links ===== | ||
+ | * Multilingwis example ‹rentrer chez soi›: [[mlw> | ||
*[[https:// | *[[https:// | ||
*[[https:// | *[[https:// | ||
+ | *[[https:// | ||
+ | |||
- | === Publications === | ||
- | * Building a Parallel Corpus on the World' |
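
The statistics tables in this page use DokuWiki table syntax, with a Total row that appears to be the column-wise sum of the per-language rows. As a minimal sketch, the following Python snippet checks that assumption for the PDF Bulletin table as it appears in the previous revision. The function names `parse_row` and `check_totals` are hypothetical helpers introduced only for this illustration; they are not part of any corpus tooling.

```python
import re

# Rows copied from the previous-revision PDF Bulletin table (DokuWiki syntax).
TABLE = """\
^ lang ^ tokens ^ types ^ lemmas ^ sents ^ texts ^
|de | 3610493 | 204526 | 111217 | 269416 | 2713 |
|en | 2225123 | 78753 | 35245 | 137688 | 1405 |
|fr | 4012143 | 114011 | 30538 | 255677 | 2613 |
|it | 3393228 | 117638 | 32723 | 215317 | 2319 |
^ Total ^ 13240987 ^ 514928 ^ 209723 ^ 878098 ^ 9050 ^
"""


def parse_row(line):
    """Split a DokuWiki table row on '|' or '^' and drop the empty edge cells."""
    cells = [cell.strip() for cell in re.split(r"[|^]", line.strip())]
    return [cell for cell in cells if cell]


def check_totals(table):
    """Map each numeric column name to (computed sum, stated total)."""
    rows = [parse_row(line) for line in table.splitlines() if line.strip()]
    header, body, total = rows[0], rows[1:-1], rows[-1]
    sums = [sum(int(row[i]) for row in body) for i in range(1, len(header))]
    stated = [int(value) for value in total[1:]]
    return dict(zip(header[1:], zip(sums, stated)))


if __name__ == "__main__":
    for column, (computed, stated) in check_totals(TABLE).items():
        status = "ok" if computed == stated else "MISMATCH"
        print(f"{column}: computed={computed} stated={stated} {status}")
```

Note that summing per-language type and lemma counts treats each language's vocabulary as disjoint, so these totals are not counts of distinct forms across the whole subcorpus.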