====== Sparcling ======

The Sparcling corpus is built on top of a [[..:CoStEP:|cleaned version of the Europarl corpus]]. It provided a basis for alignment experiments and features multilingual alignment on the sentence and text levels. The corpus served as a reference for the development of the [[mlw>/corpus=europarlf9|Multilingwis search engine]], which supports the exploration of multilingual word-aligned corpora. Many other applications (see [[http://pub.cl.uzh.ch/purl/graen]]) make use of its combination of language-dependent annotation and interlingual alignment.

^ Language ^ Tokens ^ Types ^ Lemmas ^ Sentences ^ Texts ^
| bg | 7509902 | 100378 | n/a | 302703 | 33187 |
| de | 41107021 | 368437 | 97327 | 1754173 | 146544 |
| el | 32263532 | 244448 | n/a | 1245560 | 114950 |
| en | 43151584 | 129616 | 43981 | 1675807 | 146544 |
| es | 45232847 | 177200 | 93572 | 1667918 | 146544 |
| et | 8136702 | 251360 | 40374 | 447006 | 45126 |
| fi | 28363987 | 669416 | 134694 | 1587455 | 136299 |
| fr | 47270588 | 143977 | 82548 | 1692398 | 146544 |
| it | 42648100 | 181510 | 98312 | 1646744 | 146544 |
| nl | 42954617 | 263736 | 33630 | 1800838 | 145478 |
| pl | 9334433 | 162314 | 19495 | 455103 | 44371 |
| pt | 44029641 | 182020 | 26645 | 1642878 | 144408 |
| ro | 7963967 | 83339 | 19369 | 308289 | 33725 |
| sk | 9406142 | 161572 | 28317 | 435121 | 44613 |
| sl | 9208808 | 134342 | 16210 | 420850 | 43810 |
| sv | 36135818 | 337746 | 253731 | 1655134 | 137540 |
^ Total ^ 454717689 ^ 3591411 ^ 988206 ^ 18737977 ^ 1656227 ^

==== Alignment ====

The corpus has been aligned on the document, sentence and word levels.

===== Publications =====

  * Exploiting alignment in multiparallel corpora for applications in linguistics and language learning [[http://www.zora.uzh.ch/153213|Graën 2018]]

===== Relevant links =====

  * Multilingwis example ‹Art und Weise›: [[mlw>[Art und Weise] /corpus=europarlf9]]
  * Alignment overlap example ‹annoy vs. bother vs. disturb›: [[https://pub.cl.uzh.ch/projects/sparcling/alignment_overlap/#lemmas=en:disturb,en:annoy,en:bother&filter=sk,pl,ro,sl|en:disturb,en:annoy,en:bother]]
  * Constellations (English/Swedish): [[https://pub.cl.uzh.ch/projects/sparcling/constellations/prep.php?dep_measure=t-score&al_measure=t-score&norm=tanhavg&score=as12*as34*as13/as24*log(c)|Verb with preposition]]