Differences

This shows you the differences between two versions of the page.

--- public:pacoco:start [2019-07-18 06:40] – [Text] tkew
+++ public:pacoco:start [2023-09-15 20:33] (current) – external edit 127.0.0.1
Line 1:
 ====== The Zurich Parallel Corpus Collection ======
 
-<div center round tip 60%>
-Data will be available on July 22nd.
+<div center round info 50%>
+The corpus files are available for download at [[https://pub.cl.uzh.ch/corpora/PaCoCo/]]
 </div>
 
Line 9:
 
 Each corpus is available at the following links:
-  * [[public:pacoco:text_berg|Text+Berg Corpus]]
-  * [[public:pacoco:credit_suisse|Credit Suisse Corpus]]
-  * [[public:pacoco:medi-notice|Medi-Notice]]
-  * [[public:pacoco:horizonte|Horizonte]]
-  * [[public:pacoco:sparcling|Sparcling]]
-  * [[public:pacoco:rumantsch-grischun|Rumantsch-Grischun]]
-  * [[public:pacoco:swiss_legal_corpus|Swiss Legislation Corpus]]
-  * [[[[public:pacoco:swatchgroup|Swatchgroup Geschäftsbericht]]
+  * [[Text+Berg|Text+Berg]]
+  * [[Credit Suisse|Credit Suisse]]
+  * [[Medi-Notice|Medi-Notice]]
+  * [[Horizonte|Horizonte]]
+  * [[Sparcling|Sparcling]]
+  * [[Rumantsch Grischun|Rumantsch-Grischun]]
+  * [[Swiss Legislation Corpus|Swiss Legislation Corpus]]
+  * [[Swatchgroup|Swatchgroup «Geschäftsbericht»]]
 
 
 In order to make these corpora publicly available, we have extended the popular [[https://universaldependencies.org/format.html|CoNLL-U]] format to efficiently accommodate our parallel texts.
+
+
+===== How to cite =====
+
+We presented the format at the [[http://corpora.ids-mannheim.de/cmlc-2019.html|7th Workshop on the Challenges in the Management of Large Corpora]] at [[http://www.cl2019.org/|CL 2019]]. The publication is available via the [[https://ids-pub.bsz-bw.de/frontdoor/index/index/docId/8998|IDS publication server]] or [[http://www.zora.uzh.ch/175081|ZORA]].
+
+<file biblatex GraenKewShaitarovaVolk2019.bib>
+@inproceedings{GraenKewShaitarovaVolk2019,
+           month = {July},
+          author = {Gra\"{e}n, Johannes and Kew, Tannon and Shaitarova, Anastassia and Volk, Martin},
+       booktitle = {Proceedings of the 7th Workshop on Challenges in the Management of Large Corpora (CMLC)},
+          editor = {Bański, Piotr and Barbaresi, Adrien and Biber, Hanno and Breiteneder, Evelyn and Clematide, Simon and Kupietz, Marc and Lüngen, Harald and Iliadi, Caroline},
+           title = {Modelling Large Parallel Corpora: The Zurich Parallel Corpus Collection},
+       publisher = {Leibniz-Institut f\"{u}r Deutsche Sprache},
+           pages = {1--8},
+            year = {2019},
+             url = {https://doi.org/10.5167/uzh-175081},
+             doi = {10.14618/ids-pub-9020}
+}
+</file>
+
+
 
 ===== The CoNLL-UPPa Format =====

CL Wiki

Institute of Computational Linguistics – University of Zurich