====== The Zurich Parallel Corpus Collection ======
<div center round info 50%>
The corpus files are available
</div>
The Zurich Parallel Corpus Collection currently consists of seven publicly available text corpora. These corpora are largely parallel or multi-parallel and cover a diverse range of domains, from mountaineering reports to articles on international business and finance.

Each corpus is available at the following links:
  * [[Text+Berg|Text+Berg]]
  * [[Credit Suisse|Credit Suisse]]
  * [[Medi-Notice|Medi-Notice]]
  * [[Horizonte|Horizonte]]
  * [[Sparcling|Sparcling]]
  * [[Rumantsch Grischun|Rumantsch Grischun]]
  * [[Swiss Legislation Corpus|Swiss Legislation Corpus]]
  * [[Swatchgroup|Swatch Group «Geschäftsbericht»]]

In order to make these corpora publicly available, we have extended the popular [[https://


===== How to cite =====

We presented the format at the [[http://

<file biblatex GraenKewShaitarovaVolk2019.bib>
@inproceedings{GraenKewShaitarovaVolk2019,
  month = {July},
  author = {Gra\"
  editor = {Bański, Piotr and Barbaresi, Adrien and Biber, Hanno and Breiteneder,
  title = {Modelling Large Parallel Corpora: The Zurich Parallel Corpus Collection},
  pages = {1--8},
  year = {2019},
  url = {https://
  doi = {10.14618/
}
</file>


===== The CoNLL-UPPa Format =====
At the sentence and text level, metadata can vary dramatically, so it is not possible to account for every potential type of metadata.
Some examples of metadata currently stored in sentence files are: