The Rumantsch Grischun corpus contains legal and press texts from the State Chancellery of the Swiss Canton of Graubünden. The corpus is entirely parallel, containing more than 5000 texts in each of Romansh (Rumantsch) and German.
The currently available version contains only the legal texts.
The corpus is a valuable resource for Romansh, a low-resource language.
Language | Tokens | Types | Lemmas | Sentences | Texts |
---|---|---|---|---|---|
de | 432862 | 23813 | 15003 | 28783 | 5641 |
rm | 543173 | 13868 | 7973 | 28811 | 5570 |
Total | 976035 | 37681 | 22976 | 57594 | 11211 |
The corpus has been aligned at the document, sentence, and word levels.
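
The figures in the statistics table can be cross-checked and turned into a few derived measures. The following is a minimal sketch, not part of the corpus distribution: the dictionary layout and the function names `column_sums`, `type_token_ratio`, and `mean_sentence_length` are assumptions made for illustration, and the numbers are copied from the table.

```python
# Counts copied from the statistics table; the data layout is illustrative only.
STATS = {
    "de": {"tokens": 432862, "types": 23813, "lemmas": 15003,
           "sentences": 28783, "texts": 5641},
    "rm": {"tokens": 543173, "types": 13868, "lemmas": 7973,
           "sentences": 28811, "texts": 5570},
}
TOTAL = {"tokens": 976035, "types": 37681, "lemmas": 22976,
         "sentences": 57594, "texts": 11211}


def column_sums(stats):
    """Add up every column across the per-language rows."""
    columns = next(iter(stats.values())).keys()
    return {col: sum(row[col] for row in stats.values()) for col in columns}


def type_token_ratio(row):
    """Distinct word forms divided by running words."""
    return row["types"] / row["tokens"]


def mean_sentence_length(row):
    """Average number of tokens per sentence."""
    return row["tokens"] / row["sentences"]


if __name__ == "__main__":
    # The Total row should equal the column-wise sum of the two languages.
    assert column_sums(STATS) == TOTAL
    for lang, row in STATS.items():
        print(f"{lang}: type/token ratio = {type_token_ratio(row):.4f}, "
              f"tokens per sentence = {mean_sentence_length(row):.2f}")
```

The check confirms that the Total row is the plain sum of the two language rows, so its type and lemma counts add the German and Romansh vocabularies together rather than counting forms shared by both languages once.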