Rumantsch Grischun

The Rumantsch Grischun corpus contains legal and press texts from the State Chancellery of the Swiss canton of Graubünden. The corpus is entirely parallel, containing more than 5000 texts in both Romansh (Rumantsch) and German.

The currently available version contains only the legal texts.

The corpus is a valuable resource for Romansh, a low-resource language.

lang   tokens  types  lemmas  sentences  texts
de     432862  23813   15003      28783   5641
rm     543173  13868    7973      28811   5570
Total  976035  37681   22976      57594  11211
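
The figures in the table can be cross-checked and summarized with a few lines of Python. The following is only a minimal sketch: the numbers are copied from the table above, while the names STATS, totals, type_token_ratio and tokens_per_sentence are illustrative and not part of any tooling distributed with the corpus.

```python
# Per-language counts copied from the table above.
# Column order: tokens, types, lemmas, sentences, texts.
STATS = {
    "de": (432862, 23813, 15003, 28783, 5641),
    "rm": (543173, 13868, 7973, 28811, 5570),
}


def totals(stats):
    """Sum every column across all languages (the 'Total' row)."""
    return tuple(sum(column) for column in zip(*stats.values()))


def type_token_ratio(row):
    """Number of distinct word forms divided by number of tokens."""
    tokens, types = row[0], row[1]
    return types / tokens


def tokens_per_sentence(row):
    """Average sentence length in tokens."""
    tokens, sentences = row[0], row[3]
    return tokens / sentences


if __name__ == "__main__":
    print("Total:", totals(STATS))
    for lang, row in STATS.items():
        print(
            f"{lang}: type/token ratio {type_token_ratio(row):.4f}, "
            f"tokens per sentence {tokens_per_sentence(row):.2f}"
        )
```

Note that the Total row simply adds the per-language counts, so the total number of types and lemmas is a sum over both languages rather than a count of distinct forms across the whole corpus.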


The corpus has been aligned at the document, sentence, and word levels.

