The Medi-Notice corpus contains parallel texts from information leaflets for medications and pharmaceutical products made public by the Swiss Agency for Therapeutic Products (Swissmedic).
The corpus is divided into two main subcorpora: Specialist Information (fi) and Patient Information (pi).
According to Swiss law, patient leaflets must be written in German, French and Italian, whereas the information for healthcare professionals is required only in German and French. Thus, the Specialist Information subcorpus contains German and French parallel texts, while the Patient Information subcorpus is trilingual.
| Language | Tokens | Types | Lemmas | Sentences | Texts |
|---|---|---|---|---|---|
| de | 15768958 | 197395 | 85922 | 1054570 | 4297 |
| fr | 19906745 | 105851 | 21746 | 1059147 | 4297 |
| Total | 35675703 | 303246 | 107668 | 2113717 | 8594 |
The Specialist Information (fi) subcorpus, summarized in the table above, has been aligned at the document and sentence levels.
| Language | Tokens | Types | Lemmas | Sentences | Texts |
|---|---|---|---|---|---|
| de | 7070918 | 68516 | 38460 | 449149 | 4543 |
| fr | 8366384 | 47480 | 16702 | 450625 | 4543 |
| it | 7794849 | 51949 | 39409 | 444544 | 4539 |
| Total | 23232151 | 167945 | 94571 | 1344318 | 13625 |
The Patient Information (pi) subcorpus, summarized in the table above, has likewise been aligned at the document and sentence levels.
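As a quick sanity check, the figures in the two tables can be loaded into a small script that recomputes the Total rows and derives average sentence length per language. This is only a sketch: the names `SPECIALIST`, `PATIENT`, `column_totals` and `tokens_per_sentence` are illustrative and not part of any Medi-Notice tooling, and the numbers are copied by hand from the tables.

```python
# Counts copied from the two tables; the dict layout is an assumption
# made for illustration, not a format shipped with the corpus.
SPECIALIST = {
    "de": {"tokens": 15768958, "types": 197395, "lemmas": 85922,
           "sents": 1054570, "texts": 4297},
    "fr": {"tokens": 19906745, "types": 105851, "lemmas": 21746,
           "sents": 1059147, "texts": 4297},
}
PATIENT = {
    "de": {"tokens": 7070918, "types": 68516, "lemmas": 38460,
           "sents": 449149, "texts": 4543},
    "fr": {"tokens": 8366384, "types": 47480, "lemmas": 16702,
           "sents": 450625, "texts": 4543},
    "it": {"tokens": 7794849, "types": 51949, "lemmas": 39409,
           "sents": 444544, "texts": 4539},
}


def column_totals(table):
    """Sum every column over the languages of one subcorpus."""
    totals = {}
    for counts in table.values():
        for column, value in counts.items():
            totals[column] = totals.get(column, 0) + value
    return totals


def tokens_per_sentence(table):
    """Average number of tokens per sentence for each language."""
    return {lang: c["tokens"] / c["sents"] for lang, c in table.items()}


if __name__ == "__main__":
    for name, table in (("fi", SPECIALIST), ("pi", PATIENT)):
        print(name, column_totals(table))
        for lang, ratio in tokens_per_sentence(table).items():
            print(f"  {lang}: {ratio:.1f} tokens per sentence")
```

The recomputed sums match the published Total rows, which suggests that the Total figures for types and lemmas are per-language counts added together rather than counts of distinct items across languages.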