Medi-Notice

The Medi-Notice corpus contains parallel texts from information leaflets for medications and pharmaceutical products published by the Swiss Agency for Therapeutic Products (Swissmedic).

The corpus is divided into two main subcorpora: Specialist Information (fi) and Patient Information (pi).

According to Swiss law, patient leaflets must be written in German, French and Italian, whereas the information for healthcare professionals is required only in German and French. Thus, the Specialist Information subcorpus contains German and French parallel texts, while the Patient Information subcorpus is trilingual.

Medi-Notice Specialist Information (fi)

lang   tokens     types    lemmas   sentences  texts
de     15768958   197395   85922    1054570    4297
fr     19906745   105851   21746    1059147    4297
Total  35675703   303246   107668   2113717    8594

Alignment

The subcorpus has been aligned at the document and sentence levels.

Medi-Notice Patient Information (pi)

lang   tokens     types    lemmas   sentences  texts
de     7070918    68516    38460    449149     4543
fr     8366384    47480    16702    450625     4543
it     7794849    51949    39409    444544     4539
Total  23232151   167945   94571    1344318    13625
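
The figures in the two statistics tables above can be cross-checked with a few lines of Python. The following is only a minimal sketch: the numbers are copied from the tables, while the names `STATS`, `REPORTED_TOTALS`, `column_totals` and `tokens_per_sentence` are hypothetical helpers introduced for illustration and are not part of any corpus tooling.

```python
# Figures copied from the Specialist Information (fi) and
# Patient Information (pi) tables on this page.
# Column order: tokens, types, lemmas, sentences, texts.
FIELDS = ("tokens", "types", "lemmas", "sents", "texts")

STATS = {
    "fi": {
        "de": (15768958, 197395, 85922, 1054570, 4297),
        "fr": (19906745, 105851, 21746, 1059147, 4297),
    },
    "pi": {
        "de": (7070918, 68516, 38460, 449149, 4543),
        "fr": (8366384, 47480, 16702, 450625, 4543),
        "it": (7794849, 51949, 39409, 444544, 4539),
    },
}

# The "Total" rows as reported in the tables.
REPORTED_TOTALS = {
    "fi": (35675703, 303246, 107668, 2113717, 8594),
    "pi": (23232151, 167945, 94571, 1344318, 13625),
}


def column_totals(subcorpus):
    """Sum each column over all languages of one subcorpus."""
    rows = STATS[subcorpus].values()
    return tuple(sum(column) for column in zip(*rows))


def tokens_per_sentence(subcorpus, lang):
    """Mean sentence length in tokens for one language of a subcorpus."""
    row = dict(zip(FIELDS, STATS[subcorpus][lang]))
    return row["tokens"] / row["sents"]


if __name__ == "__main__":
    for name, languages in STATS.items():
        # Check that the reported Total row matches the column sums.
        assert column_totals(name) == REPORTED_TOTALS[name]
        for lang in languages:
            mean_len = tokens_per_sentence(name, lang)
            print(f"{name} {lang}: {mean_len:.1f} tokens per sentence")
```

The check shows that each Total row is the plain sum of the per-language rows. For the types and lemmas columns this means the total adds up the per-language counts and is not a count of distinct forms across languages.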

Alignment

The subcorpus has been aligned at the document and sentence levels.


CL Wiki

Institute of Computational Linguistics – University of Zurich