Zmorge - The Zurich Morphological Analyzer for German

Description

Zmorge is a morphological analyzer for German that combines a lexicon automatically extracted from Wiktionary with a modified version of the finite-state morphological grammar SMOR. The extraction script is open source, so new versions of the lexicon can be extracted from future, expanded versions of Wiktionary.

Modifications to the SMOR grammar

  1. the lexicon, grammar and transducer all use UTF-8 encoding.
  2. the output is no longer a derivational analysis; instead, the following is used as the base form:
    • nouns: Nom. Sg. (or Nom. Pl. for plural-only nouns)
    • verbs: infinitive
    • adjectives: Pos. Adv./Pred.
  3. morpheme boundaries are still explicitly marked, but using different labels:
    • <TRUNC>: marks hyphenation (same as original SMOR)
    • <#>: marks compound boundary
    • <->: marks joining element (Fugenelement) in compounds
    • <~>: marks other morpheme boundary
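
The boundary labels listed above can be handled with a small amount of string processing. The following Python sketch splits a boundary-marked base form into its segments and labels, and strips the labels to recover the plain form. The function names and the example string are assumptions made for illustration; the exact shape of real analyzer output may differ.

```python
import re

# Boundary labels and their meaning, as listed in the grammar modifications.
BOUNDARY_LABELS = {
    "<TRUNC>": "hyphenation",
    "<#>": "compound boundary",
    "<->": "joining element (Fugenelement)",
    "<~>": "other morpheme boundary",
}

# One alternation that matches any of the labels literally.
_BOUNDARY_RE = re.compile(
    "(" + "|".join(re.escape(label) for label in BOUNDARY_LABELS) + ")"
)


def split_on_boundaries(marked):
    """Return segments and boundary labels in their original order."""
    # re.split keeps the labels because the pattern is a capturing group;
    # empty strings (e.g. from adjacent labels) are dropped.
    return [part for part in _BOUNDARY_RE.split(marked) if part]


def strip_boundaries(marked):
    """Remove all boundary labels, leaving the unsegmented form."""
    return _BOUNDARY_RE.sub("", marked)


# Hypothetical input, made up for illustration only.
example = "Vermittlung<->s<#>gespräch"
print(split_on_boundaries(example))
# ['Vermittlung', '<->', 's', '<#>', 'gespräch']
print(strip_boundaries(example))
# Vermittlungsgespräch
```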

Usage instructions

Compiled finite-state transducers can be used with the fst-infl2 program from the SFST toolkit as follows:
echo "Vermittlungsgespräche" | fst-infl2 zmorge-20140521-smor_newlemma.ca
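
The same call can be made from a script. The following Python sketch pipes one word per line into fst-infl2, as in the shell command above, and returns the raw output without interpreting it. The function name analyze and its parameters are assumptions for illustration, and the sketch presumes that fst-infl2 is installed and on the PATH.

```python
import subprocess


def analyze(words, transducer, executable="fst-infl2", run=subprocess.run):
    """Pipe one word per line into fst-infl2 and return its raw stdout.

    `run` defaults to subprocess.run; it is a parameter only so that the
    call can be replaced in tests (an assumption of this sketch).
    """
    completed = run(
        [executable, transducer],
        input="\n".join(words) + "\n",  # same input as `echo word | fst-infl2 ...`
        capture_output=True,
        text=True,
        encoding="utf-8",  # the lexicon, grammar and transducer use UTF-8
        check=True,        # raise CalledProcessError on a non-zero exit status
    )
    return completed.stdout
```

With the transducer from the command above in the working directory, analyze(["Vermittlungsgespräche"], "zmorge-20140521-smor_newlemma.ca") would return whatever fst-infl2 prints for that word.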

Download

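Each release provides a lexicon and transducers compiled from the original SMOR grammar and from the modified SMOR grammar (SMORLemma), in standard and compact form. The compact SMORLemma transducer is the recommended one. For coverage of TüBa-D/Z, see Sennrich and Kunz (2014).

15.03.2015 (coverage of TüBa-D/Z: 80.4%)
    • lexicon: zmorge-20150315.xml
    • original SMOR grammar, standard: zmorge-20150315-smor_orig.a
    • original SMOR grammar, compact: zmorge-20150315-smor_orig.ca
    • modified SMOR grammar (SMORLemma), standard: zmorge-20150315-smor_newlemma.a
    • modified SMOR grammar (SMORLemma), compact (recommended): zmorge-20150315-smor_newlemma.ca

24.12.2014 (coverage of TüBa-D/Z: 79.9%)
    • lexicon: zmorge-20141224.xml
    • original SMOR grammar, standard: zmorge-20141224-smor_orig.a
    • original SMOR grammar, compact: zmorge-20141224-smor_orig.ca
    • modified SMOR grammar (SMORLemma), standard: zmorge-20141224-smor_newlemma.a
    • modified SMOR grammar (SMORLemma), compact (recommended): zmorge-20141224-smor_newlemma.ca

21.05.2014 (coverage of TüBa-D/Z: 79.4%)
    • lexicon: zmorge-20140521.xml
    • original SMOR grammar, standard: zmorge-20140521-smor_orig.a
    • original SMOR grammar, compact: zmorge-20140521-smor_orig.ca
    • modified SMOR grammar (SMORLemma), standard: zmorge-20140521-smor_newlemma.a
    • modified SMOR grammar (SMORLemma), compact (recommended): zmorge-20140521-smor_newlemma.ca
    • clevertagger model: hdt_ab.zmorge-20140521-smor_newlemma.model

24.02.2014 (coverage of TüBa-D/Z: 79.3%)
    • lexicon: zmorge-20140224.xml
    • original SMOR grammar, compact: zmorge-20140224-smor_orig.ca
    • modified SMOR grammar (SMORLemma), compact (recommended): zmorge-20140224-smor_newlemma.ca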
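
The file names of the releases listed above follow a regular pattern, which the following Python sketch reproduces. The function names are assumptions for illustration, and the sketch only builds names: not every combination exists for every release (the 24.02.2014 release, for example, lists only compact transducers).

```python
from datetime import date


def lexicon_filename(release):
    """Name of the lexicon file for a release date."""
    return f"zmorge-{release:%Y%m%d}.xml"


def transducer_filename(release, grammar="newlemma", compact=True):
    """Name of a transducer file for a release date.

    grammar: "orig" (original SMOR grammar) or "newlemma" (SMORLemma).
    compact: True for the compact (.ca) form, False for the standard (.a) form.
    """
    if grammar not in ("orig", "newlemma"):
        raise ValueError("grammar must be 'orig' or 'newlemma'")
    suffix = "ca" if compact else "a"
    return f"zmorge-{release:%Y%m%d}-smor_{grammar}.{suffix}"


print(transducer_filename(date(2015, 3, 15)))
# zmorge-20150315-smor_newlemma.ca
```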

Source code

The program to extract a lexicon from Wiktionary is available at https://github.com/rsennrich/zmorge.
The modified version of the SMOR grammar is available at https://github.com/rsennrich/SMORLemma.

License

The lexicon is licensed under the Creative Commons BY-SA 3.0 license.
The extraction scripts and the SMOR grammar are licensed under the GPL v2.

References

Rico Sennrich and Beat Kunz. 2014. Zmorge: A German Morphological Lexicon Extracted from Wiktionary. In: Proceedings of the 9th International Conference on Language Resources and Evaluation (LREC 2014).