

Cutter - a Universal Multilingual Tokenizer

History

  • end of 2015 – concept and first development version (PHP)
  • April 2016 – first release
  • until May 2017 – continued development, extending support to up to 17 languages
  • from January 2018 – reimplementation in Python
  • June 2018 – release as Python module :TODO:

Demos

Source

  • graen/cutter (version 1.x – PHP)
  • :TODO: (version 2.x – Python)

Contributors

  • Johannes Graën
  • Martin Volk
  • Mara Bertamini
  • Chantal Amrhein
  • Phillip Ströbel
  • Anne Göhring
  • Natalia Korchagina
  • Simon Clematide
  • Daniel Wüest


Institute of Computational Linguistics – University of Zurich