This shows you the differences between two versions of the page.
public:cutter:start [2018-11-18 02:03] – Johannes Graën | public:cutter:start [2018-11-18 17:37] – [Web service] Johannes Graën | ||
---|---|---|---|
Line 1: | Line 1: | ||
====== Cutter – a Universal Multilingual Tokenizer ====== | ====== Cutter – a Universal Multilingual Tokenizer ====== | ||
Cutter is a rule-based tokenizer that is easily adaptable to other languages and text types. | Cutter is a rule-based tokenizer that is easily adaptable to other languages and text types. | ||
+ | |||
===== History ===== | ===== History ===== | ||
Line 9: | Line 10: | ||
* June 2018 -- released as Python module | * June 2018 -- released as Python module | ||
* November 2018 -- released as [[https:// | * November 2018 -- released as [[https:// | ||
+ | |||
===== Demos ===== | ===== Demos ===== | ||
Line 21: | Line 23: | ||
* [[https:// | * [[https:// | ||
* [[https:// | * [[https:// | ||
+ | |||
===== Source ===== | ===== Source ===== | ||
* [[gitlab> | * [[gitlab> | ||
* [[gitlab> | * [[gitlab> | ||
+ | |||
===== PyPI package ===== | ===== PyPI package ===== | ||
Line 61: | Line 65: | ||
By means of the third column, the tokenization tree can be reconstructed: | By means of the third column, the tokenization tree can be reconstructed: | ||
{{: | {{: | ||
+ | |||
+ | |||
+ | ===== Web service ===== | ||
+ | We also provide a web service for tokenization using one of the pre-defined profiles: | ||
+ | <code bash> | ||
+ | echo " | ||
+ | | curl --data @- https:// | ||
+ | | jq | ||
+ | </ | ||
+ | |||
+ | This call returns a JSON array of objects, each pairing a token with its tag: | ||
+ | <file json> | ||
+ | [ | ||
+ | { | ||
+ | " | ||
+ | " | ||
+ | }, | ||
+ | { | ||
+ | " | ||
+ | " | ||
+ | }, | ||
+ | { | ||
+ | " | ||
+ | " | ||
+ | }, | ||
+ | { | ||
+ | " | ||
+ | " | ||
+ | }, | ||
+ | { | ||
+ | " | ||
+ | " | ||
+ | }, | ||
+ | { | ||
+ | " | ||
+ | " | ||
+ | }, | ||
+ | { | ||
+ | " | ||
+ | " | ||
+ | }, | ||
+ | { | ||
+ | " | ||
+ | " | ||
+ | }, | ||
+ | { | ||
+ | " | ||
+ | " | ||
+ | }, | ||
+ | { | ||
+ | " | ||
+ | " | ||
+ | }, | ||
+ | { | ||
+ | " | ||
+ | " | ||
+ | }, | ||
+ | { | ||
+ | " | ||
+ | " | ||
+ | }, | ||
+ | { | ||
+ | " | ||
+ | " | ||
+ | }, | ||
+ | { | ||
+ | " | ||
+ | " | ||
+ | }, | ||
+ | { | ||
+ | " | ||
+ | " | ||
+ | }, | ||
+ | { | ||
+ | " | ||
+ | " | ||
+ | }, | ||
+ | { | ||
+ | " | ||
+ | " | ||
+ | } | ||
+ | ] | ||
+ | </ | ||
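The same call could be made from Python. What follows is only a minimal sketch under stated assumptions: the service URL is truncated in the text above, so `SERVICE_URL` is a hypothetical placeholder, and the key names `token` and `tag` are likewise assumed, because the keys in the sample output are not legible here. Only the overall shape (a JSON array of two-field objects) is taken from the example.

```python
import json
import urllib.request

# Hypothetical placeholder: the real endpoint and profile name are truncated
# in the text above and must be substituted before use.
SERVICE_URL = "https://example.org/cutter/PROFILE"


def parse_tokens(payload, token_key="token", tag_key="tag"):
    """Convert the JSON array returned by the service into (token, tag) pairs.

    The key names are assumptions; pass the real ones if they differ.
    """
    return [(item[token_key], item[tag_key]) for item in json.loads(payload)]


def tokenize(text, url=SERVICE_URL):
    """POST the raw text as the request body, similar to `curl --data @-`."""
    request = urllib.request.Request(
        url, data=text.encode("utf-8"), method="POST"
    )
    with urllib.request.urlopen(request) as response:
        return parse_tokens(response.read().decode("utf-8"))
```

Keeping `parse_tokens` separate from the network call makes it possible to check the parsing against a saved response without contacting the service.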
Line 74: | Line 161: | ||
* Daniel Wüest | * Daniel Wüest | ||
* Alex Flückiger | * Alex Flückiger | ||
+ | |||
===== Citation ===== | ===== Citation ===== |