This shows you the differences between two versions of the page.
public:cutter:start [2018-11-18 02:03] – Johannes Graën | public:cutter:start [2023-09-15 20:33] (current) – external edit 127.0.0.1 | ||
---|---|---|---|
Line 1: | Line 1: | ||
====== Cutter – a Universal Multilingual Tokenizer ====== | ====== Cutter – a Universal Multilingual Tokenizer ====== | ||
Cutter is a rule-based tokenizer that is easily adaptable to other languages and text types. | Cutter is a rule-based tokenizer that is easily adaptable to other languages and text types. | ||
+ | |||
===== History ===== | ===== History ===== | ||
Line 9: | Line 10: | ||
* June 2018 -- released as Python module | * June 2018 -- released as Python module | ||
* November 2018 -- released as [[https:// | * November 2018 -- released as [[https:// | ||
+ | |||
===== Demos ===== | ===== Demos ===== | ||
The current version is always available at [[https:// | The current version is always available at [[https:// | ||
- | * [[https:// | + | * version 1.0 (Apr. 2016) |
- | * [[https:// | + | * version 1.2 (Jul. 2016) |
- | * [[https:// | + | * version 1.4 (Feb. 2017) |
- | * [[https:// | + | * version 1.6 (May 2017) |
- | * [[https:// | + | * version 2.0 (June 2018) |
- | * [[https:// | + | * version 2.1 (August 2018) |
- | * [[https:// | + | * version 2.2 (September 2018) |
- | * [[https:// | + | * version 2.3 (November 2018) |
+ | * version 2.4 (January 2019) | ||
+ | * version 2.5 (June 2019) | ||
===== Source ===== | ===== Source ===== | ||
* [[gitlab> | * [[gitlab> | ||
* [[gitlab> | * [[gitlab> | ||
+ | |||
===== PyPI package ===== | ===== PyPI package ===== | ||
Line 61: | Line 67: | ||
By means of the third column, the tokenization tree can be reconstructed: | By means of the third column, the tokenization tree can be reconstructed: | ||
{{: | {{: | ||
+ | |||
+ | |||
+ | ===== Web service ===== | ||
+ | We also provide a web service for tokenization using one of the pre-defined profiles: | ||
+ | <code bash> | ||
+ | echo " | ||
+ | | curl --data @- https:// | ||
+ | | jq | ||
+ | </ | ||
+ | |||
+ | This call returns a JSON array of objects, each containing one token and its tag: | ||
+ | <file json> | ||
+ | [ | ||
+ | { | ||
+ | " | ||
+ | " | ||
+ | }, | ||
+ | { | ||
+ | " | ||
+ | " | ||
+ | }, | ||
+ | { | ||
+ | " | ||
+ | " | ||
+ | }, | ||
+ | { | ||
+ | " | ||
+ | " | ||
+ | }, | ||
+ | { | ||
+ | " | ||
+ | " | ||
+ | }, | ||
+ | { | ||
+ | " | ||
+ | " | ||
+ | }, | ||
+ | { | ||
+ | " | ||
+ | " | ||
+ | }, | ||
+ | { | ||
+ | " | ||
+ | " | ||
+ | }, | ||
+ | { | ||
+ | " | ||
+ | " | ||
+ | }, | ||
+ | { | ||
+ | " | ||
+ | " | ||
+ | }, | ||
+ | { | ||
+ | " | ||
+ | " | ||
+ | }, | ||
+ | { | ||
+ | " | ||
+ | " | ||
+ | }, | ||
+ | { | ||
+ | " | ||
+ | " | ||
+ | }, | ||
+ | { | ||
+ | " | ||
+ | " | ||
+ | }, | ||
+ | { | ||
+ | " | ||
+ | " | ||
+ | }, | ||
+ | { | ||
+ | " | ||
+ | " | ||
+ | }, | ||
+ | { | ||
+ | " | ||
+ | " | ||
+ | } | ||
+ | ] | ||
+ | </ | ||
Line 74: | Line 163: | ||
* Daniel Wüest | * Daniel Wüest | ||
* Alex Flückiger | * Alex Flückiger | ||
+ | |||
===== Citation ===== | ===== Citation ===== |
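The web service returns plain JSON, so its output can also be consumed without `jq`. The following is a minimal Python sketch, not part of Cutter itself. It assumes the response is an array of objects like the one in the web service example and that each object uses the keys `token` and `tag`. Those key names are hypothetical because they are cut off on this page. The service address is also not reproduced here, so the caller has to supply the full endpoint, including the chosen profile. The helpers `parse_tokens` and `tokenize_remote` are illustrative names, not Cutter's API.

```python
import json
import urllib.request

# Hypothetical key names: the page does not show which keys each JSON
# object uses, so adjust these to match the real response.
TOKEN_KEY = "token"
TAG_KEY = "tag"


def parse_tokens(payload: str, token_key: str = TOKEN_KEY,
                 tag_key: str = TAG_KEY) -> list[tuple[str, str]]:
    """Convert the service's JSON array of objects into (token, tag) pairs."""
    items = json.loads(payload)
    if not isinstance(items, list):
        raise ValueError("expected a JSON array of token objects")
    return [(item[token_key], item[tag_key]) for item in items]


def tokenize_remote(text: str, service_url: str) -> list[tuple[str, str]]:
    """POST raw UTF-8 text to the service, similar to `curl --data @-`.

    `service_url` is the full endpoint including the profile; it is passed
    in because the address is not reproduced on this page.
    """
    request = urllib.request.Request(
        service_url, data=text.encode("utf-8"), method="POST"
    )
    with urllib.request.urlopen(request) as response:
        return parse_tokens(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Offline demo on a hand-written response; "A" and "B" are placeholder
    # tags, not real Cutter tags.
    sample = '[{"token": "Hello", "tag": "A"}, {"token": "!", "tag": "B"}]'
    for token, tag in parse_tokens(sample):
        print(f"{token}\t{tag}")
```

Parsing is kept separate from the HTTP call so that it can be checked offline against a saved response. Note that `curl --data` strips carriage returns and newlines from data it reads via `@`, whereas this sketch sends the text unchanged. To match the shell call exactly, remove those characters before posting.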