The main difficulty in spell checking Quechua is that there is no undisputed ('beyond all doubt') standard orthography for the language. A spell checker, however, needs some sort of 'gold standard' against which to check word forms.
This spell checker checks spelling in Unified Southern Quechua as described by Cerrón-Palomino (1994):
Cerrón-Palomino, R. (1994). Quechua sureño, diccionario unificado quechua-castellano, castellano-quechua. Lima: Biblioteca Nacional del Perú.
→ You can download an electronic version of this dictionary from the website of the Instituto de Lenguas y Literaturas Andinas-Amazonicas.
This spell checker uses Levenshtein distance as its error metric (see "edit distance" on Wikipedia). The tool is written in foma, an open-source framework for building finite-state transducers.
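To illustrate the error metric, here is a minimal sketch of Levenshtein distance in plain Python. It is not how the spell checker itself works internally (the tool is built with foma as finite-state transducers); the function names `levenshtein` and `suggest` and the tiny lexicon in the usage note are assumptions made for this example only.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn string a into string b."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute ca by cb (or keep it)
            ))
        prev = curr
    return prev[-1]


def suggest(word, lexicon, max_dist=1):
    """Hypothetical helper: lexicon entries within max_dist edits of word,
    closest first."""
    scored = sorted((levenshtein(word, entry), entry) for entry in lexicon)
    return [entry for dist, entry in scored if dist <= max_dist]
```

For instance, `levenshtein("qulqi", "qullqi")` is 1 (one inserted `l`), so `suggest("qulqi", ["pampa", "qullqi", "tiqsi"])` would propose only `qullqi`.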
For suggestions or corrections, please write to ariosATifi.uzh.ch.
You can download the sources for the spell checker (no binaries) from the Squoia GitHub Repository. Plugins for LibreOffice/OpenOffice are available here.
NEW: We implemented a system that automatically converts texts written in other orthographies to the Unified Standard orthography. You can test it here (this is not a spell checker, though; it only normalizes the orthography).
Some of the characteristics of the unified orthography are:
- it's a 'trivocalic' orthography, only a, i and u are allowed in Quechua morphemes:
  - qollqe will be corrected to qullqi
  - teqse will be corrected to tiqsi
- for Spanish roots, we use the official Spanish spelling:
  - nasyun will be corrected to nación
  - dirichu will be corrected to derecho
- the semi-vowel w is always written as w, never as u:
  - mauk'a will be corrected to mawk'a
- the sequence l(l)q is always written as llq:
  - qulqi will be corrected to qullqi
- the sequence n/mp is always written as mp:
  - panpa will be corrected to pampa
- ph is always written as ph, even in the syllable coda:
  - rafra will be corrected to raphra
  - lliflli will be corrected to lliphlli
- q is always written as q, even in the syllable coda:
  - hoj will be corrected to huq
  - wasitaj will be corrected to wasitaq
- k is always written as k, even in the syllable coda:
  - ajllay will be corrected to akllay
  - pijchu will be corrected to pikchu
- all 1st person plural inclusive and 2nd person plural markers end with k:
  - pukllanchis or pukllanchiq will be corrected to pukllanchik
  - wasiykichis or wasiykichiq will be corrected to wasiykichik
- the form of the progressive suffix is -chka:
  - purishanku or purisyanku will be corrected to purichkanku
- the form of the genitive suffix after a vowel is -p:
  - wasiq punkun will be corrected to wasip punkun
- the 'shortened' forms -yu/-ya/-y of the suffix -yku/-yka are written in their full form:
  - puriyamun will be corrected to puriykamun
  - rikhuywanchik will be corrected to rikhuykuwanchik
  - waqayuspa will be corrected to waqaykuspa
- the 'shortened' forms -ru/-ra of the suffix -rqu/-rqa are written in their full form:
  - puriramun will be corrected to purirqamun
  - rupharunman will be corrected to rupharqunman
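A few of the rules above can be stated as simple string rewrites, which the following Python sketch illustrates. It is a toy approximation, not the spell checker's actual foma grammar: the names `RULES` and `normalize` are hypothetical, only rules that need no morphological analysis are covered, and the sketch assumes the input is a native Quechua word (it would wrongly alter Spanish roots such as derecho).

```python
import re

# Toy rewrite rules taken from the list above; they are applied in order.
RULES = [
    (re.compile(r"e"), "i"),             # trivocalic spelling: e -> i
    (re.compile(r"o"), "u"),             # trivocalic spelling: o -> u
    (re.compile(r"(?<!l)lq"), "llq"),    # l(l)q -> llq (skip if already llq)
    (re.compile(r"np"), "mp"),           # n/mp -> mp
    (re.compile(r"f"), "ph"),            # f -> ph, also in the syllable coda
    (re.compile(r"chi[sq]$"), "chik"),   # word-final -chis/-chiq -> -chik
]


def normalize(word: str) -> str:
    """Apply each rewrite rule once, in the order given in RULES."""
    for pattern, replacement in RULES:
        word = pattern.sub(replacement, word)
    return word


if __name__ == "__main__":
    for form in ["qollqe", "teqse", "qulqi", "panpa", "rafra", "pukllanchis"]:
        print(form, "->", normalize(form))
```

Rules such as the coda j (which maps to q in hoj but to k in ajllay) or the shortened suffixes -yu/-ru cannot be decided by surface patterns like these, which is one reason a lexicon-aware finite-state approach is the better fit for the real tool.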
A more detailed description (of an older version though) can be found in:
Rios, A. (2011). Spell checking an agglutinative language: Quechua. In: 5th Language and Technology Conference: Human Language Technologies as a Challenge for Computer Science and Linguistics, Poznań, Poland, 25-27 November 2011, pp. 51-55.
Thanks to Richard Castro Mamani from the Universidad Nacional de San Antonio Abad del Cusco, there is now a user-friendly text editor with a Quechua spell checker available (older version!).
Changes since the last version:
- included Spanish lexicon from FreeLing:
  → words with Spanish roots can now be spell checked.
- included extended orthographic rules:
  → e.g. 2.Sg.Subj written as -nqui will now be corrected to -nki
You can test the spell checker below.
Input should be in UTF-8, otherwise non-ASCII characters will not be rendered correctly.
If you're looking for a test text, you can copy and paste this one:
Llaqtaymanta hanpurani mana mamay taytay kaqtin; totalmente q'ara, wakcha, madrinaypa makinpi karani. Paymi chukchayta rutuwaran, hinaspa huk p'unchay hatunchaña kashaqtiy niwaq:
-Ñataq hallpayoqña kanki, tullu takyasqa, chayqa llank'aqmi rinayki.
Ricardo Valderrama Fernandez and Carmen Escalante Gutierrez. 1977. Gregorio Condori Mamani - Autobiografía. Biblioteca de la Tradición Oral Andina. Centro de Estudios Rurales Andinos Bartolomé de las Casas, Cuzco.