
Quechua Morphology

Simplified illustration of the finite-state transducer for Quechua

Comprehensive finite-state morphology systems have been developed for numerous languages. Nevertheless, the indigenous languages of the Americas have received far less attention from computational linguistics than the standard European languages. For my master's thesis, I implemented a complete morphology system for the Andean language Quechua, consisting of a morphological analyzer, a generation tool, and a spell checker.
Thanks to Richard Castro Mamani from the Universidad Nacional de San Antonio Abad del Cusco, a user-friendly text editor with a Quechua spell checker is now available (note that this is an old version).
We used our morphological analyzer to build a system for automatic text normalization; you can test it here.

Quechua is a group of closely related languages, spoken by 8-10 million people in Peru, Bolivia, Ecuador, southern Colombia, and northwestern Argentina. Ethnologue also lists some Quechua speakers for Chile.
Quechua is one of the official languages of Peru and Bolivia. Peru in particular has increased its efforts to provide citizens with official information not only in Spanish but also in Quechua and, to a lesser extent, in other indigenous languages such as Aymara and Asháninka.

Although Quechua is often referred to as a 'language' and its local varieties as 'dialects', Quechua is in fact a language family, comparable in depth to the Romance or Slavic languages (Adelaar & Muysken 2004). Mutual intelligibility, especially between speakers of distant 'dialects', cannot always be assumed.

The Quechuan languages are divided into two main branches, Quechua I and Quechua II in the terminology of the Peruvian linguist Torero (1964). Quechua I is the more archaic group of dialects, spoken in Central Peru. It comprises a heavily fragmented dialect complex with limited mutual comprehension between the different local varieties, although they share a number of clear common features (Adelaar & Muysken 2004). The Quechuan languages probably originated in this area (Cerrón-Palomino 2003). Quechua II itself consists of three subgroups:

QIIA, spoken in Northern Peru
QIIB, spoken in Ecuador and Colombia
QIIC, spoken in Southern Peru, Bolivia, and Argentina

The main focus of this project is on the dialects of the QIIC group, and within these especially on Cuzco and Ayacucho Quechua.
The analyzer can be tested below. It is designed specifically for QIIC input and will therefore not be able to analyze input in QI or in other QII dialects.
The output is given in trivocalic orthography, but the vowels e and o are also accepted in the input. Aspirated stops should be written as ph, th, kh, qh and chh; glottalized stops should be written with an apostrophe, e.g. q', k', etc.
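
As a rough illustration of these input conventions, the following Python sketch folds the vowels e and o into the trivocalic i and u and splits a word into graphemes, so that aspirated and glottalized stops are treated as single units. It is not part of the analyzer, which handles orthography inside the transducer. The function names `to_trivocalic` and `segment` are hypothetical, and the blanket folding of e and o is an assumption that would be wrong for Spanish loan words.

```python
import re

# Fold the pentavocalic vowels into the trivocalic system (e -> i, o -> u).
# Assumption: every e/o is a variant of i/u; this is too naive for Spanish loans.
_VOWEL_FOLD = str.maketrans("eoEO", "iuIU")

# Multi-character stops come first so that "chh" and "ch'" win over "ch"
# and over single letters.
# Assumption: the glottalized series is p', t', k', q', ch'
# (the text only lists "q', k', etc.").
_GRAPHEME = re.compile(r"chh|ch'|ch|[ptkq]h|[ptkq]'|.", re.IGNORECASE | re.DOTALL)


def to_trivocalic(word: str) -> str:
    """Return the word with e/o rewritten as i/u."""
    return word.translate(_VOWEL_FOLD)


def segment(word: str) -> list[str]:
    """Split a word into graphemes, keeping aspirated/glottalized stops together."""
    return _GRAPHEME.findall(word)


if __name__ == "__main__":
    print(to_trivocalic("orqo"))
    print(segment("qhipa"))
    print(segment("ch'aki"))
```

Listing the longer alternatives first in the pattern is what keeps `chh` from being read as `ch` followed by `h`.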

This finite-state transducer was built with the Xerox Finite-State Tools; its size is about 7 MB.

If you don't speak Quechua, feel free to try a word from the Declaration of Human Rights below. Make sure that the input doesn't contain any whitespace. For suggestions, please write to ariosATifi.uzh.ch

Sources:

Adelaar, W. F. H. and P. Muysken
2004. The Languages of the Andes, Cambridge Language Surveys. Cambridge University Press.
Cerrón-Palomino, R.
2003. Lingüística Quechua, 2nd edition. Centro de Estudios Regionales Andinos Bartolomé de Las Casas (CBC).
Torero, A.
1964. Los dialectos quechuas. Anales Científicos de la Universidad Agraria, Lima, (IV):446–478.


Abbreviations used:
Abl = Ablative
Acc = Accusative
Add = Additive
Aff = Affective
Ag = Nomen Agentis
Amb = Ambivalent Suffix, attaches to both nominal and verbal word forms
Aprx = Approximative
AS = Ambivalent Suffix
Asmp = Assumptive
Asmp_Emph = Assumptive Emphatic
Asp = Aspect Suffix
Ass = Assistive
Aug = Augmentative
Autotrs = Autotransformative
Ben = Benefactive
Cas = Case Suffix
Caus = Causative
Char = Characterization
Cis_Trs = Cis-/Translocative
Con = Connective Suffix
Cond = Conditional
Conec = Connective Postposition
Con_Inst = Connective/Instrumental
Cont = Continuity
Contr = Contrastive Postposition ("or")
Dat = Dative
Dat_Ill = Dative_Illative
Def = Definitiveness, Certainty
Dem = Demonstrative Pronoun
DE = Direct Evidence
Des = Desiderative
Desesp = Desesperative
Dim = Diminutive
DirE = Direct Evidence
DirE_Emph = Direct Evidence, emphatic
Disc = Discontinuative
Dist = Distributive
DS = Different Subject
Dub = Dubitative
Emph = Emphatic
Excl = Exclusive
Fact = Factitive
Fut = Future Tense
Gen = Genitive
Hon = Honorific/Affective
Inch = Inchoative
Incl = Inclusive
IndE = Indirect Evidence
IndE_Emph = Indirect Evidence, emphatic
Inf = Infinitive
Int = Intention
Inter = Interrogative
Intrup = Interruptive
Intsoc = Inter-Sociative
IPst = Past of Indirect Evidence
Kaus = Cause ("because of")
Kont = Continuity
Lim = Limitative ("just")
Loc = Locative
MPoss = Possessor of "a lot of"
NDeriv = Nominal Derivational Suffix (including case and possessive suffixes)
Neg = Negation
Neg_Emph = Negation, emphatic
Neg_Imp = Imperative Negation Particle
NP = Proper Noun (Nombre Propio)
NPst = Neutral Past
NPers = Nominal Person Suffix (Possessive)
NPoss = "not Possessor of", "without"
NRoot = Nominal Root
NRootES = Nominal Root of Spanish Origin
NRootNUM = Numeral Nominal Root
NRootCMP = Nominal Root in Compound
NS = Nominalizing Suffix
Num = Number
NumOrd = Ordinal Numeral
Mod = Modal Suffix
Obl = Obligation, Purpose
Part = Particle
PartES = Particle of Spanish Origin
Perdur = Perdurative
Perf = Perfect
Pl = Plural
Posi = Positional
Poss = Possessive
Pot = Potential
Prn = Pronoun
PrnInterr = Interrogative Pronoun
Prog = Progressive
Proloc = Prolocative
QTop = Topic in Question
Rel = Relational
Rflx = Reflexive
Rflx_Int = Reflexive/Intensifier
Rem = Rememorative
Res = Resignation, Implicitness
Reub = Reubicative
Rgr_Iprs = Regressive/Interpersonal
Rptn = Repentine, Precipitation, Unexpected Action
Rzpr = Reciprocal
Sg = Singular
Sim = Similarity
Sim_Disk = Simulative-Discontinuative
Sml = Simulative
SS = Same Subject
SS_Sim = Same Subject-Simultaneity
Soc = Sociative
Term = Terminative
Tns = Tense Suffix
Tns_Vpers = Portmanteau Tense Suffix containing a zero-marked 3rd Person
Top = Topic
Trs = Transformative
Trs = Translocative
VDeriv = Verbal Derivational Suffix
VDim = Verbal Diminutive
VPers = Verbal Person Suffix
VRoot = Verbal Root
VRootES = Verbal Root of Spanish Origin
VRootCMP = Verbal Root with incorporated NRoot
VS = Verbalizing Suffix
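
To make tag sequences easier to read, the abbreviations above can be loaded into a lookup table. The Python sketch below is illustrative only: it copies a handful of entries from the list, the helper name `expand_tags` is hypothetical, and the example tag sequence is made up, since the analyzer's actual output format is not reproduced here.

```python
# A small excerpt of the abbreviation list; extend with the remaining entries as needed.
GLOSSARY = {
    "NRoot": "Nominal Root",
    "VRoot": "Verbal Root",
    "Pl": "Plural",
    "Acc": "Accusative",
    "Gen": "Genitive",
    "Loc": "Locative",
    "Top": "Topic",
}


def expand_tags(tags):
    """Map each tag to its gloss; unknown tags are passed through in brackets."""
    return [GLOSSARY.get(tag, f"[{tag}]") for tag in tags]


if __name__ == "__main__":
    # Hypothetical tag sequence for a noun with plural and accusative marking.
    print(expand_tags(["NRoot", "Pl", "Acc"]))
```

Passing unknown tags through in brackets, rather than raising an error, keeps the helper usable while the table is still incomplete.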