Quechua Morphology
Comprehensive finite-state morphology systems have been developed for numerous languages. Nevertheless,
the indigenous languages of the Americas have received far less attention from computational linguistics
than the standard European languages. For my master's thesis, I implemented a complete morphology
system for the Andean language Quechua, consisting of a morphological analyzer, a generation tool, and
a spell checker.
Thanks to Richard Castro Mamani from the Universidad Nacional de San Antonio Abad del Cusco, there is now a user-friendly text editor with a Quechua spell checker available (old version!).
We used our morphological analyzer to create a system for automatic text normalization; you can test it here.
Quechua is a group of closely related languages, spoken by 8-10 million people
in Peru, Bolivia, Ecuador, Southern Colombia and the North-West of Argentina.
Ethnologue also lists some Quechua speakers
for Chile.
Quechua is one of the official languages of Peru and Bolivia.
Peru in particular has increased efforts to provide its citizens with official
information not only in Spanish, but also in Quechua and (to a lesser extent) in some other indigenous
languages such as Aymara and Asháninka.
Although Quechua is often referred to as a 'language' and its local varieties as
'dialects', Quechua is in fact a language family, comparable in depth to the Romance or Slavic languages (Adelaar & Muysken 2004).
Mutual intelligibility, especially between speakers of distant 'dialects',
cannot always be taken for granted.
The Quechuan languages are divided into two main branches, Quechua I and Quechua II,
in the terminology of the Peruvian linguist Torero (1964).
Quechua I is the more archaic group of dialects, spoken in Central Peru. It comprises a heavily
fragmented dialect complex, with limited mutual comprehension between the different local varieties,
although they share a number of clear common features (Adelaar & Muysken 2004). The Quechuan
languages probably originated in this area (Cerrón-Palomino 2003).
Quechua II itself consists of three subgroups:
- QIIA, spoken in Northern Peru
- QIIB, spoken in Ecuador and Colombia
- QIIC, spoken in Southern Peru, Bolivia, and Argentina
The main focus of this project is on the dialects of the QIIC group, and within these,
especially on Cuzco and Ayacucho Quechua.
The analyzer can be tested below. It is designed specifically for QIIC input and will therefore
not be able to analyze input from QI or other QII dialects.
The output is given in trivocalic orthography, but for the input, the vowels e and o
are also accepted. Aspirated stops should be written as ph, th, kh, qh and chh;
glottalized stops should be written with an apostrophe, e.g. q', k', etc.
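The input conventions above can be illustrated with a small Python sketch. It is not part of the analyzer: the function names are hypothetical, the glottalized stops p', t' and ch' are assumed in addition to the q' and k' named above, and the blanket mapping of e to i and o to u is a simplification that would be wrong for Spanish loan words, which may keep their original vowels.

```python
import re

# Multi-character graphemes, longest first so that "chh" wins over "ch".
# q' and k' come from the text above; p', t' and ch' are assumptions.
GRAPHEMES = ["chh", "ch'", "ph", "th", "kh", "qh", "p'", "t'", "k'", "q'", "ch"]
GRAPHEME_RE = re.compile("|".join(re.escape(g) for g in GRAPHEMES) + "|.", re.DOTALL)

# Simplified pentavocalic-to-trivocalic mapping; also turns a typographic
# apostrophe into the plain ASCII one.
PENTA_TO_TRI = str.maketrans({"e": "i", "o": "u", "\u2019": "'"})


def to_trivocalic(word):
    """Lowercase a word and rewrite e/o as i/u (naive, ignores loan words)."""
    return word.lower().translate(PENTA_TO_TRI)


def segment(word):
    """Split a word into graphemes, keeping aspirated and glottalized stops whole."""
    return GRAPHEME_RE.findall(to_trivocalic(word))


print(to_trivocalic("Qosqo"))   # qusqu
print(segment("qhapaq"))        # ['qh', 'a', 'p', 'a', 'q']
print(segment("q'ello"))        # ["q'", 'i', 'l', 'l', 'u']
```

In a real system this kind of normalization would more plausibly live inside the transducer itself, so that it can be restricted to native roots.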
This finite-state transducer was built with the Xerox Finite-State Tools; its size is about 7 Mb.
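To give a rough idea of what such a transducer does, the following Python sketch analyzes a noun by stripping known suffixes from a known root, and generates a word form by concatenating morphemes again. It is a toy illustration under strong assumptions, not the actual system: the lexicon holds only two roots and five suffixes chosen for the example, the tag names are borrowed from the abbreviation list on this page, and, unlike a real finite-state grammar, it places no constraints on suffix order.

```python
# Toy lexicon: illustrative entries only, tagged with abbreviations from this page.
ROOTS = {"wasi": "NRoot", "runa": "NRoot"}
SUFFIXES = {"kuna": "Pl", "pi": "Loc", "ta": "Acc", "manta": "Abl", "paq": "Ben"}


def _suffix_chains(rest):
    """Return every way of splitting `rest` into a sequence of known suffixes."""
    if not rest:
        return [[]]
    chains = []
    for form, tag in SUFFIXES.items():
        if rest.startswith(form):
            for tail in _suffix_chains(rest[len(form):]):
                chains.append([(form, tag)] + tail)
    return chains


def analyze(word):
    """Return all analyses of `word` as lists of (morpheme, tag) pairs."""
    analyses = []
    for root, tag in ROOTS.items():
        if word.startswith(root):
            for tail in _suffix_chains(word[len(root):]):
                analyses.append([(root, tag)] + tail)
    return analyses


def generate(morphemes):
    """Inverse direction: concatenate the morphemes of an analysis into a word."""
    return "".join(form for form, _ in morphemes)


print(analyze("wasikunapi"))
# [[('wasi', 'NRoot'), ('kuna', 'Pl'), ('pi', 'Loc')]]
print(generate([("runa", "NRoot"), ("ta", "Acc")]))
# runata
```

A finite-state transducer encodes the same mapping between surface forms and analyses as a single network, which is why one resource can serve both as analyzer and as generator.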
If you don't speak Quechua, feel free to try a word from the Declaration of Human Rights below. Make sure that the input doesn't contain any whitespace.
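Since the analyzer expects one word at a time, running text has to be split into words first. The sketch below shows one possible way to do this in Python; the function name and the regular expression are assumptions made for illustration. The point worth noting is that apostrophes must stay inside the token, because they mark glottalized stops.

```python
import re

# A word is a run of letters, optionally continued by an apostrophe plus more
# letters (as in q'illu). Both the ASCII and the typographic apostrophe are accepted.
WORD_RE = re.compile(r"[^\W\d_]+(?:['\u2019][^\W\d_]+)*")


def tokenize(text):
    """Return the words of `text` without whitespace or punctuation."""
    return [m.group(0).replace("\u2019", "'") for m in WORD_RE.finditer(text)]


print(tokenize("q'illu wasi, hatun runa."))
# ["q'illu", 'wasi', 'hatun', 'runa']
```

Each resulting token can then be submitted to the analyzer on its own.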
For suggestions, please write to ariosATifi.uzh.ch
Sources:
- Adelaar, W. F. H. and P. Muysken. 2004. The Languages of the Andes. Cambridge Language Surveys. Cambridge University Press.
- Cerrón-Palomino, R. 2003. Lingüística Quechua, 2nd edition. Centro de Estudios Regionales Andinos Bartolomé de Las Casas (CBC).
- Torero, A. 1964. Los dialectos quechuas. Anales Científicos de la Universidad Agraria, Lima, (IV):446–478.
Output:
Abbreviations used:
Abl = Ablative
Acc = Accusative
Add = Additive
Aff = Affective
Ag = Nomen Agentis
Amb = Ambivalent Suffix, attaches to both nominal and verbal word forms
Aprx = Approximative
AS = Ambivalent Suffix
Asmp = Assumptive
Amp_Emph = Assumptive Emphatic
Asp = Aspect Suffix
Ass = Assistive
Aug = Augmentative
Autotrs = Autotransformative
Ben = Benefactive
Cas = Case Suffix
Caus = Causative
Char = Characterization
Cis_Trs = Cis-/Translocative
Con = Connective Suffix
Cond = Conditional
Conec = Connective Postposition
Con_Inst = Connective/Instrumental
Cont = Continuity
Contr = Contrastive Postposition ("or")
Dat = Dative
Dat_Ill = Dative_Illative
Def = Definitiveness, Certainty
Dem = Demonstrative Prn.
DE = Direct Evidence
Des = Desiderative
Desesp = Desesperative
Dim = Diminutive
DirE = Direct Evidence
DirE_Emph = Direct Evidence, emphatic
Disc = Discontinuative
Dist = Distributive
DS = Different Subject
Dub = Dubitative
Emph = Emphatic
Excl = Exclusive
Fact = Factitive
Fut = Future Tense
Gen = Genitive
Hon = Honorific/Affective
Inch = Inchoative
Incl = Inclusive
IndE = Indirect Evidence
IndE_Emph = Indirect Evidence, emphatic
Inf = Infinitive
Int = Intention
Inter = Interrogative
Intrup = Interruptive
Intsoc = Inter-Sociative
IPst = Past of Indirect Evidence
Kaus = Cause ("because of")
Kont = Continuity
Lim = Limitative ("just")
Loc = Locative
MPoss = Possessor of "a lot of"
NDeriv = Nominal Derivational Suffix (including case and possessive suffixes)
Neg = Negation
Neg_Emph = Negation, emphatic
Neg_Imp = Imperative Negation Particle
NP = Proper Noun (Nombre Propio)
NPst = Neutral Past
NPers = Nominal Person Suffix (Possessive)
NPoss = "not Possessor of", "without"
NRoot = Nominal Root
NRootES = Nominal Root of Spanish Origin
NRootNUM = Numeral Nominal Root
NRootCMP = Nominal Root in Compound
NS = Nominalizing Suffix
Num = Number
NumOrd = Ordinal Numeral
Mod = Modal Suffix
Obl = Obligation, Purpose
Part = Particle
PartES = Particle of Spanish Origin
Perdur = Perdurative
Perf = Perfect
Pl = Plural
Posi = Positional
Poss = Possessive
Pot = Potential
Prn = Pronoun
PrnInterr = Interrogative Pronoun
Prog = Progressive
Proloc = Prolocative
QTop = Topic in Question
Rel = Relational
Rflx = Reflexive
Rflx_Int = Reflexive/Intensifier
Rem = Rememorative
Res = Resignation, Implicitness
Reub = Reubicative
Rgr_Iprs = Regressive/Interpersonal
Rptn = Repentine, Precipitation, Unexpected Action
Rzpr = Reciprocal
Sg = Singular
Sim = Similarity
Sim_Disk = Simulative-Discontinuative
Sml = Simulative
SS = Same Subject
SS_Sim = Same Subject-Simultaneity
Soc = Sociative
Term = Terminative
Tns = Tense Suffix
Tns_Vpers = Portmanteau Tense Suffix containing a zero-marked 3.Person
Top = Topic
Trs = Transformative
Trs = Translocative
VDeriv = Verbal Derivational Suffix
VDim = Verbal Diminutive
VPers = Verbal Person Suffix
VRoot = Verbal Root
VRootES = Verbal Root of Spanish Origin
VRootCMP = Verbal Root with incorporated NRoot
VS = Verbalizing Suffix
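For anyone post-processing the analyzer's tags, the list above can be loaded into a lookup table. The Python sketch below is only a suggestion: the function names are hypothetical, and the exact format of the analyzer output is not assumed here, only a sequence of tag names. Each tag maps to a list of meanings because Trs appears twice in the list.

```python
def parse_abbreviations(text):
    """Parse lines of the form 'Tag = Meaning' into a dict of tag -> meanings."""
    table = {}
    for line in text.splitlines():
        # Tolerate stray table delimiters and irregular spacing around '='.
        line = line.strip().rstrip("|").strip()
        if "=" not in line:
            continue
        tag, meaning = (part.strip() for part in line.split("=", 1))
        table.setdefault(tag, []).append(meaning)
    return table


def gloss(tags, table):
    """Replace each tag by its meaning(s); unknown tags are returned unchanged."""
    return [" / ".join(table[tag]) if tag in table else tag for tag in tags]


sample = """Abl = Ablative
NRoot = Nominal Root
Pl = Plural
Trs = Transformative
Trs = Translocative
Rgr_Iprs= Regressive/Interpersonal"""

table = parse_abbreviations(sample)
print(gloss(["NRoot", "Pl", "Abl"], table))
# ['Nominal Root', 'Plural', 'Ablative']
print(table["Trs"])
# ['Transformative', 'Translocative']
```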