To tag the French part of the Credit Suisse corpus, we made some modifications to the tag set used in the Text+Berg corpus, which was created by training the TreeTagger on the French Le Monde treebank (see Abeillé et al. 2003).
Anne Göhring, Martin Volk; 8. November 2010
PoS tag | part-of-speech | subcategory | examples |
---|---|---|---|
A | adjective | | |
A_card | | cardinal | cent, quatorze |
A_ind | | indefinite | certain, demi, même, plusieurs |
A_ord | | ordinal | premiers, quinzième |
A_qual | | qualificative | les tendances antimondialistes |
A_card | | cardinal | 1200 mètres |
ADV | adverb | | aujourd'hui, heureusement, au-delà |
ADV_excl | | exclamative | combien! |
ADV_int | | interrogative | autour, comment, presqu' |
CL | clitic | | |
CL_suj | | clitic subject | elle, il, on (-t-on) |
CL_obj | | clitic object | il vous suffit |
CL_refl | | clitic reflexive particle | il s' agit |
C | conjunction | | |
C_C | | coordination | car, et, ou, mais |
C_S | | subordination | lorsque, quoique, comment |
D | determiner | | |
D_card | | cardinal | trois (the only case!) |
D_def | | definite | l'attention sur les rochers |
D_dem | | demonstrative | ce, cette |
D_ind | | indefinite | divers, aucun, tout, un |
D_part | | partitive | du (the only case!) |
D_poss | | possessive | son long trajet |
ET | foreign material | | Vivant amici montium |
I | interjection | | salut, pardon |
N | noun | | |
N_card | | cardinal | deux |
N_C | | common | escalade, itinéraires |
N_P | | proper | Ackermann, Europe |
P | preposition | | vers, sur, en |
PCT | punctuation | | |
PCT_S | | strong | ? ! . |
PCT_W | | weak | << ^ |
PREF | prefix | | ultra, quasi |
PRO | pronoun | | |
PRO_card | | cardinal | quarante, six |
PRO_dem | | demonstrative | celui-ci, ceux |
PRO_ind | | indefinite | quelqu'un, chacun |
PRO_int | | interrogative | qui, que, quoi |
PRO_poss | | possessive | nôtre, tien |
PRO_rel | | relative | dont, lequel, que |
V | verb | | continuons, grimper |
* The PoS tags that are marked with an asterisk are the result of inconsistencies in the training corpus (based on omissions of the PoS subcategory).
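
The tags in the table follow a simple pattern: a main category (such as `D` or `PRO`), optionally followed by an underscore and a subcategory (such as `def` or `rel`). The following Python sketch shows one way to split a tag into these two parts when processing tagged text. The names `split_tag`, `Tag` and `MAIN_CATEGORIES` are hypothetical helpers introduced for illustration and are not part of the TreeTagger or of the corpus tools. The underscore convention is inferred from the table itself.

```python
from typing import NamedTuple, Optional

# Main categories as listed in the tag set table.
MAIN_CATEGORIES = {
    "A": "adjective",
    "ADV": "adverb",
    "CL": "clitic",
    "C": "conjunction",
    "D": "determiner",
    "ET": "foreign material",
    "I": "interjection",
    "N": "noun",
    "P": "preposition",
    "PCT": "punctuation",
    "PREF": "prefix",
    "PRO": "pronoun",
    "V": "verb",
}


class Tag(NamedTuple):
    """A PoS tag split into main category and optional subcategory."""
    category: str
    subcategory: Optional[str]


def split_tag(tag: str) -> Tag:
    """Split a tag such as 'D_def' into ('D', 'def').

    Assumes the first underscore separates category from subcategory.
    A tag without an underscore (e.g. 'V') has subcategory None.
    """
    category, sep, sub = tag.partition("_")
    if category not in MAIN_CATEGORIES:
        raise ValueError(f"unknown main category in tag: {tag!r}")
    return Tag(category, sub if sep else None)
```

For example, `split_tag("PRO_rel")` yields the category `PRO` with subcategory `rel`, while `split_tag("V")` yields `V` with no subcategory. A tag that lacks its subcategory, as described in the note on inconsistencies above, would simply come back with `subcategory` set to `None`.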
Anne Abeillé, Lionel Clément and François Toussenel (2003). Building a Treebank for French. In: Building and Using Parsed Corpora. Text, Speech and Language Technology. 20(10), p.165-187, Kluwer, Dordrecht.