French Part-of-speech Tag Set

For tagging the French part of the Credit Suisse corpus, we made some modifications to the tag set used in the Text+Berg corpus. That tag set was created
by training the TreeTagger on the French Le Monde treebank (see Abeillé et al. 2003).

Anne Göhring, Martin Volk; 8. November 2010

PoS tag part-of-speech subcategory examples
       
A adjective  
A_card   cardinal cent, quatorze
A_ind   indefinite certain, demi, même, plusieurs
A_ord   ordinal premiers, quinzième
A_qual   qualificative les tendances antimondialistes
A_card   cardinal 1200 mètres
       
ADV adverb   aujourd'hui, heureusement, au-delà
ADV_excl   exclamative combien!
ADV_int   interrogative autour, comment, presqu'
       
CL clitic  
CL_suj   clitic subject elle, il, on (-t-on)
CL_obj   clitic object il vous suffit
CL_refl   clitic reflexive particle il s' agit
       
C conjunction  
C_C   coordination car, et, ou, mais
C_S   subordination lorsque, quoique, comment
       
D determiner  
D_card   cardinal trois (the only case!)
D_def   definite l'attention sur les rochers
D_dem   demonstrative ce, cette
D_ind   indefinite divers, aucun, tout, un
D_part   partitive du (the only case!)
D_poss   possessive son long trajet
       
ET foreign material   Vivant amici montium
       
I interjection   salut, pardon
       
N noun  
N_card   cardinal deux
N_C   common escalade, itinéraires
N_P   proper Ackermann, Europe
       
P preposition   vers, sur, en
       
PCT punctuation  
PCT_S   strong ? ! .
PCT_W   weak << ^
       
PREF prefix   ultra, quasi
       
PRO pronoun  
PRO_card   cardinal quarante, six
PRO_dem   demonstrative celui-ci, ceux
PRO_ind   indefinite quelqu'un, chacun
PRO_int   interrogative qui, que, quoi
PRO_poss   possessive nôtre, tien
PRO_rel   relative dont, lequel, que
       
V verb   continuons, grimper

* The PoS tags marked with an asterisk result from inconsistencies in the training corpus, where the PoS subcategory was omitted.
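
The tags in the table follow a simple pattern: a top-level category, optionally followed by an underscore and a subcategory (for example A_card or PRO_rel). As a minimal sketch, assuming tags are handled as plain strings, the following Python code splits a tag into its two parts and treats a bare category tag as having no subcategory. The names CATEGORIES, split_tag and describe are hypothetical helpers introduced for illustration; they are not part of the TreeTagger or of the corpus tools.

```python
# Top-level categories copied from the tag set table above.
CATEGORIES = {
    "A": "adjective",
    "ADV": "adverb",
    "CL": "clitic",
    "C": "conjunction",
    "D": "determiner",
    "ET": "foreign material",
    "I": "interjection",
    "N": "noun",
    "P": "preposition",
    "PCT": "punctuation",
    "PREF": "prefix",
    "PRO": "pronoun",
    "V": "verb",
}


def split_tag(tag):
    """Split a tag such as 'A_card' into ('A', 'card').

    A bare category tag such as 'V' yields ('V', None).
    Hypothetical helper, not part of any existing tool.
    """
    category, sep, subcategory = tag.partition("_")
    if category not in CATEGORIES:
        raise ValueError(f"unknown PoS category in tag: {tag!r}")
    return category, (subcategory if sep else None)


def describe(tag):
    """Return a readable label, e.g. 'pronoun (rel)' for 'PRO_rel'."""
    category, subcategory = split_tag(tag)
    name = CATEGORIES[category]
    return name if subcategory is None else f"{name} ({subcategory})"
```

For instance, split_tag("C_S") separates the conjunction category C from its subcategory S, while split_tag("ADV") returns the category with None in place of a subcategory, which is one way to detect tags that lack a subcategory.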

Reference

Anne Abeillé, Lionel Clément and François Toussenel (2003). Building a Treebank for French. In: Building and Using Parsed Corpora. Text, Speech and Language Technology. 20(10), pp. 165-187, Kluwer, Dordrecht.