Resources for Evaluation of CEFRLex

Here, we present the data we used to evaluate the English CEFRLex resource in our publication "Using Multilingual Resources to Evaluate CEFRLex for Learner Applications".

For all three languages that we used in our evaluation (English, French, Swedish), we compiled lists based on the original CEFRLex resource:

Those lists comprise the lexical entry (word), the universal part-of-speech tag (pos), different CEFR levels ($C$, $C_1$, $C_5$, $C_{10}$ in the paper), the CEFR level assigned by the KELLY lists (if available), a flag (0 or 1) indicating whether that entry is a hapax legomenon in the textbooks that CEFRLex is based upon (see footnote 1), symmetrized conditional translation probabilities ($p_{min}$, $p_{max}$ and $p_{avg}$ in the paper), the number of languages that those probabilities are based on, the number of languages with parallel KELLY entries, and the two values measuring the difference in CEFR level between the languages ($\delta$ and $\delta_\sigma$ in the paper).
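As a rough illustration of how such a list might be read, the following Python sketch parses a few of the fields described above. It assumes a tab-separated file with a header row; that file format, the column names other than word and pos (here cefr, kelly, hapax, p_avg), and the two sample rows are all made up for the example and may not match the actual files. Empty cells are treated as missing values, which covers entries without a KELLY level and the hapax flag that is not available for French.

```python
import csv
import io
from dataclasses import dataclass
from typing import Iterator, Optional, TextIO


@dataclass
class Entry:
    word: str
    pos: str
    cefr: str                # hypothetical column for level C
    kelly: Optional[str]     # None if KELLY assigns no level
    hapax: Optional[bool]    # None if the flag is missing
    p_avg: Optional[float]   # None if no probability is given


def read_entries(handle: TextIO) -> Iterator[Entry]:
    """Yield one Entry per row of a tab-separated list (assumed layout)."""
    reader = csv.DictReader(handle, delimiter="\t")
    for row in reader:
        # Empty strings stand for missing values in this sketch.
        hapax = row.get("hapax") or None
        p_avg = row.get("p_avg") or None
        yield Entry(
            word=row["word"],
            pos=row["pos"],
            cefr=row["cefr"],
            kelly=row.get("kelly") or None,
            hapax=None if hapax is None else hapax == "1",
            p_avg=None if p_avg is None else float(p_avg),
        )


# Invented sample rows, for illustration only.
SAMPLE = (
    "word\tpos\tcefr\tkelly\thapax\tp_avg\n"
    "house\tNOUN\tA1\tA1\t0\t0.82\n"
    "gazebo\tNOUN\tC1\t\t1\t\n"
)

entries = list(read_entries(io.StringIO(SAMPLE)))
# Keep only entries explicitly flagged as not being a hapax legomenon.
non_hapax = [e.word for e in entries if e.hapax is False]
```

Comparing with `is False` rather than `not e.hapax` keeps entries with a missing flag out of the filtered list, which matters if the same code is run on the French list.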

Furthermore, we provide a list of bilingual and trilingual matches with conditional translation probabilities:

How to cite

GraenAlfterSchneider2020.bib
@InProceedings{GraenAlfterSchneider2020,
  author    = {Gra\"{e}n, Johannes  and  Alfter, David  and  Schneider, Gerold},
  title     = {Using Multilingual Resources to Evaluate CEFRLex for Learner Applications},
  booktitle = {Proceedings of the 12th Language Resources and Evaluation Conference (LREC)},
  month     = {May},
  year      = {2020},
  address   = {Marseille, France},
  publisher = {European Language Resources Association (ELRA)},
  pages     = {346--355},
  url       = {https://www.aclweb.org/anthology/2020.lrec-1.43}
}
1) The hapax legomenon flag is not available for French.

Institute of Computational Linguistics – University of Zurich