Computational Linguistics for COVID-19!

Literature Based Discovery

Activities

Our goal is to enrich a given corpus of COVID-related biomedical literature with biomedical entities.

Our annotation process is based on an efficient dictionary-based lookup (OGER) combined with a deep learning approach trained on existing corpora, for example the CRAFT corpus of the University of Colorado.

Our terminologies are derived from the major life science databases using our Bio Term Hub, which allows us to maintain up-to-date dictionaries synchronized with the original resources.
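The dictionary-based lookup step described above can be illustrated with a minimal sketch. This is not OGER's implementation or API: it is a plain longest-match lookup over a toy terminology, and every name in it (TERMS, Annotation, annotate) as well as the "EX:" concept identifiers are hypothetical placeholders chosen for illustration. The deep learning component is not shown.

```python
import re
from typing import NamedTuple


class Annotation(NamedTuple):
    start: int        # character offset where the match begins
    end: int          # character offset just past the match
    text: str         # matched surface string
    entity_type: str
    concept_id: str


# Hypothetical toy terminology: lowercased token tuple -> (entity type, concept id).
# The "EX:" identifiers are made up; a real dictionary would carry database IDs.
TERMS = {
    ("sars-cov-2",): ("organism", "EX:0001"),
    ("spike", "protein"): ("protein", "EX:0002"),
    ("ace2",): ("protein", "EX:0003"),
    ("epithelial", "cells"): ("cells", "EX:0004"),
}

# Words, optionally joined by hyphens, so "SARS-CoV-2" stays one token.
TOKEN = re.compile(r"\w+(?:-\w+)*")


def annotate(text, terms=TERMS):
    """Return non-overlapping annotations, preferring the longest match."""
    tokens = [(m.group().lower(), m.start(), m.end()) for m in TOKEN.finditer(text)]
    max_len = max((len(key) for key in terms), default=0)
    annotations = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate term first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(tok[0] for tok in tokens[i:i + n])
            if key in terms:
                start, end = tokens[i][1], tokens[i + n - 1][2]
                entity_type, concept_id = terms[key]
                annotations.append(
                    Annotation(start, end, text[start:end], entity_type, concept_id)
                )
                i += n
                break
        else:
            i += 1  # no term starts at this token
    return annotations


if __name__ == "__main__":
    sentence = "SARS-CoV-2 spike protein binds ACE2 on epithelial cells."
    for annotation in annotate(sentence):
        print(annotation)
```

Keeping character offsets with each match is what makes such annotations usable as stand-off annotations over the original text, for example in tools like brat.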

Our current annotation pipeline generates annotations for several entity types:

  • cell lines
  • clinical drugs (RxNorm)
  • cells
  • molecular processes
  • sequences
  • organ/tissue
  • chemicals
  • Gene Ontology (GO)
  • organisms
  • proteins

Timeline

  • [2020-04-21 Tue] We have updated our annotated LitCovid dataset (now containing 5630 abstracts)
  • [2020-04-16 Thu] We have completed the annotation of the PMC subset of LitCovid. Find it here.
  • [2020-04-08 Wed] Our online annotation platform OGER now includes COVID-specific terminology. You can also use it as a web service. Try it out!
  • [2020-04-08 Wed] Our OGER+BioBERT annotations are now accessible on a local brat installation.

    See a screenshot below: LitCovid-Brat.png

    Click here to access the annotated documents

  • [2020-04-06 Mon] We have submitted our (improved) OGER+BioBERT annotations of the LitCovid dataset to Europe PMC.
  • [2020-04-03 Fri] We have annotated the LitCovid dataset with OGER+BioBERT and published our results on PubAnnotation, a tool developed by DBCLS, Tokyo (Jin-Dong Kim's group).

Datasets

We have been working with two recently released datasets.

  • LitCovid, a set of more than 3000 abstracts released by the National Library of Medicine. The abstracts are categorized by research topic and geographic location.
  • CORD-19, a set of about 40000 documents made available by the Allen Institute for AI as the dataset for their CORD-19 challenge.

Our annotated datasets

  • Annotated corpora.
    • We have processed the LitCovid corpus with our entity recognition tools. Click here for details and downloads, in different formats.
    • Only a few of the abstracts contained in LitCovid also have a full text accessible from PubMed Central. We have processed this subset of full-text papers (which we refer to as LitCovid/PMC). The results are available here.

Who are we?

This page is maintained by the Biomedical Text Mining group at the Institute of Computational Linguistics, University of Zurich.

For additional information about the tools and research activities described in this page, please contact Fabio Rinaldi.

Author: Fabio Rinaldi

Created: 2020-04-21 Tue 16:28