This document briefly describes what the OGER APIs are and how to use them in a few simple steps.

OGER is a fast, accurate entity annotation tool, accessible either as a software package or as a web service. Given text as input, it delivers annotations as output, as illustrated in the picture below.


The dictionaries used for annotation are obtained from major life science databases (Cellosaurus, Cell Ontology, ChEBI, CTD, EntrezGene, Gene Ontology, MeSH, Molecular Process Ontology, NCBI Taxonomy, Protein Ontology, RxNorm, Sequence Ontology, Swiss-Prot, Uberon).

OGER dictionaries are sourced from the original databases and kept synchronized with them through our own Bio Term Hub.

The OGER APIs are REST web service APIs that allow easy access to OGER's online annotation capabilities.

The APIs’ information

With the APIs it is possible to:

  • Check the service status
  • Create new dictionaries
    • Check their status
  • Annotate
    • Documents that are on the local machine (upload)
    • Documents fetched from PubMed (fetch)

Below is a minimal example of how to perform an online annotation.

Step 1

Verify that the service is up and running.

curl --location --request GET 'https://pub.cl.uzh.ch/projects/ontogene/oger/status'

The service should respond with something similar to:

    {
      "status": "running",
      "active annotation dictionaries": 2,
      "default dictionary": "509f822aaf527390"
    }

From the response it can be seen that the default dictionary is the one identified by the hex token 509f822aaf527390. This is the dictionary used for annotation if no other dictionary is explicitly passed in the annotation request.
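
The same check can be scripted. The following is a minimal Python sketch, assuming the response body is a JSON object with the three keys shown above. The helper names `fetch_status` and `default_dictionary` are made up for this illustration and are not part of OGER.

```python
import json
import urllib.request

STATUS_URL = "https://pub.cl.uzh.ch/projects/ontogene/oger/status"


def default_dictionary(body: str) -> str:
    """Return the default dictionary token from a status response body.

    Assumes the body is a JSON object shaped like the example response.
    """
    status = json.loads(body)
    if status.get("status") != "running":
        raise RuntimeError(f"service not running: {status.get('status')!r}")
    return status["default dictionary"]


def fetch_status(url: str = STATUS_URL, timeout: float = 10.0) -> str:
    """GET the status endpoint and return the raw response body.

    Decoding as UTF-8 is an assumption of this sketch.
    """
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


# Offline example using the response shown above, written as a complete
# JSON object. Replace `sample` with fetch_status() to query the live service.
sample = """{
    "status": "running",
    "active annotation dictionaries": 2,
    "default dictionary": "509f822aaf527390"
}"""
print(default_dictionary(sample))
```

With the sample body, this prints 509f822aaf527390. Keeping the parsing separate from the network call makes it possible to try the logic without contacting the service.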

Step 2

Verify the availability and status of the dictionary that you want to use.

In this example we are not going to use the default dictionary but another one tailored to COVID-19 literature. This dictionary is already available and its hex token is 799a6414c37b2d1a.

Check its status and description by running:

curl --location --request GET 'https://pub.cl.uzh.ch/projects/ontogene/oger/dict/799a6414c37b2d1a/status'

The service would respond with something similar to:

    {
      "status": "ready",
      "description": "default+COVID terminology"
    }

So, we are ready to go.
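
When a dictionary is not ready yet, the status check can be repeated until it is. The sketch below is one hedged way to do that in Python. It assumes the response is a JSON object with a "status" key and treats every value other than "ready" as "not ready yet", since other status values are not documented here. `dict_status_url` and `wait_until_ready` are hypothetical helper names, and the HTTP call is passed in as a function so the logic can be tried offline.

```python
import json
import time
from typing import Callable

BASE_URL = "https://pub.cl.uzh.ch/projects/ontogene/oger"


def dict_status_url(token: str) -> str:
    """Build the status URL for the dictionary identified by `token`."""
    return f"{BASE_URL}/dict/{token}/status"


def wait_until_ready(fetch: Callable[[str], str], token: str,
                     attempts: int = 5, delay: float = 2.0) -> bool:
    """Poll the dictionary status until it reports "ready".

    `fetch` takes a URL and returns the response body as text, so any
    HTTP client can be plugged in. Returns False if the dictionary is
    still not ready after `attempts` checks.
    """
    url = dict_status_url(token)
    for attempt in range(attempts):
        if json.loads(fetch(url)).get("status") == "ready":
            return True
        if attempt < attempts - 1:
            time.sleep(delay)  # wait before asking again
    return False


# Offline demonstration with a stub in place of a real HTTP call.
def fake_fetch(url: str) -> str:
    return '{"status": "ready", "description": "default+COVID terminology"}'


print(wait_until_ready(fake_fetch, "799a6414c37b2d1a"))
```

The demonstration prints True because the stub reports "ready" on the first check. The attempt count and delay are arbitrary defaults, not values recommended by the service.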

It is also possible to request a new dictionary based on specific settings; find out more at https://github.com/OntoGene/OGER/wiki/REST-API.

Step 3

Request the annotation.

We can request the annotation of local data by sending a POST request to the /upload endpoint and passing the route parameters that specify the input and output formats.

Below is an example of the request. In this example the uploaded data is raw text (txt), the requested output format is a tab-separated table (tsv), and the text to be annotated is passed in the POST payload.

curl --location \
--request POST 'https://pub.cl.uzh.ch/projects/ontogene/oger/upload/txt/tsv' \
--header 'Content-Type: text/plain' \
--data-raw 'The initial cases of novel coronavirus (2019-nCoV)-infected 
pneumonia (NCIP) occurred in Wuhan, Hubei Province, China, in December 2019 
and January 2020.
We analyzed data on the first 425 confirmed cases in Wuhan to
determine the epidemiologic characteristics of NCIP.We collected
information on demographic characteristics, exposure history, and
illness timelines of laboratory-confirmed cases of NCIP that had been
reported by January 22, 2020.'
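
For use from a script, the request above can be approximated with Python's standard library. This is a sketch rather than an official client: the function names `build_upload_request` and `annotate` are invented for the example, and encoding the payload and decoding the response as UTF-8 is an assumption.

```python
import urllib.request

UPLOAD_URL = "https://pub.cl.uzh.ch/projects/ontogene/oger/upload/txt/tsv"


def build_upload_request(text: str, url: str = UPLOAD_URL) -> urllib.request.Request:
    """Prepare a POST request that carries plain text as its payload."""
    return urllib.request.Request(
        url,
        data=text.encode("utf-8"),  # assumption: the payload is sent as UTF-8
        headers={"Content-Type": "text/plain"},
        method="POST",
    )


def annotate(text: str, timeout: float = 60.0) -> str:
    """Send the text to the service and return the response body."""
    request = build_upload_request(text)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


# Building the request does not contact the service; call annotate() for that.
request = build_upload_request(
    "We analyzed data on the first 425 confirmed cases in Wuhan."
)
print(request.get_method(), request.full_url)
```

The request is built separately from the call that sends it, so the URL, method and header can be inspected before any data leaves the machine.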

Find below a response example:


If we do the same request but change the output format from tsv to bioc_json, we get the following response:


Input formats

Format     Content-type      Description
txt        text/plain        unstructured plain-text document
bioc       text/xml          document or collection in BioC XML
bioc_json  application/json  document or collection in BioC JSON
pxml       text/xml          abstract in PubMed's citation XML
nxml       text/xml          article in PubMed Central's full-text XML
pxml.gz    application/gzip  compressed collection of abstracts in Medline's citation XML
Output formats

Format        Content-type               Description
tsv           text/tab-separated-values  entities in a tab-separated table
xml           text/xml                   entities in a simple, self-explanatory XML format
text_tsv      text/tab-separated-values  text and entities in a tab-separated table
bioc          text/xml                   text and entities in BioC XML
bioc_json     application/json           text and entities in BioC JSON
pubanno_json  application/json           text and entities in PubAnnotator JSON
pubtator      text/plain                 text and entities in PubTator format (mixture of pipe- and tab-separated text)
pubtator_fbk  text/plain                 a variant of the above, with slightly different entity attributes
odin          text/xml                   text and entities in ODIN XML
odin_custom   text/xml                   text and entities in ODIN XML, with customisable CSS
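
The two format tables can be turned into a small lookup that rejects unsupported combinations before a request is sent. The Python sketch below assumes that every format pair follows the /upload/<input>/<output> route pattern seen in the txt/tsv example. `upload_target` is a hypothetical helper name.

```python
from typing import Tuple

BASE_URL = "https://pub.cl.uzh.ch/projects/ontogene/oger"

# Input format -> Content-Type header, copied from the input formats table.
INPUT_CONTENT_TYPES = {
    "txt": "text/plain",
    "bioc": "text/xml",
    "bioc_json": "application/json",
    "pxml": "text/xml",
    "nxml": "text/xml",
    "pxml.gz": "application/gzip",
}

# Output format names, copied from the output formats table.
OUTPUT_FORMATS = frozenset({
    "tsv", "xml", "text_tsv", "bioc", "bioc_json", "pubanno_json",
    "pubtator", "pubtator_fbk", "odin", "odin_custom",
})


def upload_target(input_format: str, output_format: str) -> Tuple[str, str]:
    """Return (URL, Content-Type) for an upload request.

    Raises ValueError for formats that are not listed in the tables.
    """
    if input_format not in INPUT_CONTENT_TYPES:
        raise ValueError(f"unknown input format: {input_format!r}")
    if output_format not in OUTPUT_FORMATS:
        raise ValueError(f"unknown output format: {output_format!r}")
    url = f"{BASE_URL}/upload/{input_format}/{output_format}"
    return url, INPUT_CONTENT_TYPES[input_format]


print(upload_target("txt", "bioc_json"))
```

Validating on the client side gives an immediate, readable error for a mistyped format name instead of an HTTP error from the service.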

More information about the annotation parameters is available at https://github.com/OntoGene/OGER/wiki/REST-API

Documentation on the annotation fields

If you use the tsv format as your output format, the resulting output will contain one line for every discovered term. Each line contains the following tab-separated fields:

  1. paper identifier (PubMed ID)
  2. type of entity
  3. start offset
  4. end offset
  5. term as found in the paper
  6. preferred term
  7. term id
  8. section of the paper where term has been found
  9. sentence ID
  10. name of source vocabulary / database
  11. common term identifier

Note that:

  • (7) is the unique identifier of the term in the corresponding reference database, which is indicated by (10).
  • (6) is a term semantically equivalent to the one found in the paper (5), but which is indicated in the reference database as the "best" or normative term for the corresponding concept.
  • (11) is a common identifier, in cases where we have one. In most applications we try to provide a UMLS CUI at this position.
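
The eleven fields listed above map naturally onto a record type. The following Python sketch parses tsv output into named tuples. It assumes that the offsets are integers and that any row without exactly eleven fields, or with non-numeric offsets (such as a possible header row), can be skipped. The field names and the sample line are invented for illustration and are not real OGER output.

```python
from typing import List, NamedTuple


class Annotation(NamedTuple):
    doc_id: str          # (1) paper identifier
    entity_type: str     # (2) type of entity
    start: int           # (3) start offset
    end: int             # (4) end offset
    matched_term: str    # (5) term as found in the paper
    preferred_term: str  # (6) preferred term
    term_id: str         # (7) term id
    section: str         # (8) section of the paper
    sentence_id: str     # (9) sentence ID
    source: str          # (10) source vocabulary / database
    common_id: str       # (11) common term identifier


def parse_tsv(body: str) -> List[Annotation]:
    """Parse tsv output into a list of Annotation records."""
    annotations = []
    for line in body.split("\n"):
        fields = line.rstrip("\r").split("\t")
        # Skip blank lines, malformed rows and rows without numeric offsets.
        if len(fields) != 11 or not (fields[2].isdigit() and fields[3].isdigit()):
            continue
        annotations.append(Annotation(
            fields[0], fields[1], int(fields[2]), int(fields[3]), *fields[4:]
        ))
    return annotations


# Placeholder values only, chosen to show the field layout.
sample = ("DOC1\tdisease\t0\t9\tpneumonia\tPneumonia\tID:0000\t"
          "abstract\tS1\tExampleDB\tC0000000\n")
record = parse_tsv(sample)[0]
print(record.matched_term, record.preferred_term, record.source)
```

With the placeholder line, this prints "pneumonia Pneumonia ExampleDB". Named fields avoid having to remember column positions when filtering by source database or entity type.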

Note further that the same span of text might generate multiple annotations. This is the case when a term is ambiguous, so that several potential annotations can be applied to it.
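
Ambiguous spans can be found by grouping annotations on their document and offsets. The sketch below is a minimal Python illustration that works on a reduced view of each row (paper identifier, start offset, end offset, term id). `ambiguous_spans` is a hypothetical helper and the sample rows are placeholders, not real annotations.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# (paper identifier, start offset, end offset, term id): fields 1, 3, 4 and 7.
Row = Tuple[str, int, int, str]
Span = Tuple[str, int, int]


def ambiguous_spans(rows: Iterable[Row]) -> Dict[Span, List[str]]:
    """Map each span that carries more than one annotation to its term ids."""
    by_span: Dict[Span, List[str]] = defaultdict(list)
    for doc_id, start, end, term_id in rows:
        by_span[(doc_id, start, end)].append(term_id)
    return {span: ids for span, ids in by_span.items() if len(ids) > 1}


rows = [
    ("DOC1", 0, 4, "ID:A"),
    ("DOC1", 0, 4, "ID:B"),   # same span as the row above, different term
    ("DOC1", 10, 15, "ID:C"),
]
print(ambiguous_spans(rows))
```

For the placeholder rows, this prints {('DOC1', 0, 4): ['ID:A', 'ID:B']}. How to choose among the candidates is left to the application.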

Who are we?

This page is maintained by the Biomedical Text Mining group at the Institute of Computational Linguistics, University of Zurich.

For additional information about the tools and research activities described on this page, please contact Fabio Rinaldi.

Author: Fabio Rinaldi

Created: 2020-08-07 Fri 23:09