Alignment: Definition

  • The term “alignment” denotes several related concepts.
  • It applies to parallel corpora (two languages) and multiparallel corpora (more than two languages).

“Alignment” refers to (1)

… the process of identifying corresponding elements in two or more languages:

  • “Alignment is usually done by finding correspondence points […]” (Ribeiro et al. 2000)
  • “Alignment—the explicit linking of items in the SL [source language] and TL [target language] texts judged to correspond to each other—is a prerequisite for the extraction of translation equivalents from parallel corpora […]” (Borin 2000)
  • “Central to the work with bitexts is the task of alignment – the process of linking corresponding parts with each other.” (Tiedemann 2011)

“Alignment” refers to (2)

… the method of searching for correspondences at a given level:

  • document/text alignment (documents)
  • sentence alignment (sentences)
  • word alignment (tokens)
  • sub-sentential alignment (multiword units)

These methods are generic and can be subdivided into different techniques.

“Alignment” refers to (3)

… the marked correspondence of one or more structural elements (e.g., documents, articles, paragraphs, sentences, phrases, tokens).

  • Wir möchten nicht die Katze im Sack kaufen.
  • Nous ne voulons pas acheter chat en poche.

In this case, we use the term alignment unit. Alignment units are sets of the elements in question:

{Katze, chat}
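
A minimal sketch of an alignment unit in Python, pairing Katze with its French counterpart chat from the example sentences. Using a frozenset is an assumption made for illustration; the definition above only requires a set of elements.

```python
# Assumption for illustration: an alignment unit is an immutable,
# unordered set of the tokens that are linked to each other.
unit = frozenset({"Katze", "chat"})

# Sets ignore order, so the link has no direction.
same_unit = frozenset({"chat", "Katze"})
print(unit == same_unit)
```

Because the unit is a plain set, it records which elements correspond but not which language each element belongs to; a real corpus format would typically store that as well.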

“Alignment” refers to (4)

… a list of alignment units of a given higher-level element (e.g., words in sentences).

  • Wir möchten nicht die Katze im Sack kaufen.
  • Nous ne voulons pas acheter chat en poche.

In this case, we use the term alignment set. Alignment sets are sets of alignment units (i.e., sets of sets of the elements in question):

{ {Wir, Nous}, {Katze, chat}, {nicht, ne, pas}, {im, en}, {möchten, voulons}, {Sack, poche}, {kaufen, acheter} }
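
The set-of-sets structure can be sketched in Python as follows. The representation as nested frozensets and the helper name `partners` are hypothetical, chosen only to illustrate the definition; they assume each token occurs once in the sentence pair.

```python
# Assumption for illustration: an alignment set is a frozenset of
# alignment units, and each unit is a frozenset of tokens.
alignment_set = frozenset({
    frozenset({"Wir", "Nous"}),
    frozenset({"Katze", "chat"}),
    frozenset({"nicht", "ne", "pas"}),
    frozenset({"im", "en"}),
    frozenset({"möchten", "voulons"}),
    frozenset({"Sack", "poche"}),
    frozenset({"kaufen", "acheter"}),
})


def partners(token, alignment):
    """Return the tokens linked to `token`, or an empty set if it is unaligned."""
    for unit in alignment:
        if token in unit:
            return unit - {token}
    return frozenset()


print(sorted(partners("nicht", alignment_set)))  # ['ne', 'pas']
print(partners("die", alignment_set))            # frozenset()
```

The second call shows that a token can remain unaligned: "die" has no French counterpart in the example, so it appears in no alignment unit.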

Applications based on word alignment