This shows you the differences between two versions of the page.
public:pacoco:text_berg [2019-07-18 22:23] – [Table] Johannes Graën | public:pacoco:text_berg [2019-07-22 00:56] – Johannes Graën | ||
---|---|---|---|
Line 1: | Line 1: | ||
+ | ~~NOTOC~~ | ||
====== The Text+Berg Corpus ====== | ====== The Text+Berg Corpus ====== | ||
Line 17: | Line 18: | ||
The corpus has been divided into language-specific subsections. The table below provides an overview of corpus statistics for each subsection. | The corpus has been divided into language-specific subsections. The table below provides an overview of corpus statistics for each subsection. | ||
- | === SAC === | + | |
+ | ===== SAC ===== | ||
^ lang ^ tokens | ^ lang ^ tokens | ||
^ de | ^ de | ||
Line 27: | Line 29: | ||
^ Total ^ 38.6m ^ 1.1m ^ 429k ^ 2.1m ^ 21k ^ | ^ Total ^ 38.6m ^ 1.1m ^ 429k ^ 2.1m ^ 21k ^ | ||
- | ===EdA=== | + | ==== Alignment ==== |
- | ^ lang ^ tokens ^ types ^ lemmas ^ sents ^ texts ^ | + | The corpus has been aligned on the sentence level. |
- | | **fr** | + | |
- | ===BAC=== | ||
- | ^ lang ^ tokens ^ types ^ lemmas ^ sents ^ texts ^ | ||
- | | **en** | ||
- | ------------------------------ | + | ===== EdA ===== |
+ | ^ lang ^ tokens ^ types ^ lemmas ^ sents ^ texts ^ | ||
+ | ^ fr | 7.4m | 185k | 40k | 376k | 4.5k | | ||
- | === Relevant links === | ||
+ | ===== BAC ===== | ||
+ | ^ lang ^ tokens ^ types ^ lemmas ^ sents ^ texts ^ | ||
+ | ^ en | 6.5m | 181k | 60k | 289k | 1.5k | | ||
- | * [[http:// | ||
- | * [[https:// | ||
- | === Publications === | + | ===== Publications ===== |
* Detection and annotation of code-switching [[https:// | * Detection and annotation of code-switching [[https:// | ||
* Crowdsourced correction of OCR errors [[https:// | * Crowdsourced correction of OCR errors [[https:// | ||
Line 51: | Line 51: | ||
* special handling of elliptical compound nouns and separable prefix verbs in German [[https:// | * special handling of elliptical compound nouns and separable prefix verbs in German [[https:// | ||
* See here for more [[http:// | * See here for more [[http:// | ||
+ | |||
+ | |||
+ | ===== Relevant links ===== | ||
+ | |||
+ | * [[http:// | ||
+ | * [[https:// | ||
+ |