This shows you the differences between two versions of the page.
public:pacoco:text_berg [2019-07-17 22:16] – tkew | public:pacoco:text_berg [2023-09-15 20:33] (current) – external edit 127.0.0.1 | ||
---|---|---|---|
Line 1: | Line 1: | ||
+ | ~~NOTOC~~ | ||
====== The Text+Berg Corpus ====== | ====== The Text+Berg Corpus ====== | ||
Line 17: | Line 18: | ||
The corpus has been divided into its language-specific subsections. The table below provides an overview of corpus statistics for each subsection. | The corpus has been divided into its language-specific subsections. The table below provides an overview of corpus statistics for each subsection. | ||
- | === SAC === | ||
- | ^ lang ^ tokens ^ types ^ lemmas ^ sents ^ texts ^ | ||
- | | **de** | ||
- | | **fr** | ||
- | | **it** | ||
- | | **rm** | ||
- | | **gsw** | ||
- | | **en** | ||
- | ^ Total ^ 38.6m ^ 1.1m ^ 429k ^ 2.1m ^ 21k ^ | ||
- | ===EdA=== | + | ===== SAC ===== |
- | ^ lang ^ tokens ^ types ^ lemmas ^ sents ^ texts ^ | + | ^ lang |
- | | **fr** | + | ^ de | 23.4m | 769k | |
+ | ^ fr | ||
+ | ^ it | ||
+ | ^ rm | ||
+ | ^ gsw | 3k | 1.3k | 0.2k | 156 | 3 | | ||
+ | ^ en | ||
+ | ^ Total ^ 38.6m ^ 1.1m ^ 429k ^ 2.1m ^ 21k ^ | ||
- | ===BAC=== | + | ==== Alignment ==== |
- | ^ lang ^ tokens ^ types ^ lemmas ^ sents ^ texts ^ | + | The corpus has been aligned at the sentence level. |
- | | **en** | + | |
- | ------------------------------ | ||
- | === Relevant links === | + | ===== EdA ===== |
+ | ^ lang ^ tokens | ||
+ | ^ fr | 7.4m | 185k | 40k | 376k | 4.5k | | ||
- | * [[http:// | + | ===== BAC ===== |
- | | + | ^ lang ^ tokens |
+ | ^ en | 6.5m | 181k | 60k | 289k | 1.5k | | ||
- | === Publications === | + | |
+ | ===== Publications ===== | ||
* Detection and annotation of code-switching [[https:// | * Detection and annotation of code-switching [[https:// | ||
* Crowdsourced correction of OCR errors [[https:// | * Crowdsourced correction of OCR errors [[https:// | ||
Line 51: | Line 51: | ||
 * Special handling of elliptical compound nouns and separable prefix verbs in German [[https:// | * Special handling of elliptical compound nouns and separable prefix verbs in German [[https:// | ||
* See here for more [[http:// | * See here for more [[http:// | ||
+ | |||
+ | |||
+ | ===== Relevant links ===== | ||
+ | |||
+ | * [[http:// | ||
+ | * [[https:// | ||
+ |