Corpus Linguistics
This presentation introduces a new collaborative project: the International Comparable Corpus (ICC) (https://korpus.cz/icc), to be compiled from European national, standard(ised) languages, following the protocols for text categories and text quantities used in the International Corpus of English (ICE).
New exceptions for Text and Data Mining and their possible impact on the CLARIN infrastructure
(2018)
This paper discusses new exceptions for Text and Data Mining that have recently been adopted in some EU Member States and will probably soon be adopted at the EU level as well. These exceptions are of great significance for language scientists, as they exempt those who compile corpora from the obligation to obtain authorisation from rightholders. However, corpora compiled on the basis of such exceptions cannot be freely shared, which in the long run may have serious consequences for Open Science and for the functioning of research infrastructures such as CLARIN ERIC.
Using the Google Ngram Corpora for six different languages (including two varieties of English), a large-scale time series analysis is conducted. It is demonstrated that diachronic changes of the parameters of the Zipf–Mandelbrot law (and the parameter of the Zipf law, all estimated by maximum likelihood) can be used to quantify and visualize important aspects of linguistic change (as represented in the Google Ngram Corpora). The analysis also reveals that there are important cross-linguistic differences. It is argued that the Zipf–Mandelbrot parameters can be used as a first indicator of diachronic linguistic change, but more thorough analyses should make use of the full spectrum of different lexical, syntactical and stylometric measures to fully understand the factors that actually drive those changes.
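To illustrate what maximum-likelihood estimation of the Zipf–Mandelbrot parameters involves, here is a minimal sketch. It assumes the usual finite-rank form p(r) proportional to (r + beta)^(-alpha) over ranks 1..N, with Zipf's law as the special case beta = 0. The function names, the parameter ranges, and the coarse-to-fine grid search are illustrative assumptions; the abstract does not say which optimiser or software the study used.

```python
import math

def zm_log_probs(n_ranks, alpha, beta):
    """Log-probabilities of ranks 1..n_ranks, p(r) proportional to (r + beta) ** -alpha."""
    log_w = [-alpha * math.log(r + beta) for r in range(1, n_ranks + 1)]
    top = max(log_w)
    # Log of the normalising constant, computed with log-sum-exp for stability.
    log_z = top + math.log(sum(math.exp(w - top) for w in log_w))
    return [w - log_z for w in log_w]

def neg_log_likelihood(counts, alpha, beta):
    """Negative log-likelihood; counts[i] is the frequency of the item with rank i + 1."""
    log_p = zm_log_probs(len(counts), alpha, beta)
    return -sum(c * lp for c, lp in zip(counts, log_p))

def fit_zipf_mandelbrot(counts, alpha_range=(0.5, 2.0), beta_range=(0.0, 6.0),
                        steps=30, rounds=4):
    """Approximate the ML estimate of (alpha, beta) by coarse-to-fine grid search.

    The search ranges are assumptions for this sketch, not values from the study.
    """
    a_lo, a_hi = alpha_range
    b_lo, b_hi = beta_range
    best = None  # (nll, alpha, beta)
    for _ in range(rounds):
        for i in range(steps + 1):
            a = a_lo + (a_hi - a_lo) * i / steps
            for j in range(steps + 1):
                b = b_lo + (b_hi - b_lo) * j / steps
                nll = neg_log_likelihood(counts, a, b)
                if best is None or nll < best[0]:
                    best = (nll, a, b)
        # Shrink the window to one grid step around the current best point.
        a_step = (a_hi - a_lo) / steps
        b_step = (b_hi - b_lo) / steps
        a_lo = max(alpha_range[0], best[1] - a_step)
        a_hi = min(alpha_range[1], best[1] + a_step)
        b_lo = max(beta_range[0], best[2] - b_step)
        b_hi = min(beta_range[1], best[2] + b_step)
    return best[1], best[2]
```

For a diachronic analysis of the kind described, one would call `fit_zipf_mandelbrot` once per time slice on that slice's rank-ordered frequency counts and plot the resulting parameter series. Because alpha and beta are correlated, a grid search can stall on a ridge of the likelihood surface, so a gradient-based or simplex optimiser is likely preferable in practice.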