Recent developments in DeReKo
- This paper gives an overview of recent developments in the German Reference Corpus DeReKo in terms of growth, maximising relevant corpus strata, metadata, legal issues, and its current and future research interface. Due to the recent acquisition of new licenses, DeReKo grew by a factor of four in the first half of 2014, mostly in the area of newspaper text, and presently contains over 24 billion word tokens. Other strata, such as fictional texts, web corpora (in particular CMC texts), and spoken but conceptually written texts, have also increased significantly. We report on the newly acquired corpora that led to the major increase, on the principles and strategies behind our corpus acquisition activities, and on our solutions for the emerging legal, organisational, and technical challenges.
Author: | Marc Kupietz, Harald Lüngen |
---|---|
URN: | urn:nbn:de:bsz:mh39-31353 |
URL: | http://www.lrec-conf.org/proceedings/lrec2014/index.html |
Parent Title (English): | Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC’14) |
Publisher: | European Language Resources Association (ELRA) |
Place of publication: | Reykjavik |
Document Type: | Conference Proceeding |
Language: | English |
Year of first Publication: | 2014 |
Date of Publication (online): | 2014/10/13 |
Tag: | Deutsches Referenzkorpus (DeReKo); Institut für Deutsche Sprache <Mannheim> |
GND Keyword: | Deutsch; Korpus <Linguistik>; Textkorpus |
First Page: | 2378 |
Last Page: | 2385 |
Open Access?: | yes |
Leibniz-Classification: | Language, Linguistics |
Licence (German): | Protected by copyright (Urheberrechtlich geschützt) |