Recent developments in DeReKo

This paper gives an overview of recent developments in the German Reference Corpus DeReKo in terms of growth, maximising relevant corpus strata, metadata, legal issues, and its current and future research interface. Due to the recent acquisition of new licenses, DeReKo has grown by a factor of four in the first half of 2014, mostly in the area of newspaper text, and presently contains over 24 billion word tokens. Other strata, such as fictional texts, web corpora (in particular texts of computer-mediated communication, CMC), and spoken but conceptually written texts, have also increased significantly. We report on the newly acquired corpora that led to the major increase, on the principles and strategies behind our corpus acquisition activities, and on our solutions to the emerging legal, organisational, and technical challenges.

Author:Marc Kupietz, Harald Lüngen
Parent Title (English):Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC’14)
Publisher:European Language Resources Association (ELRA)
Place of publication:Reykjavik
Document Type:Conference Proceeding
Year of first Publication:2014
Date of Publication (online):2014/10/13
Tag:Deutsches Referenzkorpus (DeReKo); Institut für Deutsche Sprache <Mannheim>
GND Keyword:German language; Corpus <linguistics>; Text corpus
First Page:2378
Last Page:2385
Open Access?:yes
Leibniz-Classification:Language, Linguistics
Licence (German):Urheberrechtlich geschützt (protected by copyright)