CMC Corpora in DeReKo

  • We introduce three types of corpora of computer-mediated communication (CMC) that have recently been compiled at the Institute for the German Language (IDS), or curated from an external project, and included in DeReKo, the German Reference Corpus: the Wikipedia (discussion) corpora, the Usenet news corpus, and the Dortmund Chat Corpus. The data have been converted to I5, the TEI customization used to represent texts in DeReKo. All corpora can be searched via the web-based IDS corpus research interfaces; the Wikipedia and chat corpora can also be downloaded from the IDS repository and download server, respectively.


Author:Harald Lüngen, Marc Kupietz
Parent Title (English):Proceedings of the Workshop on Challenges in the Management of Large Corpora and Big Data and Natural Language Processing (CMLC-5+BigNLP) 2017 including the papers from the Web-as-Corpus (WAC-XI) guest section. Birmingham, 24 July 2017
Publisher:Institut für Deutsche Sprache
Place of publication:Mannheim
Editor:Piotr Bański, Marc Kupietz, Harald Lüngen, Paul Rayson, Hanno Biber, Evelyn Breiteneder, Simon Clematide, John Mariani, Mark Stevenson, Theresa Sick
Document Type:Conference Proceeding
Year of first Publication:2017
Date of Publication (online):2017/07/05
Tag:Deutsches Referenzkorpus (DeReKo); Dortmunder Chat-Korpus; CMC corpus; Computer-mediated communication; Corpus linguistics
GND Keyword:Deutsch; Internet; Korpus <Linguistik>; UseNet; Wikipedia
Page Number:5
First Page:20
Last Page:24
DDC classes:400 Language
Open Access?:yes
Leibniz-Classification:Language, Linguistics
Conferences, Workshops:CMLC-5 + BigNLP / 5th Workshop on Challenges in the Management of Large Corpora and Big Data and Natural Language Processing
Licence:Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Germany (CC BY-NC-ND 3.0 DE)