CMC Corpora in DeReKo

We introduce three types of corpora of computer-mediated communication (CMC) that have recently been compiled at the Institute for the German Language (IDS), or curated from an external project, and included in DeReKo, the German Reference Corpus: the Wikipedia (discussion) corpora, the Usenet news corpus, and the Dortmund Chat Corpus. The data have been converted to I5, the TEI customization used to represent texts in DeReKo, and can be searched via the web-based IDS corpus research interfaces. The Wikipedia and chat corpora can also be downloaded from the IDS repository and download server, respectively.

Metadata
Author:Harald Lüngen, Marc Kupietz
URN:urn:nbn:de:bsz:mh39-62592
Parent Title (English):Proceedings of the Workshop on Challenges in the Management of Large Corpora and Big Data and Natural Language Processing (CMLC-5+BigNLP) 2017 including the papers from the Web-as-Corpus (WAC-XI) guest section. Birmingham, 24 July 2017
Publisher:Institut für Deutsche Sprache
Place of publication:Mannheim
Editor:Piotr Bański, Marc Kupietz, Harald Lüngen, Paul Rayson, Hanno Biber, Evelyn Breiteneder, Simon Clematide, John Mariani, Mark Stevenson, Theresa Sick
Document Type:Conference Proceeding
Language:English
Year of first Publication:2017
Date of Publication (online):2017/07/05
Publication state:Published version
Review state:Peer-reviewed
Tag:Deutsches Referenzkorpus (DeReKo); Dortmunder Chat-Korpus; CMC corpus; Computer-mediated communication; Corpus linguistics
GND Keyword:Deutsch; Internet; Korpus <Linguistik>; UseNet; Wikipedia
Pagenumber:5
First Page:20
Last Page:24
Dewey Decimal Classification:400 Language
Leibniz-Classification:Language, Linguistics
Linguistics-Classification:Corpus linguistics
Open Access?:Yes
Conferences, Workshops:CMLC-5 + BigNLP / 5th Workshop on Challenges in the Management of Large Corpora and Big Data and Natural Language Processing
Licence (German):Creative Commons - Attribution-NonCommercial-NoDerivs 3.0 Germany