CMC Corpora in DeReKo
- We introduce three types of computer-mediated communication (CMC) corpora that were recently compiled at the Institute for the German Language, or curated from an external project, and included in DeReKo, the German Reference Corpus: the Wikipedia (discussion) corpora, the Usenet news corpus, and the Dortmund Chat Corpus. The data have been converted to I5, the TEI customization used to represent texts in DeReKo. The corpora can be searched via the web-based IDS corpus research interfaces; the Wikipedia and chat corpora can also be downloaded from the IDS repository and download server, respectively.
Author: | Harald Lüngen, Marc Kupietz |
---|---|
URN: | urn:nbn:de:bsz:mh39-62592 |
Parent Title (English): | Proceedings of the Workshop on Challenges in the Management of Large Corpora and Big Data and Natural Language Processing (CMLC-5+BigNLP) 2017 including the papers from the Web-as-Corpus (WAC-XI) guest section. Birmingham, 24 July 2017 |
Publisher: | Institut für Deutsche Sprache |
Place of publication: | Mannheim |
Editor: | Piotr Bański, Marc Kupietz, Harald Lüngen, Paul Rayson, Hanno Biber, Evelyn Breiteneder, Simon Clematide, John Mariani, Mark Stevenson, Theresa Sick |
Document Type: | Part of a Book |
Language: | English |
Year of first Publication: | 2017 |
Date of Publication (online): | 2017/07/05 |
Publication state: | Published version |
Review state: | Peer review |
Tag: | Deutsches Referenzkorpus (DeReKo); Dortmunder Chat-Korpus; CMC corpus; Computer-mediated communication; Corpus linguistics |
GND Keyword: | Deutsch; Internet; Korpus <Linguistik>; UseNet; Wikipedia |
Page Number: | 5 |
First Page: | 20 |
Last Page: | 24 |
DDC classes: | 400 Language |
Open Access?: | yes |
Leibniz-Classification: | Language, Linguistics |
Linguistics-Classification: | Corpus linguistics |
Conferences, Workshops: | CMLC-5 + BigNLP / 5th Workshop on Challenges in the Management of Large Corpora and Big Data and Natural Language Processing |
Program areas: | Digital Linguistics |
Licence: | Creative Commons - Attribution-NonCommercial-NoDerivs 3.0 Germany |