TY  - CHAP
U1  - Buchbeitrag
A1  - Lüngen, Harald
A1  - Kupietz, Marc
ED  - Bański, Piotr
ED  - Kupietz, Marc
ED  - Lüngen, Harald
ED  - Rayson, Paul
ED  - Biber, Hanno
ED  - Breiteneder, Evelyn
ED  - Clematide, Simon
ED  - Mariani, John
ED  - Stevenson, Mark
ED  - Sick, Theresa
T1  - CMC Corpora in DeReKo
T2  - Proceedings of the Workshop on Challenges in the Management of Large Corpora and Big Data and Natural Language Processing (CMLC-5+BigNLP) 2017 including the papers from the Web-as-Corpus (WAC-XI) guest section. Birmingham, 24 July 2017
N2  - We introduce three types of corpora of computer-mediated communication that have recently been compiled at the Institute for the German Language or curated from an external project and included in DeReKo, the German Reference Corpus, namely Wikipedia (discussion) corpora, the Usenet news corpus, and the Dortmund Chat Corpus. The data and corpora have been converted to I5, the TEI customization to represent texts in DeReKo, and are researchable via the web-based IDS corpus research interfaces and in the case of Wikipedia and chat also downloadable from the IDS repository and download server, respectively.
KW  - Korpus
KW  - Deutsch
KW  - Internet
KW  - Wikipedia
KW  - UseNet
KW  - Deutsches Referenzkorpus (DeReKo)
KW  - Dortmunder Chat-Korpus
KW  - Corpus linguistics
KW  - Computer-mediated communication
KW  - CMC corpus
Y1  - 2017
U6  - https://nbn-resolving.org/urn:nbn:de:bsz:mh39-62592
UN  - https://nbn-resolving.org/urn:nbn:de:bsz:mh39-62592
SP  - 20
EP  - 24
S1  - 5
PB  - Institut für Deutsche Sprache
CY  - Mannheim
ER  - 