Building and Annotating a Corpus of German-Language Newsgroups
- Usenet is a large online resource containing user-generated messages (news articles) organised in discussion groups (newsgroups) which deal with a wide variety of different topics. We describe the download, conversion, and annotation of a comprehensive German news corpus for integration in DeReKo, the German Reference Corpus hosted at the Institut für Deutsche Sprache in Mannheim.
Author: | Jasmin Schröck, Harald Lüngen |
---|---|
URN: | urn:nbn:de:bsz:mh39-43640 |
Parent Title (English): | NLP4CMC 2015. 2nd Workshop on Natural Language Processing for Computer-Mediated Communication / Social Media. Proceedings of the Workshop, September 29, 2015, University of Duisburg-Essen, Campus Essen |
Publisher: | German Society for Computational Linguistics & Language Technology (GSCL) |
Editor: | Michael Beißwenger, Torsten Zesch |
Document Type: | Conference Proceeding |
Language: | English |
Year of first Publication: | 2015 |
Date of Publication (online): | 2015/11/12 |
Publication state: | Published version |
Tag: | Deutsches Referenzkorpus (DeReKo); text corpus |
GND Keyword: | Annotation; Korpus <Linguistik> |
First Page: | 17 |
Last Page: | 22 |
DDC classes: | 400 Language / 410 Linguistics |
Open Access?: | yes |
Leibniz-Classification: | Language, Linguistics |
Linguistics-Classification: | Corpus linguistics |