Refine
Year of publication
- 2015 (4) (remove)
Document Type
- Conference Proceeding (3)
- Other (1)
Has Fulltext
- yes (4)
Is part of the Bibliography
- yes (4) (remove)
Keywords
- Annotation (4) (remove)
Publicationstate
- Veröffentlichungsversion (4) (remove)
Reviewstate
- Peer-Review (2)
- Review-Status-unbekannt (1)
Tagset und Richtlinie für das PoSTagging von Sprachdaten aus Genres internetbasierter Kommunikation
(2015)
Usenet is a large online resource containing user-generated messages (news articles) organised in discussion groups (newsgroups) which deal with a wide variety of different topics. We describe the download, conversion, and annotation of a comprehensive German news corpus for integration in DeReKo, the German Reference Corpus hosted at the Institut für Deutsche Sprache in Mannheim.
Contents:
1. Michal Křen: Recent Developments in the Czech National Corpus, S. 1
2. Dan Tufiş, Verginica Barbu Mititelu, Elena Irimia, Stefan Dumitrescu, Tiberiu Boros, Horia Nicolai Teodorescu: CoRoLa Starts Blooming – An update on the Reference Corpus of Contemporary Romanian Language, S. 5
3. Sebastian Buschjäger, Lukas Pfahler, Katharina Morik: Discovering Subtle Word Relations in Large German Corpora, S. 11
4. Johannes Graën, Simon Clematide: Challenges in the Alignment, Management and Exploitation of Large and Richly Annotated Multi-Parallel Corpora, S. 15
5. Stefan Evert, Andrew Hardie: Ziggurat: A new data model and indexing format for large annotated text corpora, S. 21
6. Roland Schäfer: Processing and querying large web corpora with the COW14 architecture, S. 28
7. Jochen Tiepmar: Release of the MySQL-based implementation of the CTS protocol, S. 35