
Integrating corpora of computer-mediated communication in CLARIN-D: Results from the curation project ChatCorpus2CLARIN

  • We introduce our pipeline for integrating corpora of computer-mediated communication (CMC) and social media (SM) into the CLARIN-D corpus infrastructure. The pipeline was developed by transforming an existing CMC corpus, the Dortmund Chat Corpus, into a resource that conforms to current technical and legal standards. We describe how the resource was prepared and restructured with respect to TEI encoding, linguistic annotation, and anonymisation. The output is a CLARIN-conformant resource integrated into the CLARIN-D research infrastructure.

Metadata
Author:Harald Lüngen, Michael Beißwenger, Eric Ehrhardt, Axel Herold, Angelika Storrer
URN:urn:nbn:de:bsz:mh39-55743
URL:https://www.linguistics.ruhr-uni-bochum.de/bla/
ISSN:2190-0949
Parent Title (English):Proceedings of the 13th Conference on Natural Language Processing (KONVENS)
Series (Serial Number):Bochumer Linguistische Arbeitsberichte (16)
Publisher:Sprachwissenschaftliches Institut, Ruhr-Universität Bochum
Place of publication:Bochum
Editor:Stefanie Dipper, Friedrich Neubarth, Heike Zinsmeister
Document Type:Part of a Book
Language:English
Year of first Publication:2016
Date of Publication (online):2016/11/16
Publication state:Published version
Review state:Peer review
Corpora:http://hdl.handle.net/10932/00-03B0-14FA-A8D0-0F01-F
GND Keyword:Chatten <Kommunikation>; Deutsch; Korpus <Linguistik>; Text Encoding Initiative (TEI)
First Page:156
Last Page:164
Dewey Decimal Classification:400 Language / 400 Language, Linguistics
BDSL-Classification:Text studies
Leibniz-Classification:Language, Linguistics
Linguistics-Classification:Corpus linguistics
Open Access?:Yes
Licence (German):German copyright law (UrhG) applies