The paper presents best practices and results from projects in four countries dedicated to the creation of corpora of computer-mediated communication and social media interactions (CMC). Even though there are still many open issues related to building and annotating corpora of that type, there already exists a range of accessible solutions which have been tested in projects and which may serve as a starting point for a more precise discussion of how future standards for CMC corpora may (and should) be shaped.
The paper reports the results of the curation project ChatCorpus2CLARIN. The goal of the project was to develop a workflow and resources for the integration of an existing chat corpus into the CLARIN-D research infrastructure for language resources and tools in the Humanities and the Social Sciences (http://clarin-d.de). The paper presents an overview of the resources and practices developed in the project, describes the added value of the resource after its integration and discusses, as an outlook, to what extent these practices can be considered best practices which may be useful for the annotation and representation of other CMC and social media corpora.
We introduce our pipeline to integrate CMC and SM corpora into the CLARIN-D corpus infrastructure. The pipeline was developed by transforming an existing CMC corpus, the Dortmund Chat Corpus, into a resource conforming to current technical and legal standards. We describe how the resource has been prepared and restructured in terms of TEI encoding, linguistic annotations, and anonymisation. The output is a CLARIN-conformant resource integrated in the CLARIN-D research infrastructure.
Converting and Representing Social Media Corpora into TEI: Schema and best practices from CLARIN-D
(2016)
The paper presents results from a curation project within CLARIN-D, in which an existing 1M-word corpus of German chat communication has been integrated into the DEREKO and DWDS corpus infrastructures of the CLARIN-D centres at the Institute for the German Language (IDS, Mannheim) and at the Berlin-Brandenburg Academy of Sciences (BBAW, Berlin). The focus is on the solutions developed for converting and representing the corpus in a TEI format.
We present an empirical study addressing the question of whether, and to what extent, lexicographic writing aids improve text revision results. German university students were asked to optimise two German texts using (1) no aids at all, (2) highlighted problems, or (3) highlighted problems accompanied by lexicographic resources that could be used to solve the specific problems. We found that participants from the third group corrected the largest number of problems and introduced the fewest semantic distortions during revision. They also reached the highest overall score and were the most efficient (as measured in points per time). The second group, with highlighted problems only, lies between the two other groups in almost every measure we analysed. We discuss these findings in the context of intelligent writing environments, the effectiveness of writing aids in practical usage situations, and the teaching of dictionary skills.
Internet dictionaries can combine many types of information in novel ways and present them adaptively to users. Networked as “mega-dictionaries”, they form large dictionary portals and merge with corpora, multimedia extensions and automatic language analysis tools into lexical information systems of a new kind. It has therefore become difficult to distinguish between a dictionary, a corpus, an atlas and a frequency list. The authors attempt to shed some light on the various types of dictionaries, dictionary portals and lexical information systems, and also to show that the disorder which a “Schlöraffe” brings into the classification of the animal kingdom can pay off in the end.