OPUS 4 | Search

8 search hits

Integrating corpora of computer-mediated communication into the language resources landscape: Initiatives and best practices from French, German, Italian and Slovenian projects (2016)

Beißwenger, Michael ; Chanier, Thierry ; Chiari, Isabella ; Erjavec, Tomaž ; Fišer, Darja ; Herold, Axel ; Ljubešić, Nikola ; Lüngen, Harald ; Poudat, Céline ; Stemle, Egon W. ; Storrer, Angelika ; Wigham, Ciara

The paper presents best practices and results from projects in four countries dedicated to the creation of corpora of computer-mediated communication and social media interactions (CMC). Even though there are still many open issues related to building and annotating corpora of that type, there already exists a range of accessible solutions which have been tested in projects and which may serve as a starting point for a more precise discussion of what future standards for CMC corpora may (and should) look like.

Integrating corpora of computer-mediated communication into the language resources landscape: Initiatives and best practices from French, German, Italian and Slovenian projects (2016)

(Best) Practices for Annotating and Representing CMC and Social Media Corpora in CLARIN-D (2016)

Beißwenger, Michael ; Ehrhardt, Eric ; Herold, Axel ; Lüngen, Harald ; Storrer, Angelika

The paper reports the results of the curation project ChatCorpus2CLARIN. The goal of the project was to develop a workflow and resources for the integration of an existing chat corpus into the CLARIN-D research infrastructure for language resources and tools in the Humanities and the Social Sciences (http://clarin-d.de). The paper presents an overview of the resources and practices developed in the project, describes the added value of the resource after its integration and discusses, as an outlook, to what extent these practices can be considered best practices which may be useful for the annotation and representation of other CMC and social media corpora.

Das Dortmunder Chat-Korpus in CLARIN-D: Modellierung und Mehrwerte (2016)

Beißwenger, Michael ; Herold, Axel ; Lüngen, Harald ; Storrer, Angelika

Integrating corpora of computer-mediated communication in CLARIN-D: Results from the curation project ChatCorpus2CLARIN (2016)

Lüngen, Harald ; Beißwenger, Michael ; Ehrhardt, Eric ; Herold, Axel ; Storrer, Angelika

We introduce our pipeline to integrate CMC and SM corpora into the CLARIN-D corpus infrastructure. The pipeline was developed by transforming an existing CMC corpus, the Dortmund Chat Corpus, into a resource conforming to current technical and legal standards. We describe how the resource has been prepared and restructured in terms of TEI encoding, linguistic annotations, and anonymisation. The output is a CLARIN-conformant resource integrated in the CLARIN-D research infrastructure.

Converting and Representing Social Media Corpora into TEI: Schema and best practices from CLARIN-D (2016)

Beißwenger, Michael ; Ehrhardt, Eric ; Herold, Axel ; Lüngen, Harald ; Storrer, Angelika

The paper presents results from a curation project within CLARIN-D, in which an existing 1M-word corpus of German chat communication has been integrated into the DEREKO and DWDS corpus infrastructures of the CLARIN-D centres at the Institute for the German Language (IDS, Mannheim) and at the Berlin-Brandenburg Academy of Sciences (BBAW, Berlin). The focus is on the solutions developed for converting and representing the corpus in a TEI format.

The effectiveness of lexicographic tools for optimising written L1-texts (2016)

Wolfer, Sascha ; Bartz, Thomas ; Weber, Tassja ; Abel, Andrea ; Meyer, Christian M. ; Müller-Spitzer, Carolin ; Storrer, Angelika

We present an empirical study addressing the question of whether, and to what extent, lexicographic writing aids improve text revision results. German university students were asked to optimise two German texts using (1) no aids at all, (2) highlighted problems, or (3) highlighted problems accompanied by lexicographic resources that could be used to solve the specific problems. We found that participants from the third group corrected the largest number of problems and introduced the fewest semantic distortions during revision. They also reached the highest overall score and were the most efficient (as measured in points per time). The second group, with highlighted problems only, lies between the two other groups in almost every measure we analysed. We discuss these findings in the context of intelligent writing environments, the effectiveness of writing aids in practical usage situations, and the teaching of dictionary skills.

Typologie von Internetwörterbüchern und -portalen (2016)

Engelberg, Stefan ; Storrer, Angelika

Internet dictionaries can combine many types of information in novel ways and present them adaptively to their users. In networked form, as "mega-dictionaries", they constitute large dictionary portals, and they merge with corpora, multimedia extensions and automatic language analysis tools into lexical information systems of a new kind. It has therefore become difficult to distinguish between a dictionary, a corpus, an atlas and a frequency list. The authors attempt to shed some light on the various types of dictionaries, dictionary portals and lexical information systems, and in doing so also show that the disorder which a „Schlöraffe“ brings to the classification of the animal kingdom can pay off in the end.
