Refine
Document Type
- Part of a Book (4)
- Conference Proceeding (3)
- Article (1)
Has Fulltext
- yes (8)
Keywords
- Chatten <Kommunikation> (8) (remove)
Publicationstate
Reviewstate
- Peer-Review (4)
- (Verlags)-Lektorat (3)
- Peer-review (1)
Publisher
The aim of this paper is to present the results of an empirical analysis of the use of non-alphabetic graphic signs (e.g. asterisks, slashes, plus signs etc.) in the context of repairs in Russian and German informal electronic communication. The data for the analysis were taken from the “Mobile Communication Database MoCoDa” (http://mocoda.spracheinteraktion.de/), which contains Russian and German private electronic communication via SMS, WhatsApp and other short message services, and the “Dortmunder Chat-Korpus” (http://www.chatkorpus.tu-dortmund.de/korpora.html). This paper describes the functions of various graphic resources in the context of repairs in both data collections and compares the occurrences of these functions in current Russian and German computer-mediated communication. It concludes that particular signs in both data sets share the same subset of functions, but they differ in terms of how frequently these resources occur in each form of communication.
As a consequence of a recent curation project, the Dortmund Chat Corpus is available in CLARIN-D research infrastructures for download and querying. In a legal expertise it had been recommended that standard measures of anonymisation be applied to the corpus before its republication. This paper reports about the anonymisation campaign that was conducted for the corpus. Anonymisation has been realised as categorisation, and the taxonomy of anonymisation categories applied is introduced and the method of applying it to the TEI files is demonstrated. The results of the anonymisation campaign as well as issues of quality assessment are discussed. Finally, pseudonymisation as an alternative to categorisation as a method of the anonymisation of CMC data is discussed, as well as possibilities of an automatisation of the process.
The paper reports the results of the curation project ChatCorpus2CLARIN. The goal of the project was to develop a workflow and resources for the integration of an existing chat corpus into the CLARIN-D research infrastructure for language resources and tools in the Humanities and the Social Sciences (http://clarin-d.de). The paper presents an overview of the resources and practices developed in the project, describes the added value of the resource after its integration and discusses, as an outlook, to what extent these practices can be considered best practices which may be useful for the annotation and representation of other CMC and social media corpora.
We introduce our pipeline to integrate CMC and SM corpora into the CLARIN-D corpus infrastructure. The pipeline was developed by transforming an existing CMC corpus, the Dortmund Chat Corpus, into a resource conforming to current technical and legal standards. We describe how the resource has been prepared and restructured in terms of TEI encoding, linguistic annotations, and anonymisation. The output is a CLARIN-conformant resource integrated in the CLARIN-D research infrastructure.
Converting and Representing Social Media Corpora into TEI: Schema and best practices from CLARIN-D
(2016)
The paper presents results from a curation project within CLARIN-D, in which an existing lMWord corpus of German chat communication has been integrated into the DEREKO and DWDS corpus infrastructures of the CLARIN-D centres at the Institute for the German Language (IDS, Mannheim) and at the Berlin-Brandenburg Academy of Sciences (BBAW, Berlin). The focus is on the solutions developed for converting and representing the corpus in a TEI format.