Refine
Year of publication
- 2017 (12) (remove)
Document Type
- Conference Proceeding (6)
- Article (2)
- Part of a Book (2)
- Other (1)
- Part of Periodical (1)
Has Fulltext
- yes (12)
Keywords
- Korpus <Linguistik> (6)
- Deutsch (4)
- Annotation (3)
- Historische Phonetik (3)
- Kempelen, Wolfgang von (3)
- Anonymisierung (2)
- Computerunterstützte Kommunikation (2)
- Gesprochene Sprache (2)
- Wörterbuch (2)
- corpora (2)
Publicationstate
- Veröffentlichungsversion (7)
- Postprint (3)
- Zweitveröffentlichung (1)
Reviewstate
- Peer-review (12) (remove)
Publisher
This paper presents a short insight into a new project at the "Institute for the German Language” (IDS) (Mannheim). It gives an insight into some basic ideas for a corpus-based dictionary of spoken German, which will be developed and compiled by the new project "The Lexicon of spoken German” (Lexik des gesprochenen Deutsch, LeGeDe). The work is based on the "Research and Teaching Corpus of Spoken German” (Forschungs- und Lehrkorpus Gesprochenes Deutsch, FOLK), which is implemented in the "Database for Spoken German” (Datenbank für Gesprochenes Deutsch, DGD). Both resources, the database and the corpus, have been developed at the IDS.
As a consequence of a recent curation project, the Dortmund Chat Corpus is available in CLARIN-D research infrastructures for download and querying. In a legal expertise it had been recommended that standard measures of anonymisation be applied to the corpus before its republication. This paper reports about the anonymisation campaign that was conducted for the corpus. Anonymisation has been realised as categorisation, and the taxonomy of anonymisation categories applied is introduced and the method of applying it to the TEI files is demonstrated. The results of the anonymisation campaign as well as issues of quality assessment are discussed. Finally, pseudonymisation as an alternative to categorisation as a method of the anonymisation of CMC data is discussed, as well as possibilities of an automatisation of the process.
Catching the common cause: extraction and annotation of causal relations and their participants
(2017)
In this paper, we present a simple, yet effective method for the automatic identification and extraction of causal relations from text, based on a large English-German parallel corpus. The goal of this effort is to create a lexical resource for German causal relations. The resource will consist of a lexicon that describes constructions that trigger causality as well as the participants of the causal event, and will be augmented by a corpus with annotated instances for each entry, that can be used as training data to develop a system for automatic classification of causal relations. Focusing on verbs, our method harvested a set of 100 different lexical triggers of causality, including support verb constructions. At the moment, our corpus includes over 1,000 annotated instances. The lexicon and the annotated data will be made available to the research community.
The paper presents best practices and results from projects dedicated to the creation of corpora of computer-mediated communication and social media interactions (CMC) from four different countries. Even though there are still many open issues related to building and annotating corpora of this type, there already exists a range of tested solutions which may serve as a starting point for a comprehensive discussion on how future standards for CMC corpora could (and should) be shaped like.
The paper reports on the results of a scientific colloquium dedicated to the creation of standards and best practices which are needed to facilitate the integration of language resources for CMC stemming from different origins and the linguistic analysis of CMC phenomena in different languages and genres. The key issue to be solved is that of interoperability – with respect to the structural representation of CMC genres, linguistic annotations metadata, and anonymization/pseudonymization schemas. The objective of the paper is to convince more projects to partake in a discussion about standards for CMC corpora and for the creation of a CMC corpus infrastructure across languages and genres. In view of the broad range of corpus projects which are currently underway all over Europe, there is a great window of opportunity for the creation of standards in a bottom-up approach.
Wolfgang von Kempelen's book "The Mechanism of Human Speech" from 1791 is a famous milestone in the history of speech communication research. It has an enormous relevance for the phonetic sciences and it marks an important turning point for the development of the (mechanical) speech synthesis. So far no English version of this work was available, which excludes many interested researchers. Access to the original versions in German and French is restricted for various reasons. For example the blackletter script of the German version is troublesome for most of today's readers. We report here on a new edition of Kempelen's book which unites a better readable German version and its English translation. It will now also be in a searchable electronic format and has been enriched with many commentaries, which aid in the understanding of details of the late 18th century that are little known or unknown to many researchers today.
Interview mit Ulrich Engel
(2017)
There are a number of recent replicas of Wolfgang von Kempelen's speaking machine. Although all of them are explicitly based on Kempelen's own description nearly none of them are identical in construction and sound. In this paper we want to illustrate some of these differences and their reasons for five replicas built by ourselves.
Das Handbuch Europäische Sprachkritik Online liefert eine vergleichende Perspektive auf Sprachkritik in europäischen Sprachkulturen (im Speziellen auf die Sprachkritik im Deutschen, Englischen, Französischen, Italienischen und Kroatischen). In dem Handbuch werden zentrale Konzepte der Sprachkritik deskriptiv behandelt. Das Ziel ist demnach, eine Konzeptgeschichte der europäischen Sprachkritik zu präsentieren. Zum einen liefert das Handbuch einen spezifischen Blick auf die jeweiligen Sprachkulturen. Zum anderen werden diese vergleichend in den Blick genommen. Das multilinguale Handbuch erscheint periodisch in Bänden.