Refine
Document Type
- Part of a Book (10)
- Article (9)
- Conference Proceeding (6)
- Part of Periodical (1)
Has Fulltext
- yes (26)
Is part of the Bibliography
- yes (26) (remove)
Keywords
- Deutsch (13)
- Korpus <Linguistik> (7)
- Rumänisch (5)
- Annotation (3)
- Historische Phonetik (3)
- Kempelen, Wolfgang von (3)
- Anonymisierung (2)
- Argumentstruktur (2)
- Computerlinguistik (2)
- Computerunterstützte Kommunikation (2)
Publicationstate
- Veröffentlichungsversion (18)
- Postprint (4)
Reviewstate
- Peer-review (26) (remove)
Publisher
- De Gruyter (7)
- TUDpress (3)
- de Gruyter (2)
- EACL (1)
- Elsevier (1)
- European Network of e-Lexicography (ENeL) (1)
- Heidelberg University Publishing (1)
- Karolinum Verlag (1)
- Linköping University (1)
- Springer (1)
This paper presents a short insight into a new project at the "Institute for the German Language” (IDS) (Mannheim). It gives an insight into some basic ideas for a corpus-based dictionary of spoken German, which will be developed and compiled by the new project "The Lexicon of spoken German” (Lexik des gesprochenen Deutsch, LeGeDe). The work is based on the "Research and Teaching Corpus of Spoken German” (Forschungs- und Lehrkorpus Gesprochenes Deutsch, FOLK), which is implemented in the "Database for Spoken German” (Datenbank für Gesprochenes Deutsch, DGD). Both resources, the database and the corpus, have been developed at the IDS.
As a consequence of a recent curation project, the Dortmund Chat Corpus is available in CLARIN-D research infrastructures for download and querying. In a legal expertise it had been recommended that standard measures of anonymisation be applied to the corpus before its republication. This paper reports about the anonymisation campaign that was conducted for the corpus. Anonymisation has been realised as categorisation, and the taxonomy of anonymisation categories applied is introduced and the method of applying it to the TEI files is demonstrated. The results of the anonymisation campaign as well as issues of quality assessment are discussed. Finally, pseudonymisation as an alternative to categorisation as a method of the anonymisation of CMC data is discussed, as well as possibilities of an automatisation of the process.
Catching the common cause: extraction and annotation of causal relations and their participants
(2017)
In this paper, we present a simple, yet effective method for the automatic identification and extraction of causal relations from text, based on a large English-German parallel corpus. The goal of this effort is to create a lexical resource for German causal relations. The resource will consist of a lexicon that describes constructions that trigger causality as well as the participants of the causal event, and will be augmented by a corpus with annotated instances for each entry, that can be used as training data to develop a system for automatic classification of causal relations. Focusing on verbs, our method harvested a set of 100 different lexical triggers of causality, including support verb constructions. At the moment, our corpus includes over 1,000 annotated instances. The lexicon and the annotated data will be made available to the research community.
The paper presents best practices and results from projects dedicated to the creation of corpora of computer-mediated communication and social media interactions (CMC) from four different countries. Even though there are still many open issues related to building and annotating corpora of this type, there already exists a range of tested solutions which may serve as a starting point for a comprehensive discussion on how future standards for CMC corpora could (and should) be shaped like.
The paper reports on the results of a scientific colloquium dedicated to the creation of standards and best practices which are needed to facilitate the integration of language resources for CMC stemming from different origins and the linguistic analysis of CMC phenomena in different languages and genres. The key issue to be solved is that of interoperability – with respect to the structural representation of CMC genres, linguistic annotations metadata, and anonymization/pseudonymization schemas. The objective of the paper is to convince more projects to partake in a discussion about standards for CMC corpora and for the creation of a CMC corpus infrastructure across languages and genres. In view of the broad range of corpus projects which are currently underway all over Europe, there is a great window of opportunity for the creation of standards in a bottom-up approach.
Question Answering Systems for retrieving information from Knowledge Graphs (KG) have become a major area of interest in recent years. Current systems search for words and entities but cannot search for grammatical phenomena. The purpose of this paper is to present our research on developing a QA System that answers natural language questions about German grammar.
Our goal is to build a KG which contains facts and rules about German grammar, and is also able to answer specific questions about a concrete grammatical issue. An overview of the current research in the topic of QA systems and ontology design is given and we show how we plan to construct the KG by integrating the data in the grammatical information system Grammis, hosted by the Leibniz-Institut für Deutsche Sprache (IDS). In this paper, we describe the construction of the initial KG, sketch our resulting graph, and demonstrate the effectiveness of such an approach. A grammar correction component will be part of a later stage. The paper concludes with the potential areas for future research.
This contribution offers a fine-grained analysis of German and Romanian ditransitive and prepositional transfer constructions. The transfer construction (TC) is shown to be realised in German by 26 argument structure patterns (ASPs), which are conceived of as form-meaning pairings which differ only minimally. The mainstream constructionist view of the different types of TCs being related by polysemy links is rejected, the ASPs being argued instead to be related by family relationships. All but six of the ASPs identified for German are shown to possess a Romanian counterpart. For some ditransitive structures, German is shown to possess two prepositional variants, one with an (‘at’) and one with zu (‘to’) or auf (‘on’), while Romanian has only one. Due to the lack of a Romanian counterpart for the German zu and auf variants, Romanian lacks some of the dative alternations found in German. However, Romanian as well as German permits the double object pattern to interact with take-verbs, verbs of removal and add-verbs, which do not allow the ditransitive construction in English. Since these verb classes also permit at least one prepositional pattern in both languages, Romanian and German show a larger number of dative alternation types than English.
Wolfgang von Kempelen's book "The Mechanism of Human Speech" from 1791 is a famous milestone in the history of speech communication research. It has an enormous relevance for the phonetic sciences and it marks an important turning point for the development of the (mechanical) speech synthesis. So far no English version of this work was available, which excludes many interested researchers. Access to the original versions in German and French is restricted for various reasons. For example the blackletter script of the German version is troublesome for most of today's readers. We report here on a new edition of Kempelen's book which unites a better readable German version and its English translation. It will now also be in a searchable electronic format and has been enriched with many commentaries, which aid in the understanding of details of the late 18th century that are little known or unknown to many researchers today.
Gegenstand des Aufsatzes sind Sätze mit so genannten inneren Objekten, das sind Akkusativobjekte, die im Wesentlichen intransitive Verben gelegentlich zu sich nehmen. Sie weisen die Besonderheit auf, dass das Objektsnomen und das Verb morphologisch, etymologisch und/oder semantisch miteinander verwandt sind. Aufgrund von Form- und vor allem Bedeutungsunterschieden lassen sich in beiden Sprachen verschiedene Gruppen von inneren Objekten ausmachen, die genauer beschrieben und unter sprachvergleichenden Gesichtspunkten betrachtet werden. Dazu werden u.a. die syntaktischen Eigenschaften von Sätzen mit inneren Objekten herangezogen. Einige auffallende sprachbezogene Unterschiede werden beschrieben, beispielsweise ist im Rumänischen bei einigen Verben ein präpositionaler Anschluss möglich, wo im Deutschen das innere Objekt ausschließlich im Akkusativ stehen kann. Sätze mit inneren Objekten können als ein Typ von Argumentstrukturmustern betrachtet werden. In diesem Sinne sind sie Form-Bedeutungs-Paare, deren Beziehungen untereinander innerhalb eines Konzepts von Familienähnlichkeiten dargestellt werden, wie man sie auch innerhalb anderer Cluster von Argumentstrukturmustern beobachten kann.
There are a number of recent replicas of Wolfgang von Kempelen's speaking machine. Although all of them are explicitly based on Kempelen's own description nearly none of them are identical in construction and sound. In this paper we want to illustrate some of these differences and their reasons for five replicas built by ourselves.
In diesem Beitrag liegt der Fokus auf der Vorfeldbesetzung des deutschen Satzes, insofern das Vorfeld einerseits aus einem Satzglied oder mehreren Satzgliedern und einem infiniten Teil des Verbalkomplexes oder andererseits nur aus dem infiniten Teil des Verbalkomplexes besteht. Bei diesen Formen der Vorfeldbesetzung werden Varianten und deren informationsstrukturelle Besonderheiten betrachtet. Des Weiteren soll der Frage nachgegangen werden, ob – entgegen einer haufig vorgebrachten Regel, dass das Vorfeld des deutschen Satzes nur einfach besetzt werden kann – eindeutige und auch akzeptable Belege in den Wikipedia-Korpora auffindbar sind, die darauf hinweisen, dass im Deutschen durchaus eine Vorfeldbesetzung mit mehr als einem Satzglied auftreten kann.
Some structures in printed dictionaries also occur in online dictionaries, some do not occur, some need to be adapted whereas new structures may be introduced in online dictionaries. This paper looks at one type of structure, known in printed dictionaries as outer texts. It is argued that the notions of a frame structure and front and back matter texts do not apply to online dictionaries. The data distribution in online dictionaries does not only target the dictionary articles. There are components outside the word list section of the dictionary. These components are not always texts. They could e.g. also be video clips. Consequently the notion of outer texts in printed dictionaries is substituted by the notion of outer features in online dictionaries. This paper shows how outer features help to constitute a feature compound. The outer features in eight online dictionaries are discussed. Where the users guidelines text is a compulsory outer text in printed dictionaries it seems that an equivalent feature is often eschewed in online dictionaries. A distinction is made between dictionary-internal and dictionary-external outer features, illustrating that outer features can be situated in other sources than the specific dictionary. More research is needed to formulate models for online features that can play a comprehensive role in online dictionaries.
Scripted reality shows oscillate between fiction and nonfiction because based on a script they use amateur actors but also adopt the aesthetics of documentary-style reality television. Perception studies have proven that many viewers mistake the contents of such programs as everyday reality. An adequate framework to reveal these ambiguous relations needs to combine the product analysis with the additional analysis of the production aspects. That includes developing a categorization for (scripted) reality television, a combined analysis of the product and its production, and an analysis of the perception of scripted reality on television and through corresponding social media sites.
Das Handbuch Europäische Sprachkritik Online liefert eine vergleichende Perspektive auf Sprachkritik in europäischen Sprachkulturen (im Speziellen auf die Sprachkritik im Deutschen, Englischen, Französischen, Italienischen und Kroatischen). In dem Handbuch werden zentrale Konzepte der Sprachkritik deskriptiv behandelt. Das Ziel ist demnach, eine Konzeptgeschichte der europäischen Sprachkritik zu präsentieren. Zum einen liefert das Handbuch einen spezifischen Blick auf die jeweiligen Sprachkulturen. Zum anderen werden diese vergleichend in den Blick genommen. Das multilinguale Handbuch erscheint periodisch in Bänden.