Refine
Document Type
- Part of a Book (10)
- Article (9)
- Conference Proceeding (6)
- Part of Periodical (1)
Has Fulltext
- yes (26)
Is part of the Bibliography
- yes (26) (remove)
Keywords
- Deutsch (13)
- Korpus <Linguistik> (7)
- Rumänisch (5)
- Annotation (3)
- Historische Phonetik (3)
- Kempelen, Wolfgang von (3)
- Anonymisierung (2)
- Argumentstruktur (2)
- Computerlinguistik (2)
- Computerunterstützte Kommunikation (2)
Publicationstate
- Veröffentlichungsversion (18)
- Postprint (4)
Reviewstate
- Peer-review (26) (remove)
Publisher
- De Gruyter (7)
- TUDpress (3)
- de Gruyter (2)
- EACL (1)
- Elsevier (1)
- European Network of e-Lexicography (ENeL) (1)
- Heidelberg University Publishing (1)
- Karolinum Verlag (1)
- Linköping University (1)
- Springer (1)
Question Answering Systems for retrieving information from Knowledge Graphs (KG) have become a major area of interest in recent years. Current systems search for words and entities but cannot search for grammatical phenomena. The purpose of this paper is to present our research on developing a QA System that answers natural language questions about German grammar.
Our goal is to build a KG which contains facts and rules about German grammar, and is also able to answer specific questions about a concrete grammatical issue. An overview of the current research in the topic of QA systems and ontology design is given and we show how we plan to construct the KG by integrating the data in the grammatical information system Grammis, hosted by the Leibniz-Institut für Deutsche Sprache (IDS). In this paper, we describe the construction of the initial KG, sketch our resulting graph, and demonstrate the effectiveness of such an approach. A grammar correction component will be part of a later stage. The paper concludes with the potential areas for future research.
Das Handbuch Europäische Sprachkritik Online liefert eine vergleichende Perspektive auf Sprachkritik in europäischen Sprachkulturen (im Speziellen auf die Sprachkritik im Deutschen, Englischen, Französischen, Italienischen und Kroatischen). In dem Handbuch werden zentrale Konzepte der Sprachkritik deskriptiv behandelt. Das Ziel ist demnach, eine Konzeptgeschichte der europäischen Sprachkritik zu präsentieren. Zum einen liefert das Handbuch einen spezifischen Blick auf die jeweiligen Sprachkulturen. Zum anderen werden diese vergleichend in den Blick genommen. Das multilinguale Handbuch erscheint periodisch in Bänden.
Wolfgang von Kempelen's book "The Mechanism of Human Speech" from 1791 is a famous milestone in the history of speech communication research. It has an enormous relevance for the phonetic sciences and it marks an important turning point for the development of the (mechanical) speech synthesis. So far no English version of this work was available, which excludes many interested researchers. Access to the original versions in German and French is restricted for various reasons. For example the blackletter script of the German version is troublesome for most of today's readers. We report here on a new edition of Kempelen's book which unites a better readable German version and its English translation. It will now also be in a searchable electronic format and has been enriched with many commentaries, which aid in the understanding of details of the late 18th century that are little known or unknown to many researchers today.
There are a number of recent replicas of Wolfgang von Kempelen's speaking machine. Although all of them are explicitly based on Kempelen's own description nearly none of them are identical in construction and sound. In this paper we want to illustrate some of these differences and their reasons for five replicas built by ourselves.
Das 18. Jahrhundert war wissenschaftlich von großen Umbrüchen geprägt, auch im Bereich der Anatomie und Physiologie des Menschen. Die hierauserwachsende lebhafte Diskussion erstreckte sich auch auf das noch sehr junge Gebiet der (mechanischen) Sprachsynthese und ihrer Grundlagen. Das Sprachsynthesekonzept Wolfgang von Kempelens (1734–1804) ist hierbei ein besonders eindrückliches Beispiel dafür, dass eine grundlegende wissenschaftliche Erkenntnis womöglich durch technologische Limitationen nicht notwendigerweise auch praktisch umgesetzt werden kann. Grundsätzlich waren Kempelens Erkenntnisse zur Anatomie und Physiologie des Menschen und damit auch zur Spracherzeugung weitestgehend zutreffend. Die praktische Umsetzung hingegen wirkt aus heutiger Sicht recht kurios. Kempelens Vokaltrakt-Konzept soll exemplarisch dem nur wenig früher entstandenen Prototypen zur Sprachsynthese Christian Gottlieb Kratzensteins (1723–1795) gegenübergestellt werden. Dessen „Erkenntnisse“ müssen heute vielfach als falsch bezeichnet werden; sein Modell zur Vokalsynthese weist einerseits auffällige Parallelen zu demjenigen KEMPELENS auf, geht hinsichtlich der Physiologie jedoch von vielfach irrigen Annahmen aus.
The paper reports on the results of a scientific colloquium dedicated to the creation of standards and best practices which are needed to facilitate the integration of language resources for CMC stemming from different origins and the linguistic analysis of CMC phenomena in different languages and genres. The key issue to be solved is that of interoperability – with respect to the structural representation of CMC genres, linguistic annotations metadata, and anonymization/pseudonymization schemas. The objective of the paper is to convince more projects to partake in a discussion about standards for CMC corpora and for the creation of a CMC corpus infrastructure across languages and genres. In view of the broad range of corpus projects which are currently underway all over Europe, there is a great window of opportunity for the creation of standards in a bottom-up approach.
As a consequence of a recent curation project, the Dortmund Chat Corpus is available in CLARIN-D research infrastructures for download and querying. In a legal expertise it had been recommended that standard measures of anonymisation be applied to the corpus before its republication. This paper reports about the anonymisation campaign that was conducted for the corpus. Anonymisation has been realised as categorisation, and the taxonomy of anonymisation categories applied is introduced and the method of applying it to the TEI files is demonstrated. The results of the anonymisation campaign as well as issues of quality assessment are discussed. Finally, pseudonymisation as an alternative to categorisation as a method of the anonymisation of CMC data is discussed, as well as possibilities of an automatisation of the process.
The paper presents best practices and results from projects dedicated to the creation of corpora of computer-mediated communication and social media interactions (CMC) from four different countries. Even though there are still many open issues related to building and annotating corpora of this type, there already exists a range of tested solutions which may serve as a starting point for a comprehensive discussion on how future standards for CMC corpora could (and should) be shaped like.
Lexicographic meaning descriptions of German lexical items which are formally and semantically similar and therefore easily confused (so-called paronyms) often do not reflect their current usage of lexical items. They can even contradict one’s personal intuition or disagree with lexical usage as observed in public discourse. The reasons are manifold. Language data used for compiling dictionaries is either outdated, or lexicographic practice is rather conventional and does not take advantage of corpus-assisted approaches to semantic analysis. Despite of various modern electronic or online reference works speakers face uncertainties when dealing with easily confusable words. These are for example sensibel/sensitiv (sensitive) or kindisch/kindlich (childish/childlike). Existing dictionaries often do not provide satisfactory answers as to how to use these sets correctly. Numerous questions addressed in online forums show where uncertainties with paronyms are and why users demand further assistance concerning proper contextual usage (cf. Storjohann 2015). There are different reasons why users misuse certain items or mix up words which are similar in form and meaning. As data from written and more spontaneous language resources suggest, some confusions arise due to ongoing semantic change in the current use of some paronyms. This paper identifies shortcomings of contemporary German Dictionaries and discusses innovative ways of empirical lexicographic work that might pave the way for a new data-driven, descriptive reference work of confusable German terms. Currently, such a guide is being developed at the Institute for German Language in Mannheim implementing corpora and diverse corpus-analytical methods. Its objective is to compile a dictionary with contrastive entries which is a useful reference tool in situation of language doubt. At the same time, it aims at sensitizing users of context dependency and language change.