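Die lexikografische Behandlung von Neologismen aus der Perspektive hispanophoner DaF-Lernender [The lexicographic treatment of neologisms from the perspective of Spanish-speaking learners of German as a foreign language]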
(2019)
Using a number of media communication verbs such as mailen or twittern, this article examines whether the lexicographic information available on neologisms is adequate for production in a foreign language. The study takes the perspective of a Spanish-speaking learner of German as a foreign language (DaF). The analysis draws on neologism dictionaries and databases for German as well as common bilingual online dictionaries for the Spanish–German language pair. The results of the lexicographic analysis are compared, by way of example, with corpus-based data from a doctoral thesis. The findings show the need to improve the lexicographic treatment of (verbal) neologisms in the Spanish–German context, with particular attention to text production in the foreign language.
Reading recommendation
(2019)
The handbook is a periodical, multilingual online publication. The volumes published so far have already been downloaded more than 8,500 times. For readers who prefer the haptic experience of reading, the publication is also available in print. Encyclopedic articles on selected concepts of language criticism are published successively; each deals with a key concept of language criticism that is culturally significant from a European perspective. The aim is thus to present a conceptual history of European language criticism. On the one hand, the handbook offers a specific view of the individual language cultures; on the other hand, it examines them comparatively.
Introduction
(2019)
The present volume, "Sprachinstitutionen und Sprachkritik" (Language Institutions and Language Criticism), is directly connected to the first three volumes of our handbook series and to the question of how the much-discussed and discursively constituted concept of linguistic norm-setting and standardization of a national language has developed across the language cultures under comparison, and how it is currently changing. These aspects reveal instructive connections to the first handbook volume, "Sprachnormierung und Sprachkritik", as well as to the second ("Standardisierung und Sprachkritik") and the third ("Sprachpurismus und Sprachkritik").
In what follows, an attempt is made to discuss previous research on the language history of National Socialism in light of its potential to inform current historiographical questions. In a first step, the history of research on the language of, or under, National Socialism is outlined from the early post-war period to the 1980s. What becomes clear above all is that important studies were produced in this phase, but methodological and theoretical limitations left numerous blind spots. In a second step, the article sets out the research interests and instruments with which more recent research has focused on these blind spots, and the results it has produced. In particular, it names those aspects which, in the author's view, must be taken into account in a history of language and communication under National Socialism that has yet to be written. Specifically, this means widening the perspective to include different communicative situations and heterogeneous constellations of actors, so that language use under the discursive conditions of National Socialism can be described adequately.
Wird aus Sprache Gewalt? [Does language turn into violence?]
(2019)
After the murder of Walter Lübcke, the Regierungspräsident (district president) of Kassel, and further murderous attacks in the recent past, commentaries and statements repeatedly claimed that language had turned into violence here. This seems plausible given what we know about the perpetrators and their utterances. But what does research say about it? How should this assumed connection be assessed from the perspective of linguistics and communication studies?
In this paper, we describe a data processing pipeline used for annotated spoken corpora of Uralic languages created in the INEL (Indigenous Northern Eurasian Languages) project. With this processing pipeline we convert the data into a lossless standard format (ISO/TEI) for long-term preservation while simultaneously enabling a powerful search in this version of the data. For each corpus, the input we are working with is a set of files in EXMARaLDA XML format, which contain transcriptions, multimedia alignment, morpheme segmentation and other kinds of annotation. The first step of processing is the conversion of the data into a certain subset of TEI following the ISO standard "Transcription of spoken language" with the help of an XSL transformation. The primary purpose of this step is to obtain a representation of our data in a standard format, which will ensure its long-term accessibility. The second step is the conversion of the ISO/TEI files to a JSON format used by the "Tsakorpus" search platform. This step allows us to make the corpora available through a web-based search interface. In addition, the existence of such a converter allows other spoken corpora with ISO/TEI annotation to be made accessible online in the future.
In this paper, we present WebAnno-MM, an extension of the popular web-based annotation tool WebAnno, which is designed for the linguistic annotation of transcribed spoken data with time-aligned media files. Several new features have been implemented for our current use case: a novel teaching method based on pair-wise manual annotation of transcribed video data and systematic comparison of agreement between students. To enable the annotation of transcribed spoken language data, apart from addressing technical and data-model-related challenges, WebAnno-MM offers an additional view of the data: a (musical) score view for the inspection of parallel utterances, which is relevant for various methodological research questions regarding the analysis of interactions of spoken content.
For German, there is no state institution, apart from the Rat für deutsche Rechtschreibung (Council for German Orthography), that has a political mandate to accompany, with legitimate authority, linguistic norm-setting and ultimately the standardization of a national language. Against this background, the article presents the various non-state language institutions for German. In the German-speaking area, the language academies of neighbouring countries have always served as a model, but what emerged were above all language societies and language associations that engaged in language cultivation.
Handbuch europäische Sprachkritik online (HESO). Band 4: Sprachinstitutionen und Sprachkritik [Online Handbook of European Language Criticism (HESO). Volume 4: Language Institutions and Language Criticism]
(2019)
In the individual languages covered here, language institutions have promoted engagement with language cultivation, language norm-setting and language criticism to varying degrees. Among the best-known and, in a sense, most symbolic language institutions are undoubtedly language academies such as the Accademia della Crusca or the Académie française. Alongside language academies, language societies sometimes also stand for an engagement with language that is not only norm-setting but in part also puristic.
Modern theoretical linguistics lives by the insight that the meanings of complex expressions derive from the meanings of their parts and the way these are composed. However, the currently dominating theories of the syntax-semantics interface hastily relegate important aspects of meaning which cannot readily be aligned with visible structure to empty projecting heads non-reductively (mainstream Generative Grammar) or to the syntactic construction holistically (Construction Grammar). This book develops an alternative, compositional analysis of the hidden aspectual-temporal, modal and comparative meanings of a range of productive constructions of which pseudoreflexive, excessive and directional complement constructions take center stage. Accordingly, a contradiction-inducing hence semantically problematic part of literally coded meaning is locally ignored and systematically realized "expatriately" with respect to parts of structure that achieve the indexical anchoring of propositional contents in terms of times, worlds and standards of comparison, thus yielding the observed hidden meanings.
Ein CERN der deutschen Sprache. Überlegungen zu einem Dokumentationszentrum der deutschen Sprache [A CERN of the German language: Reflections on a documentation centre for the German language]
(2019)
In an earlier publication it was claimed that there is no useful relationship between Swahili-English dictionary look-up frequencies and the occurrence frequencies for the same wordforms in Swahili-English corpora, at least not beyond the top few thousand wordforms. This result was challenged using data for German by a different team of researchers using an improved methodology. In the present article the original Swahili-English data is revisited, using ten years’ worth of it rather than just two, and using the improved methodology. We conclude that there is indeed a positive relationship. In addition, we show that online dictionary look-up behaviour is remarkably similar across languages, even when, as in our case, one is dealing with languages from very dissimilar language families. Furthermore, online dictionaries turn out to have minimum look-up success rates, below which they simply cannot go. These minima are language-sensitive and vary depending on the regularity of the searched-for entries, but are otherwise constant no matter the size of randomly sampled dictionaries. Corpus-informed sampling always improves on any random method. Lastly, from the point of view of the graphical user interface, we argue that the average user of an online bilingual dictionary is better served with a single search box, rather than separate search boxes for each dictionary side.
Gestures, viewed as a means of communication, can serve conversational participants at several levels. As co-speech gestures, they can add information to the verbally expressed content and they can serve to manage turn-taking. In order to look closer at the interplay between these resources in face-to-face conversation, we annotated hand gestures, syntactic completion points and the related turn-organisation, and measured the timing of gesture strokes and their lexical/phrasal referent. In a case study on German, we observe the trend that speakers vary less in gesture-lexis on- and offsets when keeping the turn after syntactic completions than at speaker changes, backchannels or other locations of a conversation. This indicates that timing properties of non-verbal cues interact with verbal cues to manage turn-taking.
The Archiv für Gesprochenes Deutsch (AGD, Archive for Spoken German; Stift/Schmidt 2014) at the Leibniz-Institut für Deutsche Sprache is a research data centre for corpora of spoken German. Founded in 1932 as the Deutsches Spracharchiv (DSAv), it has built up a collection of nearly 100 variation, interview and conversation corpora through its own projects, cooperations and the acquisition of data from completed research projects. These corpora document, among other things, dialectal language use, forms of oral communication, and the language use of particular types of speakers or on particular topics. Today this collection is almost completely digitized, and a large part of it is made available to the scientific community online via the Datenbank für Gesprochenes Deutsch (DGD, Database for Spoken German) for use in research and teaching.
Preface
(2019)
The Lehnwortportal Deutsch (2012 seqq.) serves as an integrated online information system on German lexical borrowings into other languages, synthesizing an increasing number of lexicographical dictionaries and providing basic cross-resource search options. The paper discusses the far-reaching revision of the system’s conceptual, lexicographical and technological underpinnings currently under way, focussing on their relevance for multilingual loanword lexicography.
A supervised learning approach for the extraction of opinion sources and targets from German text
(2019)
We present the first systematic supervised learning approach for the extraction of opinion sources and targets on German language data. A wide choice of different features is presented, particularly syntactic features and generalization features. We point out specific differences between opinion sources and targets. Moreover, we explain why implicit sources can be extracted even with fairly generic features. In order to ensure comparability our classifier is trained and tested on the dataset of the STEPS shared task.
We present a descriptive analysis on the two datasets from the shared task on Source, Subjective Expression and Target Extraction from Political Speeches (STEPS), the only existing German dataset for opinion role extraction of its size. Our analysis discusses the individual properties of the three components, subjective expressions, sources and targets and their relations towards each other. Our observations should help practitioners and researchers when building a system to extract opinion roles from German data.
We present the second edition of the GermEval Shared Task on the Identification of Offensive Language. This shared task deals with the classification of German tweets from Twitter. Two subtasks were continued from the first edition, namely a coarse-grained binary classification task and a fine-grained multi-class classification task. As a novel subtask, we introduce the classification of offensive tweets as explicit or implicit.
The shared task had 13 participating groups submitting 28 runs for the coarse-grained task, another 28 runs for the fine-grained task, and 17 runs for the implicit-explicit task.
We evaluate the results of the systems submitted to the shared task. The shared task homepage can be found at https://projects.fzai.h-da.de/iggsa/
The article describes a multiply annotated corpus of German-language song lyrics as a data basis for interdisciplinary research scenarios. The resource enables empirically grounded analyses of linguistic phenomena, systemic-structural interrelations and tendencies in the lyrics of modern pop music. It presents the design and annotations of the corpus, which is stratified into thematic and author-specific archives, together with descriptive statistics using the Udo Lindenberg archive as an example.
Automatic division of spoken language transcripts into sentence-like units is a challenging problem, caused by disfluencies, ungrammatical structures and the lack of punctuation. We present experiments on dividing up German spoken dialogues where we investigate the impact of task setup and data representation, encoding of context information as well as different model architectures for this task.
This paper presents first steps towards metaphor detection in German poetry, in particular in expressionist poems. We create a dataset with adjective-noun pairs extracted from expressionist poems, manually annotated for metaphoricity. We discuss the annotation process and present models and experiments for metaphor detection where we investigate the impact of context and the domain dependence of the models.
In this paper, we present our work in progress on automatically identifying free indirect representation (FI), a type of thought representation used in literary texts. With a deep learning approach using contextual string embeddings, we achieve F1 scores between 0.45 and 0.5 (sentence-based evaluation for the FI category) on two very different German corpora, a clear improvement on earlier attempts for this task. We show how consistently marked direct speech can help in this task. In our evaluation, we also consider human inter-annotator scores and thus address measures of certainty for this difficult phenomenon.
This paper presents the prototype of a lexicographic resource for spoken German in interaction, which was conceived within the framework of the LeGeDe project (LeGeDe = Lexik des gesprochenen Deutsch). First of all, it summarizes the theoretical and methodological approaches that were used for the initial planning of the resource. The headword candidates were selected by analyzing corpus-based data. To this end, the data of two corpora (written and spoken German) were compared with quantitative methods. The information that was gathered on the selected headword candidates can be assigned to two different sections: meanings and functions in interaction.
Additionally, two studies on the expectations of future users towards the resource were carried out. The results of these two studies were also taken into account in the development of the prototype. Focusing on the presentation of the resource’s content, the paper shows both the different lexicographical information in selected dictionary entries, and the information offered by the provided hyperlinks and external texts. As a conclusion, it summarizes the most important innovative aspects that were specifically developed for the implementation of such a resource.
Since 2013, representatives of several French and German CMC corpus projects have developed three customizations of the TEI-P5 standard for text encoding in order to adapt the encoding schema and models provided by the TEI to the structural peculiarities of CMC discourse. Based on the three schema versions, a fourth version has been created which takes into account the experiences from encoding our corpora and which is specifically designed for the submission of a feature request to the TEI council. On our poster we present the structure of this schema and its relations (commonalities and differences) to the previous schemas.
Smooth turn-taking in conversation depends in part on speakers being able to communicate their intention to hold or cede the floor. Both prosodic and gestural cues have been shown to be used in this context. We investigate the interplay of pitch movements and hand gestures at locations at which speaker change becomes relevant, comparing their use in German and Swedish. We find that there are some shared functions of prosody and gesture with regard to turn-taking in the two languages, but that these shared functions appear to be mediated by the different phonological demands on pitch in the two languages.
Classical null hypothesis significance tests are not appropriate in corpus linguistics, because the randomness assumption underlying these testing procedures is not fulfilled. Nevertheless, there are numerous scenarios where it would be beneficial to have some kind of test in order to judge the relevance of a result (e.g. a difference between two corpora) by answering the question whether the attribute of interest is pronounced enough to warrant the conclusion that it is substantial and not due to chance. In this paper, I outline such a test.
Most authors agree that modal particles - a class of function words widely considered characteristic of Modern German - cannot receive prosodic stress, though the reasons for this restriction have not yet been satisfactorily explained. This paper argues that unstressability follows from the general contribution of modal particles to compositional utterance meaning, which requires them to take scope over focus-background structures. Form and function of modal particle meanings are modelled and illustrated for five representative examples - the particles wohl, ja, eigentlich, eben and halt. It is argued that these as well as other particles, whenever they occur under prosodic stress, can preserve neither the meaning nor the syntactic behaviour of modal particles. All instances of stressed particles in German must therefore be categorized in other functional classes.
This paper reports on the latest developments of the European Reference Corpus EuReCo and the German Reference Corpus in relation to three of the most important CMLC topics: interoperability, collaboration on corpus infrastructure building, and legal issues. Concerning interoperability, we present new ways to access DeReKo via KorAP on the API and on the plugin level. In addition, we report on advancements in the EuReCo and ICC initiatives with the provision of comparable corpora, and on recent problems with license acquisitions and our solution approaches using an indemnification clause and model licenses that include scientific exploitation.
Contents:
1. Johannes Graën, Tannon Kew, Anastassia Shaitarova and Martin Volk, "Modelling Large Parallel Corpora", pp. 1-8
2. Pedro Javier Ortiz Suárez, Benoît Sagot and Laurent Romary, "Asynchronous Pipelines for Processing Huge Corpora on Medium to Low Resource Infrastructures", pp. 9-16
3. Vladimír Benko, "Deduplication in Large Web Corpora", pp. 17-22
4. Mark Davies, "The best of both worlds: Multi-billion word “dynamic” corpora", pp. 23-28
5. Adrien Barbaresi, "On the need for domain-focused web corpora", pp. 29-32
6. Marc Kupietz, Eliza Margaretha, Nils Diewald, Harald Lüngen and Peter Fankhauser, "What's New in EuReCo? Interoperability, Comparable Corpora, Licensing", pp. 33-39
Forms of verbal violence, such as hate speech and cyberbullying, are currently issues of high societal relevance. In social discourse they are associated with a brutalization of social interaction. Against this background it is necessary to integrate the topic into school lessons. This chapter outlines a teaching unit on verbal violence in the digital age. It has been developed together with students and can therefore be used in German classes, but it is also relevant for use in academic teaching.