We present a new resource for German causal language, with annotations in context for verbs, nouns and adpositions. Our dataset includes 4,390 annotated instances for more than 150 different triggers. The annotation scheme distinguishes three different types of causal events (CONSEQUENCE, MOTIVATION, PURPOSE). We also provide annotations for semantic roles, i.e. of the cause and effect for the causal event as well as the actor and affected party, if present. In the paper, we present inter-annotator agreement scores for our dataset and discuss problems for annotating causal language. Finally, we present experiments where we frame causal annotation as a sequence labelling problem and report baseline results for the prediction of causal arguments and for predicting different types of causation.
This article presents new, representative data on regional variation in Germany, collected by the Leibniz-Institut für Deutsche Sprache in the 2017/2018 survey round as part of the Innovation Sample of the Socio-Economic Panel (SOEP) of the German Institute for Economic Research (DIW). First, dialect competence was surveyed; across individuals, the familiar north-south gradient emerges, but the individual degree of competence among dialect speakers differs only slightly between regions. Second, evaluations of dialects were collected; here, North German and Bavarian are rated particularly positively and Saxon particularly negatively, with regional patterns playing a role. Also striking is the very uniformly positive evaluation of Standard German (Hochdeutsch) across the country.
Bericht vom ersten nationalen Best-Practice-Workshop der deutschen Open-Access-Monografienfonds
(2020)
Beyond Citations: Corpus-based Methods for Detecting the Impact of Research Outcomes on Society
(2020)
This paper proposes, implements and evaluates a novel, corpus-based approach for identifying categories indicative of the impact of research via a deductive (top-down, from theory to data) and an inductive (bottom-up, from data to theory) approach. The resulting categorization schemes differ in substance. Research outcomes are typically assessed by using bibliometric methods, such as citation counts and patterns, or alternative metrics, such as references to research in the media. Shortcomings with these methods are their inability to identify impact of research beyond academia (bibliometrics) and considering text-based impact indicators beyond those that capture attention (altmetrics). We address these limitations by leveraging a mixed-methods approach for eliciting impact categories from experts, project personnel (deductive) and texts (inductive). Using these categories, we label a corpus of project reports per category schema, and apply supervised machine learning to infer these categories from project reports. The classification results show that we can predict deductively and inductively derived impact categories with 76.39% and 78.81% accuracy (F1-score), respectively. Our approach can complement solutions from bibliometrics and scientometrics for assessing the impact of research and studying the scope and types of advancements transferred from academia to society.
Are borrowed neologisms accepted more slowly into the German language than German words resulting from the application of word formation rules? This study addresses this question by focusing on two possible indicators for the acceptance of neologisms: a) frequency development of 239 German neologisms from the 1990s (loanwords as well as new words resulting from the application of word formation rules) in the German reference corpus DeReKo and b) frequency development in the use of pragmatic markers (‘flags’, namely quotation marks and phrases such as sogenannt ‘so-called’) with these words. In the second part of the article, a psycholinguistic approach to evaluating the (psychological) status of different neologisms and non-words in an experimentally controlled study and plans to carry out interviews in a field test to collect speakers’ opinions on the acceptance of the analysed neologisms are outlined. Finally, implications for the lexicographic treatment of both types of neologisms are discussed.
The present paper outlines the projected second part of the Corpus Query Lingua Franca (CQLF) family of standards: CQLF Ontology, which is currently in the process of standardization at the International Standards Organization (ISO), in its Technical Committee 37, Subcommittee 4 (TC37SC4) and its national mirrors. The first part of the family, ISO 24623-1 (henceforth CQLF Metamodel), was successfully adopted as an international standard at the beginning of 2018. The present paper reflects the state of the CQLF Ontology at the moment of submission for the Committee Draft ballot. We provide a brief overview of the CQLF Metamodel, present the assumptions and aims of the CQLF Ontology, its basic structure, and its potential extended applications. The full ontology is expected to emerge from a community process, starting from an initial version created by the authors of the present paper.
Corpus REDEWIEDERGABE
(2020)
This article presents the corpus REDEWIEDERGABE, a German-language historical corpus with detailed annotations for speech, thought and writing representation (ST&WR). With approximately 490,000 tokens, it is the largest resource of its kind. It can be used to answer literary and linguistic research questions and serve as training material for machine learning. This paper describes the composition of the corpus and the annotation structure, discusses some methodological decisions and gives basic statistics about the forms of ST&WR found in this corpus.
This article describes the corpus Deutsch in Namibia (DNam), which is freely accessible via the Datenbank für Gesprochenes Deutsch (DGD). The corpus is a new digital resource that comprehensively and systematically documents the language use of the German-speaking minority in Namibia and the associated language attitudes. We describe the data collection and the methods used (free conversations, "language situations", semi-structured interviews), the data preparation including transcription, normalization and tagging, the properties of the available corpus (size, available metadata, etc.), and some basic functionalities within the DGD. First research results obtained with the new resource illustrate the versatility of the corpus for questions in contact linguistics, variationist linguistics and sociolinguistics.
Der Weihnachtsbrief
(2020)
This year's annual conference of the Leibniz-Institut für Deutsche Sprache in Mannheim, titled „Deutsch in Europa“ ("German in Europe"), aimed to broaden perspectives. In twelve talks, nine project presentations at a methods fair, and a panel discussion, participants discussed language-policy, grammatical and methodological aspects of linguistic coexistence in Europe, of language comparison and of the acquisition of German.
Social media have become almost indispensable in everyday life: whether for communication, as on WhatsApp, for sharing content and photos, e.g. via Facebook and Instagram, or for taking part in world events via Twitter. The volume examines whether and how social media change our communication and our language, and which novel communicative forms the use of social media has produced.
Entity framing is the selection of aspects of an entity to promote a particular viewpoint towards that entity. We investigate entity framing of political figures through the use of names and titles in German online discourse, enhancing current research in entity framing through titling and naming that concentrates on English only. We collect tweets that mention prominent German politicians and annotate them for stance. We find that the formality of naming in these tweets correlates positively with their stance. This confirms sociolinguistic observations that naming and titling can have a status-indicating function and suggests that this function is dominant in German tweets mentioning political figures. We also find that this status-indicating function is much weaker in tweets from users that are politically left-leaning than in tweets by right-leaning users. This is in line with observations from moral psychology that left-leaning and right-leaning users assign different importance to maintaining social hierarchies.
Older adults are often exposed to elderspeak, a specialized speech register linked with negative outcomes. However, previous research has mainly been conducted in nursing homes without considering multiple contextual conditions. Based on a novel contextually-driven framework, we examined elderspeak in an acute general versus geriatric German hospital setting. Individual-level information such as cognitive impairment (CI) and audio-recorded data from care interactions between 105 older patients (M = 83.2 years; 49% with severe CI) and 34 registered nurses (M = 38.9 years) were assessed. Psycholinguistic analyses were based on manual coding (k = .85 to k = .97) and computer-assisted procedures. First, diminutives (61%), collective pronouns (70%), and tag questions (97%) were detected. Second, patients’ functional impairment emerged as an important factor for elderspeak. Our study suggests that functional impairment may be a more salient trigger of stereotype activation than CI and that elderspeak deserves more attention in acute hospital settings.
The sentiment polarity of an expression (whether it is perceived as positive, negative or neutral) can be influenced by a number of phenomena, foremost among them negation. Apart from closed-class negation words like no, not or without, negation can also be caused by so-called polarity shifters. These are content words, such as verbs, nouns or adjectives, that shift polarities in their opposite direction, e. g. abandoned in “abandoned hope” or alleviate in “alleviate pain”. Many polarity shifters can affect both positive and negative polar expressions, shifting them towards the opposing polarity. However, other shifters are restricted to a single shifting direction. Recoup shifts negative to positive in “recoup your losses”, but does not affect the positive polarity of fortune in “recoup a fortune”. Existing polarity shifter lexica only specify whether a word can, in general, cause shifting, but they do not specify when this is limited to one shifting direction. To address this issue we introduce a supervised classifier that determines the shifting direction of shifters. This classifier uses both resource-driven features, such as WordNet relations, and data-driven features like in-context polarity conflicts. Using this classifier we enhance the largest available polarity shifter lexicon.
We present a fine-grained NER annotation scheme with 30 labels and apply it to German data. Building on the OntoNotes 5.0 NER inventory, our scheme is adapted for a corpus of transcripts of biographic interviews by adding categories for AGE and LAN(guage) and also adding label classes for various numeric and temporal expressions. Applying the scheme to the spoken data as well as a collection of teaser tweets from newspaper sites, we can confirm its generality for both domains, also achieving good inter-annotator agreement. We also show empirically how our inventory relates to the well-established 4-category NER inventory by re-annotating a subset of the GermEval 2014 NER coarse-grained dataset with our fine label inventory. Finally, we use a BERT-based system to establish some baselines for NER tagging on our two new datasets. Global results in in-domain testing are quite high on the two datasets, near what was achieved for the coarse inventory on the CoNLL-2003 data. Cross-domain testing produces much lower results due to the severe domain differences.
Questions are central interventions in coaching. Nevertheless, there is hardly any knowledge about how they contribute to change in clients. With its focus on the sequential order of utterances such as "question – answer – reaction", linguistic conversation analysis can describe this change potential of questions and thus make it accessible to (continuing education) practice or human resource management.
This is an introduction to a special issue of Dictionaries: Journal of the Dictionary Society of North America. It offers a characterization of neology and describes the Globalex-sponsored workshop at which the papers in the issue originated. It provides an overview of the papers, which treat lexicographical neology and neological lexicography in Danish, Dutch, Estonian, Frisian, Greek, Korean, Spanish, and Swahili and address relevant aspects of lexicography in those languages, presenting state-of-the-art research into neology and ideas about modern lexicographic treatment of neologisms in various dictionary types.
How Do Speakers Define the Meaning of Expressions? The Case of German x heißt y (“x means y”)
(2020)
To secure mutual understanding in interaction, speakers sometimes explain or negotiate expressions. Adopting a conversation analytic and interaction linguistic approach, I examine how participants explain which kinds of expressions in different sequential environments, using the format x heißt y (“x means y”). When speakers use it to clarify technical terms or foreign words that are unfamiliar to co-participants, they often provide a situationally anchored definition that however is rather context-free and therefore transferable to future situations. When they explain common (but indexical, ambiguous, polysemous, or problematic) expressions instead, speakers always design their explanation strongly connected to the local context, building on situational circumstances. I argue that x heißt y definitions in interaction do not meet the requirements of scientific or philosophical definitions but that this is irrelevant for the situational exigencies speakers face.
This article examines existing solutions and new possibilities for expanding the German Reference Corpus (DEREKO) with data from social media and internet-based communication (IBK, Internetbasierte Kommunikation). DEREKO is a collection of corpora of contemporary written language at the IDS, offered to the linguistic community via the corpus interfaces COSMAS II and KorAP. Using definitions and examples, we first discuss the extensions and overlaps of the concepts social media, internet-based communication and computer-mediated communication. We consider the legal prerequisites for corpus expansion from social media, which arise from German copyright law, recently reformed in relevant points, and from personality rights such as the European General Data Protection Regulation, and we present consequences as well as possible and actual implementations. Building large-scale social media corpora also poses corpus-technological challenges that were considered solved for traditional written corpora or did not arise in the first place. We report how questions of data preparation, corpus encoding, anonymization and linguistic annotation of social media corpora have been approached for DEREKO and which challenges remain. We survey the landscape of available German-language IBK and social media corpora and give an overview of the IBK and social media corpora in DEREKO and their characteristics (chat, Wiki Talk and forum corpora), as well as of ongoing projects in this area. Based on corpus-linguistic micro- and macro-analyses of Wikipedia discussions compared with DEREKO as a whole, we identify characteristic linguistic features of Wikipedia discussions and assess their status as representatives of IBK corpora.
This paper presents experiments on sentence boundary detection in transcripts of spoken dialogues. Segmenting spoken language into sentence-like units is a challenging task, due to disfluencies, ungrammatical or fragmented structures and the lack of punctuation. In addition, one of the main bottlenecks for many NLP applications for spoken language is the small size of the training data, as the transcription and annotation of spoken language is by far more time-consuming and labour-intensive than processing written language. We therefore investigate the benefits of data expansion and transfer learning and test different ML architectures for this task. Our results show that data expansion is not straightforward and even data from the same domain does not always improve results. They also highlight the importance of modelling, i.e. of finding the best architecture and data representation for the task at hand. For the detection of boundaries in spoken language transcripts, we achieve a substantial improvement when framing the boundary detection problem as a sentence pair classification task, as compared to a sequence tagging approach.
Interaktionale Semantik
(2020)
Interaktive Emergenz und Stabilisierung. Zur Entstehung kollektiver Kreativität in Theaterproben
(2020)
Interoperability in an Infrastructure Enabling Multidisciplinary Research: The case of CLARIN
(2020)
CLARIN is a European Research Infrastructure providing access to language resources and technologies for researchers in the humanities and social sciences. It supports the use and study of language data in general and aims to increase the potential for comparative research of cultural and societal phenomena across the boundaries of languages and disciplines, all in line with the European agenda for Open Science. Data infrastructures such as CLARIN have recently embarked on the emerging frameworks for the federation of infrastructural services, such as the European Open Science Cloud and the integration of services resulting from multidisciplinary collaboration in federated services for the wider domain of the social sciences and humanities (SSH). In this paper we describe the interoperability requirements that arise through the existing ambitions and the emerging frameworks. The interoperability theme will be addressed at several levels, including organisation and ecosystem, design of workflow services, data curation, performance measurement and collaboration. For each level, some concrete outcomes are described.
Jesus in der Alltagssprache
(2020)
At the Leibniz-Institut für Deutsche Sprache (IDS), the programme area „Lexikografie und Sprachdokumentation“ (Lexicography and Language Documentation) developed a novel dictionary that descriptively documents easily confused expressions in their current public usage. The electronic reference work „Paronyme – Dynamisch im Kontrast“ appeared in 2018 and is characterized by the following three aspects:
1) first, it offers multi-level contrastive descriptions and flexible forms of presentation;
2) second, the meaning explanations are cognitively and conceptually oriented, in response to a long-standing call for a more cognitively oriented lexicography;
3) third, it draws on data and analytical methods with which paronyms could be identified comprehensively and then, for the first time, evaluated empirically.
This article deals with the development of the graph-theoretic analysis tool Laniakea, which was designed to visualize phenomena and changes in terminological networks. We set out the theoretical foundations, design decisions and technical details of the tool's implementation. The article also describes experiences gained in applying Laniakea to the revision of the terminological resources of the grammatical information system grammis.
Le bilinguisme en Moselle-Est. Un projet de documentation linguistique de la situation actuelle.
(2020)
Who speaks which language with whom today, and on what occasion? What ideas do the inhabitants of German-speaking Moselle associate with dialects and languages? How is Lorraine Platt (Platt lorrain) passed on? What does this look like in the different parts of Moselle? To answer these questions, the Leibniz-Institut für Deutsche Sprache (IDS) has launched an audio documentation project for linguistic research.
Lean syntax: how argument structure is adapted to its interactive, material, and temporal ecology
(2020)
It has often been argued that argument structure in spoken discourse is less complex than in written discourse. This paper argues that lean argument structure, in particular, argument omission, gives evidence of how the production and understanding of linguistic structures is adapted to the interactive, material, and temporal ecology of talk-in-interaction. It is shown how lean argument structure builds on participants' ongoing bodily conduct, joint perceptual salience, joint attention, and their orientation to expectable next actions within a joint project. The phenomena discussed in this paper are verb-derived discourse markers and tags, analepsis in responsive actions, and ellipsis in first actions, such as requests and instructions. The study draws from transcripts and audio- and video-recordings of naturally occurring interaction in German from the Research and Teaching Corpus of Spoken German (FOLK).
Lexikonprojektion und Konstruktion: Experimentelle Studien zu Argumentalternationen im Deutschen
(2020)
Debates on lexicalist vs. constructionist modelling of argument alternations are typically based on data from single constructions, each including different types of verbs. Evidence from constructions with an identical set of verb types that systematically differ in their meaning is lacking, even though such evidence is imperative for specifically investigating the dependence of argument alternations on the interaction between construction and lexical meanings. We present two acceptability studies where verb lexeme meanings and constructions - specifically active voice, impersonal passive and the construction with man 'one' in German - vary systematically. Prima facie our results support a constructionist explanation, because each construction exhibits a unique acceptability cline. However, across constructions, an adequate explanation has to consider verb-based lexical meanings. The most plausible explanation is that the semantic features licensed by the construction are matched with the semantic features provided by the verb lexeme.
Objekte der Begeisterung
(2020)
We present a construction-based approach to German prepositional object (PO) constructions occurring with the verb begeistern 'to thrill'. Traditionally, the preposition in such structures is analysed as a meaningless object marker that is lexically selected by the governing verb and not subject to variation. Drawing on a corpus study in the German reference corpus DeReKo, we show that our target verb occurs with four different PO prepositions (für 'for', an 'at', von 'from' and über 'over') that can be analysed as markers of schematic argument structure constructions in the Construction Grammar sense. We show that each construction comes with its own meaning and semantically coherent predicate restrictions. We argue that purely valency-based (lexical) approaches to argument structure fail to capture these generalisations. On the other hand, purely schema-based (constructionist) approaches to argument structure face the complementary problem of accommodating item-specific restrictions and exceptions to the generalisations they embody. We suggest that the necessary synthesis can be formulated within an account that recognises both generalised constructions and item-specific valency properties.
Paronymie und Sprachwandel
(2020)
This article investigates which factors play a role in the semantic change of German paronyms (e.g. effektiv/effizient, virtuell/virtual, nicht ehelich/unehelich/außerehelich) and how these manifest in current usage. Corpus analyses in particular can reveal different tendencies of linguistic development. As morphological alternatives, paronyms can enrich the linguistic inventory and provide the language community with new lexical variants. In other cases, paronyms compete strongly with one another, and their usage changes as a result. In addition, frequent erroneous use is an important impetus for semantic change. As a result, we observe semantic convergence or lexical displacement. Numerous expressions have specialized semantically, stylistically or discursively in recent language history in order to meet changed linguistic needs and new communicative situations. The causes and consequences of change in paronyms that give rise to usage doubts are complex. This article examines some concrete expressions in more detail and discusses their usage-based investigation as well as options for lexicographic documentation.
This paper studies practices of indexing discrepant assumptions accomplished by turn-constructional units with ich dachte ('I thought') in German talk-in-interaction. Building on the analysis of 141 instances from the corpus FOLK, we identify three sequential environments in which ich dachte is used to index that an assumption which a speaker (has) held contrasts with some other, contextually salient assumption. We show that practices which have been studied for English I thought are also routinely used in German: ich dachte is a means to manage epistemic incongruencies and to contrast an incorrect with a correct assumption in narratives. In addition, ich dachte is also used to account for the speaker's own prior actions which may have looked problematic because they built on misunderstandings which the speaker only discovered later. Moreover, ich dachte-practices may also be used to create comic effects by reporting an earlier, absurd assumption. The practices are discussed with regard to their role in regaining common ground, in managing relationships, in maintaining the identity of a rational actor, and in terms of their exploitation for other conversational interests. Special attention is paid to how co-occurring linguistic features, and sequential and pragmatic factors, account for local interpretations of ich dachte.
Privacy by Design (also referred to as Data Protection by Design) is an approach in which solutions and mechanisms addressing privacy and data protection are embedded through the entire project lifecycle, from the early design stage, rather than just added as an additional layer to the final product. Formulated in the 1990s by the Privacy Commissioner of Ontario, the principle of Privacy by Design has been discussed by institutions and policymakers on both sides of the Atlantic, and mentioned already in the 1995 EU Data Protection Directive (95/46/EC). More recently, Privacy by Design was introduced as one of the requirements of the General Data Protection Regulation (GDPR), obliging data controllers to define and adopt, already at the conception phase, appropriate measures and safeguards to implement data protection principles and protect the rights of the data subject. Failing to meet this obligation may result in a hefty fine, as was the case in the Uniontrad decision by the French Data Protection Authority (CNIL). The ambition of the proposed paper is to analyse the practical meaning of Privacy by Design in the context of Language Resources, and propose measures and safeguards that can be implemented by the community to ensure respect of this principle.
This article deals with communicative practices in audiovisual web formats, using the example of so-called "Let's Plays", in which a video game is played and commented on for viewers on the internet. Using live-streamed Let's Plays, we show how viewers interact with producers during the broadcast and thus become an integral part of the emerging product. Live-streamed Let's Plays render obsolete the separation between production, product and reception familiar from traditional media. We therefore speak of media chains (Medienketten). They are characterized by the fact that, owing to the given media affordances, the three elements merge into one another, influence each other dynamically or bring each other about.
As immigration and mobility increase, so do interactions between people from different linguistic backgrounds. Yet while linguistic diversity offers many benefits, it also comes with a number of challenges. In seven empirical articles and one commentary, this Special Issue addresses some of the most significant language challenges facing researchers in the 21st century: the power language has to form and perpetuate stereotypes, the contribution language makes to intersectional identities, and the role of language in shaping intergroup relations. By presenting work that aims to shed light on some of these issues, the goal of this Special Issue is to (a) highlight language as integral to social processes and (b) inspire researchers to address the challenges we face. To keep pace with the world’s constantly evolving linguistic landscape, it is essential that we make progress toward harnessing language’s power in ways that benefit 21st century globalized societies.
Historically, the Germanic varieties of Moselle-Est (in the former Lorraine region) belong to the German dialect continuum. After the Second World War, their use (including that of Standard German) was strongly suppressed and Frenchification was resolutely pursued. For some decades now, efforts have been made to elevate the dialects of Moselle-Est to the status of an independent language in order to mark a distance from the German language, to make them identifiable and to allow them to be used again. The linguistic landscape gives a good indication of how the different linguistic groups coexist and of the status of their languages. In a qualitative analysis, the contexts of occurrence, the functions and the authors of linguistic elements visible in public space in standard and dialectal German are discussed for Moselle-Est. They turn out to be notable exceptions, distributed in a meaningful way. (Standard) German appears in historical inscriptions and in the domain of international relations, and is thus implicitly exogenized. By contrast, the dialect (called "platt") is found in contexts with local references that bear on aspects of identity.
The corpus analysis platform KorAP is being developed at the Leibniz-Institut für Deutsche Sprache (IDS) as the successor system to COSMAS II and provides comprehensive access to part of DeReKo (Kupietz et al. 2010). Although some functionalities are still missing, KorAP is already fit for productive use. In what follows, we use the investigation of social media corpora as an example to present some new possibilities and special features.
As part of a larger research paradigm on understanding client change in the helping professions from an interprofessional perspective, this paper applies a conversation analytic approach to investigate therapists’ requesting examples (REs) and their interactional and sequential contribution to clients’ change during the diagnostic evaluation process. The analyzed data comprises 15 videotaped intake interviews that followed the system of Operationalized Psychodynamic Diagnosis. Therapists’ requesting examples in psychodiagnostic interviews explicitly or implicitly criticize the patient’s prior turn as insufficient. They also open a retro-sequence and in the following turns provide for a description that helps clarify meaning and evince psychic or relational aspects of the topic at hand. While the therapist’s prior request initiates the patient’s insufficient presentation, the patient’s example presentation is regularly followed by the therapist’s summarizing comments or by further requests. Requesting examples thus are a particular case of requests that follow expandable responses regarding the sequential organization; yet, given that they make examples conditionally relevant, they are more specific. With the help of this sequential organization, participants co-construct common knowledge which allows the therapist to pursue the overall aim of therapy, which is to increase the patients’ awareness of their distorted perceptions, and thus to pave the way for change.
Making corpora accessible and usable for linguistic research is a huge challenge in view of (too) big data, legal issues and a rapidly evolving methodology. This affects not only the design of user-friendly graphical interfaces to corpus analysis tools, but also the availability of programming interfaces supporting access to the functionality of these tools from various analysis and development environments. RKorAPClient is a new research tool in the form of an R package that interacts with the Web API of the corpus analysis platform KorAP, which provides access to large annotated corpora, including the German reference corpus DeReKo with 45 billion tokens. In addition to optionally authenticated KorAP API access, RKorAPClient provides further processing and visualization features to simplify common corpus analysis tasks. This paper introduces the basic functionality of RKorAPClient and exemplifies various analysis tasks based on DeReKo, which are bundled within the R package and can serve as a basic framework for advanced analysis and visualization approaches.
Sprachentwicklungstest zum Kasus bei bilingualen Vorschulkindern: Sprachstand Deutsch (KT-DEU)
(2020)
Sprachentwicklungstest zum Kasus bei den bilingualen Vorschulkindern: Sprachstand Russisch (KT-RUS)
(2020)
Studenten, StudentInnen, Studierende? Aktuelle Verwendungspräferenzen bei Personenbezeichnungen
(2020)
This contribution presents opinions and attitudes towards gender-equitable language. To this end, it considers various options for referring to people who study. These are first described and their frequencies in the German Reference Corpus (Deutsches Referenzkorpus) evaluated. Opinions and attitudes are then addressed explicitly, drawing on data from the Deutschland-Erhebung 2008 and the Deutschland-Erhebung 2017. The current survey elicited lay-linguistic usage preferences for personal designations; most respondents prefer the participial form (den Studierenden). Usage preferences are related chiefly to the respondents' age and political orientation. Overall, however, the topic of gender-equitable language plays only a minor role for most respondents.
T-Shirt Lexicography
(2020)
This article presents a study of graphic inscriptions on garments such as T-shirts, inscriptions that resemble entries in general monolingual dictionaries of German. Referred to here as "T-shirt lexicography," the collected material is analyzed in terms of its form, content, and function, focusing on lexicographical aspects. T-shirt lexicography is an example of vernacular lexicography inasmuch as different lexicographical traditions are assumed (correctly as well as erroneously) by the (unknown) authors, but also adapted to their specific needs.
Text und Sprache digital
(2020)
This article makes an empirical and a methodological contribution to the comparative study of action. The empirical contribution is a comparative study of three distinct types of action regularly accomplished with the turn format du meinst x (“you mean/think x”) in German: candidate understandings, formulations of the other’s mind, and requests for a judgment. These empirical materials are the basis for a methodological exploration of different levels of researcher abstraction in the comparative study of action. Two levels are examined: the (coarser) level of conditionally relevant responses (what a response speaker must do to align with the action of the prior turn) and the (finer) level of “full alignment” (what a response speaker can do to align with the action of a prior turn). Both levels of abstraction provide empirically viable and analytically interesting descriptive concepts for the comparative study of action. Data are in German.
For a long time, the lecture dominated performatively presented scientific communication. Given academic traditions, it is possible to make a connection between the lecture and classical rhetoric, a highly differentiated instrument of analysis. The tradition of the lecture has been perpetuated in the presentation of research results, first in the use of transparencies and subsequently through computer-based projections. Yet the use of media technology has also allowed new practices to emerge, including mediation practices hitherto neglected in the theory of rhetoric.
The lexicography of German
(2020)
This chapter discusses the main dictionaries of the German language as it is spoken and written in Germany, and also German as it is spoken and written in Austria, Switzerland, the eastern fringes of Belgium, and South Tyrol. It also briefly describes Pennsylvania German. Corpora and other language resources used in German dictionary-making are also presented. Finally, there is a discussion of some current issues in German lexicography, as well as future prospects.
This paper describes the development of a systematic approach to the creation, management and curation of linguistic resources, particularly spoken language corpora. It also presents first steps towards a framework for continuous quality control to be used within external research projects by non-technical users, and discusses various domain and discipline specific problems and individual solutions. The creation of spoken language corpora is not only a time-consuming and costly process, but the created resources often represent intangible cultural heritage, containing recordings of, for example, extinct languages or historical events. Since high quality resources are needed to enable re-use in as many future contexts as possible, researchers need to be provided with the necessary means for quality control. We believe that this includes methods and tools adapted to Humanities researchers as non-technical users, and that these methods and tools need to be developed to support existing tasks and goals of research projects.
The coronavirus pandemic may be the largest crisis the world has had to face since World War II. It does not come as a surprise that it is also having an impact on language as our primary communication tool. In this short paper, we present three inter-connected resources that are designed to capture and illustrate these effects on a subset of the German language: An RSS corpus of German-language newsfeeds (with freely available untruncated frequency lists), a continuously updated HTML page tracking the diversity of the vocabulary in the RSS corpus and a Shiny web application that enables other researchers and the broader public to explore the corpus in terms of basic frequencies.
The paper presents a discussion on the main linguistic phenomena of user-generated texts found in web and social media, and proposes a set of annotation guidelines for their treatment within the Universal Dependencies (UD) framework. Given on the one hand the increasing number of treebanks featuring user-generated content, and its somewhat inconsistent treatment in these resources on the other, the aim of this paper is twofold: (1) to provide a short, though comprehensive, overview of such treebanks - based on available literature - along with their main features and a comparative analysis of their annotation criteria, and (2) to propose a set of tentative UD-based annotation guidelines, to promote consistent treatment of the particular phenomena found in these types of texts. The main goal of this paper is to provide a common framework for those teams interested in developing similar resources in UD, thus enabling cross-linguistic consistency, which is a principle that has always been in the spirit of UD.
The newest generation of speech technology has led to a huge increase in audio-visual data that are now enhanced with orthographic transcripts, for example through automatic subtitling on online platforms. Research data centers and archives contain a range of new and historical data, which are currently only partially transcribed and therefore only partially accessible for systematic querying. Automatic Speech Recognition (ASR) is one option for making these data accessible. This paper tests the usability of a state-of-the-art ASR system on a historical (from the 1960s) but regionally balanced corpus of spoken German, and on a relatively new corpus (from 2012) recorded in a narrow area. We observed a regional bias of the ASR system, with higher recognition scores for the north of Germany and lower scores for the south. A detailed analysis of the narrow-region data revealed, despite relatively high ASR confidence, some specific word errors due to a lack of regional adaptation. These findings need to be considered in decisions on further data processing and the curation of corpora, e.g. correcting transcripts or transcribing from scratch. Such geography-dependent analyses also have the potential to inform ASR development, supporting targeted data selection for training/adaptation and increasing sensitivity towards varieties of pluricentric languages.
In our paper, we present a case study on the quality of concept relations in the manually developed terminological resource of grammis, an information system on German grammar. We assess a SKOS representation of the resource using the tool qSKOS, create a typology of the issues identified by the tool, and conduct a qualitative analysis of selected cases. We identify and discuss aspects that can motivate quality issues and uncover that ill-formed relations are frequently indicative of deeper issues in the data model. Finally, we outline how these findings can inform improvements in our resource’s data model, discussing implications for the machine readability of terminological data.
The theonym Gott for the Christian God exhibits a number of unusual grammatical properties in Early New High German, which this contribution investigates on a corpus basis. On the one hand, it has emancipated itself from its appellative origin, as is evident, for example, from the missing article; on the other hand, it takes the es-inflection in the genitive, which is unusual for a name (Pauls, Gottes), and, like inanimate appellatives, it occurs predominantly in postposed position as a genitive attribute (Haus __ Gottes). In spelling, the double majuscule <GOtt> emerges, which sets the word apart visually from the rest of the lexicon until the 18th century. The theonym thus has a special grammar in Early New High German, which persists in weakened form to this day. The contribution argues that this is the result of special communicative relevance.
For this reason, we took an empirical approach to the question of how, or whether, certain groups still use dictionaries at all today and whether they consciously distinguish them from other language-related data on the web. The aim was to collect empirical data on how learners of German as a foreign language (DaF) actually work (rather than what they say about it in retrospect), above all in order to provide a better empirical basis for teaching. The central questions were:
• How do DaF learners use lexicographic resources today?
• Which search strategies do they apply?
• Do they differentiate between the different resources?
• Which strategies prove particularly successful?
Terminology work in a business context assumes two working phases: a comprehensive descriptive phase, in which the concept structure and current terminology usage are recorded but not yet evaluated, and a prescriptive phase, in which the actual standardization takes place. In practice, the descriptive phase is often cut short and the focus is placed directly on prescription. In our contribution, we discuss the potential that detailed descriptive terminology work holds for improving knowledge communication within knowledge management. Using the example of a research project on the grammar of German, we show what this closely theory-oriented form of description looks like in practice, what challenges it entails, and how its results can support knowledge management.
In the German-language debate on gender mainstreaming, language-policy positions come into conflict with grammatical regularities and orthographic norms, often without any substantial rapprochement. The contribution examines the debate from the perspective of the Rat für deutsche Rechtschreibung (Council for German Orthography) and, using paradigmatic text examples from current writing practice, argues for implementing gender-equitable spelling in a way that is specific to text type and target group. Starting from the broad range of such strategies in existing guides, guidelines and recommendations, it points out ways of achieving an orthographically correct and linguistically appropriate implementation, in an attempt to balance both poles of the discourse from multiple perspectives: gender-equitable texts should be factually correct, comprehensible, readable and suitable for reading aloud, should ensure legal certainty and unambiguity, and should keep the focus on essential facts and core information. Finally, it discusses what role the Council could and should take in the debate, given its mandate to preserve the uniformity of orthography throughout the German-speaking area.
With the conference on peasant comedies of the 17th century, Markus Denkler (Münster) and Michael Elmentaler (Kiel) pursued an unusual concept that made a particularly intensive scholarly exchange possible: twelve High and Low German peasant comedies from the 17th century (ca. 1593–1701) served as a shared textual basis for all contributors. These are dramas featuring peasant characters that have a comedic orientation and are written in prose. All speakers were given access to the collection in advance and subsequently developed the research questions for their talks from it. In terms of content, three blocks emerged. Two contributions from literary studies situated the text type in literary and cultural history. These were followed by an extensive block on historical dialogue research and pragmatics and a somewhat shorter one on historical variety linguistics and grammar.