Communicating in social media and dealing with hypertexts is no longer a marginal phenomenon in 2020. The linguistic peculiarities of internet-based communication and social media have by now been well researched and described; however, German grammars, with the exception of Hoffmann (2014), have so far treated them only marginally at best. Even more recent approaches to text analysis, e.g. Ágel (2017), concentrate on written texts that are stable in form and linearly organised. The same applies to approaches developed primarily for assessing written products in educational contexts.
The annual microcensus provides Germany’s most important official statistics. Unlike a census, it does not cover the whole population but a representative 1% sample of it. In 2017, the German microcensus asked a question on the language of the population, i.e. ‘Which language is mainly spoken in your household?’ Unfortunately, the question, its design and its position within the microcensus questionnaire have several shortcomings. The main shortcoming is that the question cannot capture multilingual repertoires. We therefore make recommendations for improving the microcensus’ language question: first and foremost, the question (i.e. its wording, design, and answer options) should make it possible to count multilingual repertoires.
This paper presents the QUEST project and describes concepts and tools that are being developed within its framework. The goal of the project is to establish quality criteria and curation criteria for annotated audiovisual language data. Building on existing resources developed by the participating institutions earlier, QUEST develops tools that could be used to facilitate and verify adherence to these criteria. An important focus of the project is making these tools accessible for researchers without substantial technical background and helping them produce high-quality data. The main tools we intend to provide are the depositors’ questionnaire and automatic quality assurance, both developed as web applications. They are accompanied by a Knowledge base, which will contain recommendations and descriptions of best practices established in the course of the project. Conceptually, we split linguistic data into three resource classes (data deposits, collections and corpora). The class of a resource defines the strictness of the quality assurance it should undergo. This division is introduced so that too strict quality criteria do not prevent researchers from depositing their data.
The CMDI Explorer
(2020)
We present the CMDI Explorer, a tool that empowers users to easily explore the contents of complex CMDI records and to process selected parts of them with little effort. The tool allows users, for instance, to analyse virtual collections represented by CMDI records, and to send collection items to other CLARIN services such as the Switchboard for subsequent processing. The CMDI Explorer hence adds functionality that many users felt was lacking from the CLARIN tool space.
This paper addresses long-term archival for large corpora. It focuses on three aspects specific to language resources, namely (1) the removal of resources for legal reasons, (2) versioning of (unchanged) objects in constantly growing resources, especially where objects can be part of multiple releases but also part of different collections, and (3) the conversion of data to new formats for digital preservation. The paper explains why language resources may have to be changed and why formats may need to be converted. As a solution, the use of an intermediate proxy object called a signpost is suggested. The approach is exemplified with respect to the corpora of the Leibniz Institute for the German Language in Mannheim, namely the German Reference Corpus (DeReKo) and the Archive for Spoken German (AGD).
Signposts for CLARIN
(2020)
An implementation of CMDI-based signposts and its use is presented in this paper. Arnold et al. 2020 present Signposts as a solution to challenges in long-term preservation of corpora, especially corpora that are continuously extended and subject to modification, e.g., due to legal injunctions, but also may overlap with respect to constituents, and may be subject to migrations to new data formats. We describe the contribution Signposts can make to the CLARIN infrastructure and document the design for the CMDI profile.
In this paper, we describe a schema and models which have been developed for the representation of corpora of computer-mediated communication (CMC corpora) using the representation framework provided by the Text Encoding Initiative (TEI). We characterise CMC discourse as dialogic, sequentially organised interchange between humans and point out that many features of CMC are not adequately handled by current corpus encoding schemas and tools. We formulate desiderata for a representation of CMC in encoding schemes and argue why the TEI is a suitable framework for the encoding of CMC corpora. We propose a model of basic CMC units (utterances, posts, and nonverbal activities) and the macro- and micro-level structures of interactions in CMC environments. Based on these models, we introduce CMC-core, a TEI customisation for the encoding of CMC corpora, which defines CMC-specific encoding features on the four levels of elements, model classes, attribute classes, and modules of the TEI infrastructure. The description of our customisation is illustrated by encoding examples from corpora by researchers of the TEI SIG CMC, representing a variety of CMC genres, i.e. chat, wiki talk, Twitter, blog, and Second Life interactions. The material described, i.e. schemata, encoding examples, and documentation, is available from the TEI CMC SIG Wiki and will accompany a feature request to the TEI Council in late 2019.
Linguistic Variation and Change in 250 Years of English Scientific Writing: A Data-Driven Approach
(2020)
We trace the evolution of Scientific English through the Late Modern period to modern time on the basis of a comprehensive corpus composed of the Transactions and Proceedings of the Royal Society of London, the first and longest-running English scientific journal established in 1665. Specifically, we explore the linguistic imprints of specialization and diversification in the science domain which accumulate in the formation of “scientific language” and field-specific sublanguages/registers (chemistry, biology etc.). We pursue an exploratory, data-driven approach using state-of-the-art computational language models and combine them with selected information-theoretic measures (entropy, relative entropy) for comparing models along relevant dimensions of variation (time, register). Focusing on selected linguistic variables (lexis, grammar), we show how we deploy computational language models for capturing linguistic variation and change and discuss benefits and limitations.
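The two information-theoretic measures named in this abstract can be illustrated with a minimal sketch. It assumes simple unigram models with add-one smoothing, which is far cruder than the language models the abstract refers to; the function names and the two toy "period" samples are hypothetical and not taken from the paper.

```python
import math
from collections import Counter


def unigram_probs(tokens, vocab, alpha=1.0):
    """Smoothed unigram probabilities over a shared vocabulary (add-alpha)."""
    counts = Counter(tokens)
    total = len(tokens) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}


def entropy(p):
    """Shannon entropy in bits of a probability distribution."""
    return -sum(x * math.log2(x) for x in p.values() if x > 0)


def relative_entropy(p, q):
    """Relative entropy (Kullback-Leibler divergence) D(p || q) in bits.

    Both distributions must be defined over the same vocabulary,
    and q must be non-zero wherever p is non-zero (smoothing ensures this).
    """
    return sum(p[w] * math.log2(p[w] / q[w]) for w in p if p[w] > 0)


# Invented toy samples standing in for two time periods of a corpus.
early = "the air is found to be elastic and the air is heavy".split()
late = "the oxygen content of the sample is measured and the oxygen is absorbed".split()

vocab = set(early) | set(late)
p_early = unigram_probs(early, vocab)
p_late = unigram_probs(late, vocab)

# How many extra bits are needed to encode the later period
# with a model of the earlier one.
divergence = relative_entropy(p_late, p_early)
print(f"H(early) = {entropy(p_early):.3f} bits")
print(f"D(late || early) = {divergence:.3f} bits")
```

Relative entropy is asymmetric, so comparing the later period against the earlier one is not the same as the reverse; which direction is appropriate depends on the research question.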
So-called “pragmaticalised multi-word units” are highly frequent in German and at times undergo far-reaching phonetic reduction processes. These can yield realisation variants that, in retrospect, can be traced back to more than one lexematic source form. The present study examines a particularly striking case of this kind, [ˈzɐmɐ], by means of a perception experiment.
This chapter focuses on the formation of adverbs from a corpus-linguistic perspective, providing an overview of adverb formation patterns in German that includes frequencies and hints to productivity as well as combining quantitative methods and theoretically founded hypotheses to address questions that concern possible grammaticalization paths in domains that are formally marked by prepositional elements or inflectional morphology (in particular, superlative or superlative-derived forms). Within our collection of adverb types from the project corpus, special attention is paid to adverbs built from primary prepositions. The data suggest that generally, such adverb formation involves the saturation of the internal argument slot of the relation-denoting preposition. In morphologically regular formations with the preposition in final position, pronominal forms like da ‘there’, hier ‘here’, wo ‘where’ as well as hin ‘hither’ and her ‘thither’ serve to derive adverbs. On the other hand, morphologically irregular formations with the preposition – in particular: zu ‘to’ or vor ‘before, in front of’ – in initial position show traits of syntactic origin such as (remnants of) inflectional morphology. The pertaining adverb type dominantly saturates the internal argument slot by means of universal quantification that is part and parcel as well of the derivation of superlatives and demonstrably fuels the productivity of the pertaining formation pattern.
The study presented here examines the proportions of different forms of speech representation, comparing two types of literature from opposite ends of the spectrum: high literature, defined as works that were shortlisted for literary prizes, and dime novels (Heftromane), mass-produced narrative works that are mostly distributed through the magazine trade and were formerly referred to disparagingly as “novels of the lower class” (Nusser 1981). Our thesis is that these types of literature differ in their narrative style and that this is reflected in the forms of representation used. The study focuses on the dichotomy between direct and non-direct representation, which was already established in classical rhetoric.
We present recognizers for four very different types of speech, thought and writing representation (STWR) for German texts. The implementation is based on deep learning with two different customized contextual embeddings, namely FLAIR embeddings and BERT embeddings. This paper gives an evaluation of our recognizers with a particular focus on the differences in performance we observed between those two embeddings. FLAIR performed best for direct STWR (F1=0.85), BERT for indirect (F1=0.76) and free indirect (F1=0.59) STWR. For reported STWR, the comparison was inconclusive, but BERT gave the best average results and best individual model (F1=0.60). Our best recognizers, our customized language embeddings and most of our test and training data are freely available and can be found via www.redewiedergabe.de or at github.com/redewiedergabe.
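For readers unfamiliar with the F1 values quoted above, the following sketch shows how such a score is computed from binary labels. It assumes token-level evaluation, which the abstract does not state; the function name and the toy label sequences are invented for illustration and are not the project's evaluation code or data.

```python
def f1_score(gold, predicted):
    """F1 (harmonic mean of precision and recall) for binary labels.

    `gold` and `predicted` are equal-length sequences of 0/1 flags,
    e.g. one flag per token marking whether it belongs to an STWR span.
    """
    tp = sum(1 for g, p in zip(gold, predicted) if g and p)
    fp = sum(1 for g, p in zip(gold, predicted) if not g and p)
    fn = sum(1 for g, p in zip(gold, predicted) if g and not p)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Invented toy labels: 3 true positives, 1 false positive, 1 false negative.
gold_labels = [1, 1, 1, 0, 0, 1, 0, 0]
pred_labels = [1, 1, 0, 0, 1, 1, 0, 0]
score = f1_score(gold_labels, pred_labels)
print(f"F1 = {score:.2f}")
```

Because F1 ignores true negatives, it is well suited to categories that cover only a small share of a text, which is plausibly the case for rarer STWR types.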
According to Positioning Theory, participants in narrative interaction can position themselves on a representational level concerning the autobiographical, told self, and a performative level concerning the interactive and emotional self of the tellers. The performative self is usually much harder to pin down, because it is a non-propositional, enacted self. In contrast to everyday interaction, psychotherapists regularly topicalize the performative self explicitly. In our paper, we study how therapists respond to clients' narratives by interpretations of the client's conduct, shifting from the autobiographical identity of the told self, which is the focus of the client's story, to the present performative self of the client. Drawing on video recordings from three psychodynamic therapies (tiefenpsychologisch fundierte Psychotherapie) with 25 sessions each, we will analyze in detail five extracts of therapists' shifts from the representational to the performative self. We highlight four findings:
• Whereas clients' narratives often serve to support identity claims in terms of personal psychological and moral characteristics, therapists tend rather to focus on clients' feelings, motives, current behavior, and ways of interacting.
• In response to clients' stories, therapists first show empathy and confirm clients' accounts, before shifting to clients' performative self.
• Therapists ground the shift to clients' performative self by references to clients' observable behavior.
• Therapists do not simply expect affiliation with their views on clients' performative self. Rather, they use such shifts to promote the clients' self-exploration. Yet, if clients resist exploring their selves in more detail, therapists more explicitly ascribe motives and feelings that clients do not seem to be aware of. The shift in positioning levels thus seems to have a preparatory function for engendering therapeutic insights.
This article examines the language contact situation as well as the language attitudes of the Caucasian Germans, descendants of German-born inhabitants of the Russian Empire and the Soviet Union who emigrated in 1816/17 to areas of Transcaucasia. After deportations and migrations, the group of Caucasian Germans now consists of those who have since emigrated to Germany and those who still live in the South Caucasus. This is the first time that sociolinguistic methods have been used to record data from the generation who experienced living in the South Caucasus and in Germany as well as from two succeeding generations. Initial results will be presented below with a focus on the language contact constellations of German varieties as well as on consequences of language contact and language repression, which both affect language attitudes.
Individuals with Autism Spectrum Disorder (ASD) experience a variety of symptoms sometimes including atypicalities in language use. The study explored differences in semantic network organisation of adults with ASD without intellectual impairment. We assessed clusters and switches in verbal fluency tasks (‘animals’, ‘human feature’, ‘verbs’, ‘r-words’) via curve fitting in combination with corpus-driven analysis of semantic relatedness and evaluated socio-emotional and motor action related content. Compared to participants without ASD (n=39), participants with ASD (n=32) tended to produce smaller clusters, longer switches, and fewer words in semantic conditions (no p values survived Bonferroni-correction), whereas relatedness and content were similar. In ASD, semantic networks underlying cluster formation appeared comparably small without affecting strength of associations or content.
We evaluate a graph-based dependency parser on DeReKo, a large corpus of contemporary German. The dependency parser is trained on the German dataset from the SPMRL 2014 Shared Task which contains text from the news domain, whereas DeReKo also covers other domains including fiction, science, and technology. To avoid the need for costly manual annotation of the corpus, we use the parser’s probability estimates for unlabeled and labeled attachment as main evaluation criterion. We show that these probability estimates are highly correlated with the actual attachment scores on a manually annotated test set. On this basis, we compare estimated parsing scores for the individual domains in DeReKo, and show that the scores decrease with increasing distance of a domain to the training corpus.
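The validation step described above, checking that the parser's probability estimates track the real attachment scores, can be sketched as a plain correlation test. This is only an illustration under assumptions: the abstract does not say which correlation measure was used, Pearson's r is chosen here for simplicity, and all numbers below are invented toy values rather than results from the paper.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical per-document values on a manually annotated test set:
# mean attachment probability reported by the parser ...
estimated = [0.95, 0.90, 0.80, 0.70, 0.60]
# ... and the attachment score actually measured against the gold annotation.
measured = [0.96, 0.88, 0.79, 0.72, 0.55]

r = pearson(estimated, measured)
print(f"r = {r:.3f}")
```

If the correlation is high on annotated data, the estimates can serve as a proxy score on unannotated domains, which is the use the abstract describes.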
This contribution describes the steps needed to make the data of the Archiv der Grafen v. Platen (AGP) accessible to research data infrastructures (Forschungsdateninfrastrukturen, FDI): converting the data, extracting the metadata, indexing data and metadata, and extending the data models for data and metadata so that they capture the archive's holdings in a meaningful way. It also explains why one should go to such effort at all: so that the data are available to a wider audience, can be processed with the tools available in the infrastructures, and can be further linked and combined with external resources, creating clear added value.
We present web services which implement a workflow for transcripts of spoken language following the TEI guidelines, in particular ISO 24624:2016 “Language resource management – Transcription of spoken language”. The web services are available at our website and will be available via the CLARIN infrastructure, including the Virtual Language Observatory and WebLicht.
As a part of the ZuMult-project, we are currently modelling a backend architecture that should provide query access to corpora from the Archive of Spoken German (AGD) at the Leibniz-Institute for the German Language (IDS). We are exploring how to reuse existing search engine frameworks that provide full text indices and allow corpora to be queried with one of the corpus query languages (QLs) established and actively used in the corpus research community. For this purpose, we tested MTAS, an open-source Lucene-based search engine for querying text with multilevel annotations. We applied MTAS to three oral corpora stored in the TEI-based ISO standard for transcriptions of spoken language (ISO 24624:2016). These corpora differ from the corpus data that MTAS was developed for, because they include interactions with two or more speakers and are enriched, inter alia, with timeline-based annotations. In this contribution, we report our test results and address issues that arise when search frameworks originally developed for querying written corpora are transferred into the field of spoken language.
Coaching outcome research convincingly argues that coaching is effective and facilitates change in clients. While coaching practice literature depicts questions as a key vehicle for such change, empirical findings as regards the local and global change potential of questions are so far largely missing in both (psychological) outcome research and (linguistic and psychological) process research on coaching. The local change potential of questions refers to a turn-by-turn transformation as a result of their sequentiality; the global change potential is related to the power of questions to initiate, process and finalize established phases of change. This programmatic article on questions, or rather questioning sequences, in executive coaching pursues two goals: firstly, it takes stock of available insights into questions in coaching and advocates for Conversation Analysis as a fruitful methodological framework to assess the local change potential of questioning sequences. Secondly, it points to the limitations of a local turn-by-turn approach to unravel the overall change potential of questions and calls for an interdisciplinary approach to bring both local and global effectiveness into relation. Such an approach is premised on conversational sequentiality and psychological theories of change and facilitates research on questioning sequences as both local and global agents of change across the continuum of coaching sessions. We present the TSPP Model as a first result of such an interdisciplinary cooperation.
This paper analyses the variation we find in the realization of finite clausal complements in the position of prepositional objects in a set of Germanic languages. The Germanic languages differ with respect to whether prepositions can directly select a clause (North Germanic) or not and instead need a prepositional proform (Continental West Germanic). Within the Continental West Germanic languages, we find further differences with respect to the constituent structures. We propose that German strong vs. weak prepositional proforms (e.g. drauf vs. darauf) differ with respect to their syntax, while this is not the case for the Dutch forms (ervan vs. daarvan). What the Germanic languages under consideration share is that the prepositional element can be covert, except in English. English shows only limited evidence for the presence of P with finite clauses in the position of prepositional objects: not generally, but only with a selected set of verbs. This investigation is a first step towards a broader study of the nature of clauses in prepositional object positions and the implications for the syntax of clausal complementation.
Twenty-two historical encyclopedias encoded in TEI: a new resource for the Digital Humanities
(2020)
This paper accompanies the corpus publication of EncycNet, a novel XML/TEI annotated corpus of 22 historical German encyclopedias from the early 18th to early 20th century. We describe the creation and annotation of the corpus, including the rationale for its development, suggested methodology for TEI annotation, possible use cases and future work. While many well-developed annotation standards for lexical resources exist, none can adequately model the encyclopedias at hand, and we therefore suggest how the TEI Lex-0 standard may be modified with additional guidelines for the annotation of historical encyclopedias. As the digitization and annotation of historical encyclopedias are settling on TEI as the de facto standard, our methodology may inform similar projects.
Using video-recordings from one day of a theater project for young adults, this paper investigates how the meaning of novel verbal expressions is interactionally constituted and elaborated over the interactional history of a series of activities. We examine how the theater director introduces and instructs the group in the Chekhovian technique of acting, which is based on “imagining with the body,” and how the imaginary elements of the technique are “brought into existence” in the language of the instructions. By tracking shifts in the instructor’s use of the key expressions invisible/imaginary/inner body or movement through a series of exercises, we demonstrate how they are increasingly treated as real and perceivable bodily conduct. The analyses focus on the instructor’s attribution of factual and agentive properties to these expressions, and the changes that these properties undergo over the series of instructions. This case demonstrates the significance of longitudinal processes for the establishment of shared meaning in social interaction. The study thereby contributes to the field of interactional semantics and to longitudinal studies of social interaction.
This article describes the development of the digital infrastructure at a research data centre for audio-visual linguistic research data, the Hamburg Centre for Language Corpora (HZSK) at the University of Hamburg in Germany, over the past ten years. The typical resource hosted in the HZSK Repository, the core component of the infrastructure, is a collection of recordings with time-aligned transcripts and additional contextual data, a spoken language corpus. Since the centre has a thematic focus on multilingualism and linguistic diversity and provides its service to researchers within linguistics and other disciplines, the development of the infrastructure was driven by diverse usage scenarios and user needs on the one hand, and by the common technical requirements for certified service centres of the CLARIN infrastructure on the other. Beyond the technical details, the article also aims to be a contribution to the discussion on responsibilities and services within emerging digital research data infrastructures and the fundamental issues in sustainability of research software engineering, concluding that in order to truly cater to user needs across the research data lifecycle, we still need to bridge the gap between discipline-specific research methods in the process of digitalisation and generic digital research data management approaches.
Towards Comprehensive Definitions of Data Quality for Audiovisual Annotated Language Resources
(2020)
Though digital infrastructures such as CLARIN have been successfully established and now provide large collections of digital resources, the lack of widely accepted standards for data quality and documentation still makes re-use of research data a difficult endeavour, especially for more complex resource types. The article gives a detailed overview of relevant characteristics of audiovisual annotated language resources and reviews possible approaches to data quality in terms of their suitability for the current context. In conclusion, various strategies are suggested in order to arrive at comprehensive and adequate definitions of data quality for this particular resource type.
Vorwort
(2020)
N-grams are of utmost importance for modern linguistics and language theory. The legal status of n-grams, however, raises many practical questions. Traditionally, text snippets are considered copyrightable if they meet the originality criterion, but no clear indicators as to the minimum length of original snippets exist; moreover, the solutions adopted in some EU Member States (the paper cites German and French law as examples) are considerably different. Furthermore, recent developments in EU law (the CJEU's Pelham decision and the new right of newspaper publishers) also provide interesting arguments in this debate. The proposed paper presents the existing approaches to the legal protection of n-grams and tries to formulate some clear guidelines as to the length of n-grams that can be freely used and shared.
Providing online repositories for language resources is one of the main activities of CLARIN centres. The legal framework regarding liability of Service Providers for content uploaded by their users has recently been modified by the new Directive on Copyright in the Digital Single Market. A new category of Service Providers, Online Content-Sharing Service Providers (OCSSPs), was added. It is subject to a complex and strict framework, including the requirement to obtain licenses from rightholders for the hosted content. This paper provides the background and effect of these changes to law and aims to initiate a debate on how CLARIN repositories should navigate this new legal landscape.
CLARIN contractual framework for sharing language data: the perspective of personal data protection
(2020)
The article analyses the responsibility for ensuring compliance with the General Data Protection Regulation (GDPR) in research settings. As a general rule, organisations are considered the data controller (the party responsible for GDPR compliance). Research constitutes a unique setting influenced by academic freedom. This raises the question of whether academics could be considered the controller as well. There are some court cases and policy documents on this issue, but it is not settled yet. The analysis serves as a preliminary analytical background for redesigning the CLARIN contractual framework for sharing data.
Maske oder Mundschutz?
(2020)
Shutdown, Lockdown und Exit
(2020)
Von Nichtstun und Erholung (an Weihnachten und zu anderen Zeiten) (aus der Rubrik Neuer Wortschatz)
(2020)
Von Gummistiefelmomenten
(2020)
Corona- und andere Partys
(2020)
Einleitung
(2020)
A corpus-based academic grammar of German is an enormous undertaking, especially if it aims at using state-of-the-art methodology while ensuring that its study results are verifiable. The Bausteine-series, which is being developed at the Leibniz Institute for the German Language (IDS), presents individual “building blocks” for such a grammar. In addition to the peer-reviewed texts, the series publishes the results of statistical analyses and, for selected topics, the underlying data sets.
This chapter begins with a sketch of the specifics of our approach, an overview of the contents of the chapters on word formation and some methodological notes. It then discusses the general characteristics of word formations and of their overall inventory, comparing word formations to primary words. Furthermore, the chapter explores the relative frequencies of word formations in different vocabulary areas and traces the word formation profiles of individual parts of speech. Finally, it compiles the characteristic word formation rules for different parts of speech.
Studying Lexical Dynamics and Language Change via Generalized Entropies: The Problem of Sample Size
(2020)
Recently, it was demonstrated that generalized entropies of order α offer novel and important opportunities to quantify the similarity of symbol sequences where α is a free parameter. Varying this parameter makes it possible to magnify differences between different texts at specific scales of the corresponding word frequency spectrum. For the analysis of the statistical properties of natural languages, this is especially interesting, because textual data are characterized by Zipf’s law, i.e., there are very few word types that occur very often (e.g., function words expressing grammatical relationships) and many word types with a very low frequency (e.g., content words carrying most of the meaning of a sentence). Here, this approach is systematically and empirically studied by analyzing the lexical dynamics of the German weekly news magazine Der Spiegel (consisting of approximately 365,000 articles and 237,000,000 words that were published between 1947 and 2017). We show that, analogous to most other measures in quantitative linguistics, similarity measures based on generalized entropies depend heavily on the sample size (i.e., text length). We argue that this makes it difficult to quantify lexical dynamics and language change and show that standard sampling approaches do not solve this problem. We discuss the consequences of the results for the statistical analysis of languages.
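The role of the free parameter α can be made concrete with a short sketch. It assumes the Havrda-Charvát/Tsallis form of the generalized entropy and a Jensen-Shannon-style divergence built on it; the abstract does not spell out either definition, so both are assumptions here, and the function names and toy sentences are hypothetical.

```python
import math
from collections import Counter


def rel_freqs(tokens):
    """Relative word-type frequencies of a token sequence."""
    counts = Counter(tokens)
    n = len(tokens)
    return {w: c / n for w, c in counts.items()}


def h_alpha(p, alpha):
    """Generalized entropy of order alpha (assumed Havrda-Charvat/Tsallis form).

    H_alpha(p) = (1 - sum_i p_i**alpha) / (alpha - 1);
    the limit alpha -> 1 is the Shannon entropy (natural logarithm).
    """
    if abs(alpha - 1.0) < 1e-12:
        return -sum(x * math.log(x) for x in p.values() if x > 0)
    return (1.0 - sum(x ** alpha for x in p.values() if x > 0)) / (alpha - 1.0)


def d_alpha(p, q, alpha):
    """Jensen-Shannon-style divergence based on h_alpha (an assumed definition)."""
    vocab = set(p) | set(q)
    mix = {w: 0.5 * (p.get(w, 0.0) + q.get(w, 0.0)) for w in vocab}
    return h_alpha(mix, alpha) - 0.5 * (h_alpha(p, alpha) + h_alpha(q, alpha))


# Invented toy texts; real input would be two large text samples.
text_a = rel_freqs("the cat sat on the mat and the dog sat too".split())
text_b = rel_freqs("the minister said the budget and the reform were delayed".split())

# Small alpha gives more weight to rare word types, large alpha to frequent ones.
for a in (0.5, 1.0, 2.0):
    print(f"alpha = {a}: D = {d_alpha(text_a, text_b, a):.4f}")
```

Since the relative frequencies are themselves estimated from a finite sample, values of this kind change with text length, which is the sample-size problem the abstract addresses.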
This paper reports on recent developments within the European Reference Corpus EuReCo, an open initiative that aims at providing and using virtual and dynamically definable comparable corpora based on existing national, reference or other large corpora. Given the well-known shortcomings of other types of multilingual corpora such as parallel/translation corpora (shining-through effects, over-normalization, simplification, etc.) or web-based comparable corpora (covering only web material), EuReCo provides a unique linguistic resource offering new perspectives for fine-grained contrastive research on authentic cross-linguistic data, applications in translation studies and foreign language teaching and learning.
This article explores a sequence organizational phenomenon that results from the use of a loosely specifiable turn format (viz., That’s + wh-clause) for launching (next) sequences while at the same time connecting back to a prior turn. Using this practice creates a sequential juncture, i.e., a pivot-like nexus between one sequence and a next. In third position, such junctures serve to accomplish seamless sequential transitions from one sequence into a next by presenting the latter as locally occasioned. The practice may, however, also be deployed in second position to launch actions that have not been made relevant or provided for by the preceding action and exhibit response relevance themselves. The sequential junctures then become retro-sequential in character: They transform the projected trajectory of the sequence in progress and create interlocking sequential structures. These findings highlight that sequence is practice, while pointing to understudied interconnections between tying and sequentiality. Data are in English.
There are many language battles, but who would have thought that the publication of the 28th edition of the Rechtschreibduden, of all things, would stir up emotions so much that several of them are going into the next round at once? Publisher and editorial staff are being dragged onto the language-political stage because the German language lends itself so well to being instrumentalised for purposes of identitarian politics.
“Revolutions are the locomotives of history,” runs a famous saying by Karl Marx. Can this also be applied to the history of language? And what are its locomotives? A recent thesis holds that pandemics, wars and other “revolutionary” events with a strong demographic impact can set language-historical developments in motion.
Die Sprachpolitik der AfD
(2020)
In recent years, language policy has established itself as a rewarding policy field. In the AfD's milieu and in the party's parliamentary representation, calls, motions, inquiries, and legislative initiatives address various topics that were already set out in the AfD's basic program (Grundsatzprogramm) of 2016. What kinds of language-policy positions are these, and what is the reason for the interest in these topics?
Nachruf auf Ulrich Engel
(2020)
This contribution collects, by way of example, various potential usage patterns involving the German lemma wissen and draws on the interactional-linguistic, functional descriptions of them offered in the literature in an attempt to structure them. At its center is a multifunctional, action-oriented approach to describing interaction in conversation. The contribution takes up considerations that were discussed within the research project Lexik des gesprochenen Deutsch (= LeGeDe) with a view to creating a corpus-based lexicographic resource on lexical particularities of spoken German in interaction.
Keywords: patterns, Lexik des gesprochenen Deutsch, interaction, online lexicography
This contribution focuses on the third-party funded LeGeDe project and on the corpus-based lexicographic prototype on particularities of spoken German in interaction that was developed during the project. The development of a lexicographic resource of this kind builds on the wide-ranging experience in creating corpus-based online dictionaries (especially at the Leibniz-Institut für Deutsche Sprache, Mannheim) and on current methods of corpus-based lexicology and interaction analysis. As a multimedia prototype for the corpus-based lexicographic treatment of spoken-language phenomena, it occupies an innovative position in modern online lexicography. In the section presenting the LeGeDe project, the contribution deals in detail with the project's research questions and goals, its empirical data basis, and empirically surveyed expectations of a resource on spoken German. The complex structure of the LeGeDe prototype is illustrated with numerous examples. Together with the central information on the macro- and microstructure and on the lexicographic outside matter, the manifold linking and access structures are shown. In addition to the concluding summary, the contribution offers, in an outlook, extensive suggestions for future lexicographic work with spoken-language corpus data.
Zwischen den Jahren oder eine Zeit zwischen den Zeiten. Sprachliche Betrachtungen zur "Normalität"
(2020)
"Systemrelevant" - eine sprachwissenschaftliche Betrachtung des Begriffs aus aktuellem Anlass
(2020)
This paper presents the corpus-based lexicographical prototype that was developed within the framework of the third-party funded project Lexik des gesprochenen Deutsch (= LeGeDe). Research results regarding the information offered in dictionaries have shown that there is a need for information on spoken lexis and its interactional functions. The resulting LeGeDe prototype is based on these needs and desiderata and is thus an innovative example of the adequate representation of spoken language in online dictionaries. It has been available online since September 2019 (https://www.owid.de/legede/). In the following sections, after first presenting the project’s goals, the data basis, the intended end user, and the applied methods, we illustrate the microstructure of the prototype and the information provided in a dictionary entry based on the lemma eben. Finally, we summarize innovative aspects that are important for the implementation of such a resource.
The present chapter investigates the relative order of attributive adjectives in German. Based on corpus data, our results corroborate previous findings that semantics is the most important factor in accounting for adjective order. Going beyond previous studies, we also consider coordinated structures (such as mit [[großem, verwildertem] Garten] ‘with (a) large, overgrown garden’), where both adjectives are of equal rank. While adjective order in embedded structures (mit [ schwierigem [ familiärem Hintergrund ]] ‘with (a) difficult domestic background’) can be predicted rather accurately on semantic grounds, we show that predictions can also be made for coordinated structures, albeit with lower accuracy. Using regression analysis, we examine how semantic factors interact with a number of other explanatory variables.
Usually, weak inflection of an attributive or nominalized adjective occurs if the adjective is preceded by an inflected determiner: mit diesem technischen Aufwand (‘at great technical expense’). Otherwise, the inflection of the adjective is strong: mit technischem Aufwand. Following this rule of thumb, we would expect strong inflection of an adjective following another adjective whenever the determiner is missing: mit hohem technischem Aufwand. But many German speakers opt for a weak dative singular ending -en following the strong ending -em on the first adjective: mit hohem technischen Aufwand. This chapter shows which explanatory variables play a role in this variation within standard German.
The majority of new words in dictionaries are included following a certain period of time during which they have become more frequent in use and have established morphosyntactic and orthographic features consistent with the language system they are borrowed into. In the case of borrowed new words, inclusion often takes place at a transitional stage of assimilation to the language system, where delayed orthographic or phonetic change cannot be ruled out and the differentiation between standard-conforming and non-standard orthographic word forms of a lemma often depends on the proximity between the writing systems of the donor and the recipient language. Following a brief overview of loan words and their lexicographical description in the Neologismenwörterbuch, a specialized online dictionary for neologisms in contemporary German, this paper presents findings of an investigative case study on dictionary entries for a neologism borrowed from a logographic language system and discusses the potential of a corpus-based description of new loan words.
I’ve got a construction looks funny – representing and recovering non-standard constructions in UD
(2020)
The UD framework defines guidelines for a crosslingual syntactic analysis in the framework of dependency grammar, with the aim of providing a consistent treatment across languages that not only supports multilingual NLP applications but also facilitates typological studies. Until now, the UD framework has mostly focussed on bilexical grammatical relations. In the paper, we propose to add a constructional perspective and discuss several examples of spoken-language constructions that occur in multiple languages and challenge the current use of basic and enhanced UD relations. The examples include cases where the surface relations are deceptive, and syntactic amalgams that either involve unconnected subtrees or structures with multiply-headed dependents. We argue that a unified treatment of constructions across languages will increase the consistency of the UD annotations and thus the quality of the treebanks for linguistic analysis.
Editorial
(2020)
In this article, we describe a user support solution for the digital humanities. As a case study, we show the development of the CLARIN-D Helpdesk from 2013 into the current support solution that has been extended for several other CLARIN-related software and projects and the DARIAH-ERIC. Furthermore, we describe a way towards a common support platform for CLARIAH-DE, which is currently in the final phase. We hope to further expand the help desk in the following years in order to act as a hub for user support and a central knowledge resource for the digital humanities not only in the German, but also in the European area and perhaps at some point worldwide.
In theater rehearsals, participants jointly develop a production that is brought to performance. An essential means to this end is acting out parts of the play and then discussing them. This usually happens with a division of roles: the actors perform parts of the play while the director watches and intervenes where necessary, which may be followed by discussion. We have called this part of theater rehearsals, in which acting and discussing alternate, the Spielprobe (see the introduction to this special issue). An essential task of interactional organization for the participants in Spielproben is to interlock acting activities and discussion activities. This is done through transition practices that either interrupt the acting or reopen it. The present contribution examines transition practices in Spielproben as a constitutive element of their interactive organization. The focus is on practices that interrupt the acting, so-called interventions. After a detailed case analysis illustrating a prototypical transition from acting into discussion and back into acting (chapters 4.1/4.2), the remainder of the contribution is devoted to the analysis of a collection of interventions. It turns out that interventions are subject to normative orientations and that the practices used vary systematically along several dimensions (such as the cause of or reason for the intervention).
Song lyrics can be considered a text genre that has features of both written and spoken discourse and that potentially provides extensive linguistic and cultural information to scientists from various disciplines. However, pop songs have so far played a rather subordinate role in empirical language research - most likely due to the absence of scientifically valid and sustainable resources. The present paper introduces a multiply annotated corpus of German lyrics as a publicly available basis for multidisciplinary research. The resource contains three types of data for the investigation and evaluation of quite distinct phenomena: TEI-compliant song lyrics as primary data, linguistically and literarily motivated annotations, and extralinguistic metadata. It promotes empirically/statistically grounded analyses of genre-specific features, systemic-structural correlations and tendencies in the texts of contemporary pop music. The corpus has been stratified into thematic and author-specific archives; the paper presents some basic descriptive statistics, as well as the public online frontend with its built-in evaluation forms and live visualisations.
Nachruf auf Helmut Frosch
(2020)
This volume presents the theoretically grounded and empirically validated development of automated part-of-speech annotation (part-of-speech tagging) for transcripts of spontaneous spoken data from the Forschungs- und Lehrkorpus Gesprochenes Deutsch (FOLK), which is publicly available to the research community via the Datenbank für Gesprochenes Deutsch. It has two main focuses: first, a theoretical treatment of the differences between transcripts of spoken language and written-language data with a view to developing a tagset for spoken German; second, an account of the empirical steps involved in creating the automated part-of-speech tagging, i.e. its implementation and evaluation for the annotation of the FOLK corpus. The volume is a critical reflection on theories of parts of speech in the field of tension between theory and data-driven work. It offers insights into the corpus preparation of transcripts of spoken language and relates this to theories about the characteristics of spoken language.