Refine
Year of publication
- 2020 (169) (remove)
Document Type
- Article (91)
- Part of a Book (23)
- Conference Proceeding (20)
- Book (10)
- Other (9)
- Review (7)
- Part of Periodical (4)
- Doctoral Thesis (2)
- Master's Thesis (1)
- Report (1)
Language
- German (110)
- English (57)
- Multiple languages (2)
Keywords
- COVID-19 (41)
- Korpus <Linguistik> (36)
- Deutsch (30)
- Neologismus (25)
- Sprachgebrauch (25)
- Forschungsdaten (16)
- Computerlinguistik (14)
- Gesprochene Sprache (13)
- Wortschatz (13)
- Lexikostatistik (12)
Publicationstate
- Veröffentlichungsversion (169) (remove)
Reviewstate
Publisher
- Leibniz-Institut für Deutsche Sprache (IDS) (66)
- CLARIN (6)
- Heidelberg University Publishing (6)
- Erich Schmidt (5)
- European Language Resources Association (5)
- Spektrum der Wissenschaft Verlagsgesellschaft (5)
- Association for Computational Linguistics (4)
- Verlag für Gesprächsforschung (4)
- de Gruyter (4)
- Linköping University Electronic Press (3)
"Systemrelevant" - eine sprachwissenschaftliche Betrachtung des Begriffs aus aktuellem Anlass
(2020)
Sogenannte „Pragmatikalisierte Mehrworteinheiten“ sind im Deutschen hochfrequent und unterliegen bisweilen tiefgreifenden phonetischen Reduktionsprozessen. Diese können Realisierungsvarianten hervorbringen, die in der Rückschau auf mehr als eine lexematische Ursprungsform zurückführbar sind. Die vorliegende Studie untersucht mit [ˈzɐmɐ] einen besonders prägnanten Fall dieser Art anhand eines Perzeptionsexperimentes.
Song lyrics can be considered as a text genre that has features of both written and spoken discourse, and potentially provides extensive linguistic and cultural information to scientists from various disciplines. However, pop songs play a rather subordinate role in empirical language research so far - most likely due to the absence of scientifically valid and sustainable resources. The present paper introduces a multiply annotated corpus of German lyrics as a publicly available basis for multidisciplinary research. The resource contains three types of data for the investigation and evaluation of quite distinct phenomena: TEI-compliant song lyrics as primary data, linguistically and literary motivated annotations, and extralinguistic metadata. It promotes empirically/statistically grounded analyses of genre-specific features, systemic-structural correlations and tendencies in the texts of contemporary pop music. The corpus has been stratified into thematic and author-specific archives; the paper presents some basic descriptive statistics, as well as the public online frontend with its built-in evaluation forms and live visualisations.
This paper addresses long-term archival for large corpora. Three aspects specific to language resources are focused, namely (1) the removal of resources for legal reasons, (2) versioning of (unchanged) objects in constantly growing resources, especially where objects can be part of multiple releases but also part of different collections, and (3) the conversion of data to new formats for digital preservation. It is motivated why language resources may have to be changed, and why formats may need to be converted. As a solution, the use of an intermediate proxy object called a signpost is suggested. The approach will be exemplified with respect to the corpora of the Leibniz Institute for the German Language in Mannheim, namely the German Reference Corpus (DeReKo) and the Archive for Spoken German (AGD).
This study examines asymmetries between so-called inherent and contextual categories in relation to the morphological complexity of the nominal and verbal inflectional domain of languages. The observations are traced back to the influence of adult L2 learning in scenarios of intense language contact. A method for a simple comparison of the amount of inherent versus contextual categories is proposed and applied to the German-based creole language Unserdeutsch (Rabaul Creole German) in comparison to its lexifier language. The same procedure will be applied to two further language pairs. The grammatical systems of Unserdeutsch and other contact languages display a noticeable asymmetry regarding their structural complexity. Analysing different kinds of evidence, the explanatory key factor seems to be the role of (adult) L2 acquisition in the history of a language, whereby languages with periods of widespread L2 acquisition tend to lose contextual features. This impression is reinforced by general tendencies in pidgin and creole languages. Beyond that, there seems to be a tendency for inherent categories to be more strongly associated with the verb, while contextual categories seem to be more strongly associated with the noun. This leads to an asymmetry in categorical complexity between the noun phrase and the verb phrase in languages that experienced periods of intense L2 learning.
Politische Grenzen haben nachweislich sowohl auf den Sprachgebrauch als auch auf die Sprachwahrnehmung einen großen Einfluss. Die vorliegende Arbeit analysiert für den die Länder Deutschland, Österreich und Italien übergreifenden bairischen Sprachraum, wie Sprecher/Hörer diesen räumlich (horizontal-areal) sowie hinsichtlich seines Verhaltensspektrums (vertikal-sozial) gliedern. Dabei werden die Wahrnehmungen sprachlicher und außersprachlicher Merkmale und die Einstellungen dazu genauer betrachtet.
Mithilfe eines pluridimensionalen Erhebungssettings, bestehend aus Tiefeninterview, Online-Fragebogen, Mental-Map-Erhebung und Hörerurteilstest, kann gezeigt werden, dass extralinguistische Barrieren, wie etwa politische Grenzen, stark mit attitudinal-perzeptiven Grenzen korrelieren. Damit stellt im Bewusstsein der Befragten auch die Staatsgrenze zwischen Deutschland und Österreich eine Sprachgrenze dar.
This chapter focuses on the formation of adverbs from a corpuslinguistic perspective, providing an overview of adverb formation patterns in German that includes frequencies and hints to productivity as well as combining quantitative methods and theoretically founded hypotheses to address questions that concern possible grammaticalization paths in domains that are formally marked by prepositional elements or inflectional morphology (in particular, superlative or superlative-derived forms). Within our collection of adverb types from the project corpus, special attention is paid to adverbs built from primary prepositions. The data suggest that generally, such adverb formation involves the saturation of the internal argument slot of the relation-denoting preposition. In morphologically regular formations with the preposition in final position, pronominal forms like da ‘there’, hier ‘here’, wo ‘where’ as well as hin ‘hither’ and her ‘thither’ serve to derive adverbs. On the other hand, morphologically irregular formations with the preposition – in particular: zu ‘to’ or vor ‘before, in front of’ – in initial posi-tion show traits of syntactic origin such as (remnants of) inflectional morphology. The pertaining adverb type dominantly saturates the internal argument slot by means of universal quantification that is part and parcel as well of the derivation of superlatives and demonstrably fuels the productivity of the pertaining formation pattern.
„Bausteine einer Korpusgrammatik des Deutschen“ ist eine neue Schriftenreihe, die am Leibniz-Institut für Deutsche Sprache in Mannheim (IDS) entsteht. Sie setzt sich zum Ziel, mit korpuslinguistischen Methoden die Vielfalt und Variabilität der deutschen Grammatik in großer Detailschärfe zu erfassen und gleichzeitig für die Validierbarkeit der Ergebnisse zu sorgen. Die erste Ausgabe enthält eine Einführung in die Reihe sowie vier als Kapitel einer neuen Grammatik gestaltete Texte: 1. Grundlegende Aspekte der Wortbildung, 2. Bau von und Umbau zu Adverbien, 3. Starke vs. schwache Flexion aufeinanderfolgender attributiver Adjektive und 4. Reihenfolge attributiver Adjektive. Die Ausgabe ist mit einer interaktiven Datenbank zu attributiven Adjektiven verknüpft.
CLARIN contractual framework for sharing language data: the perspective of personal data protection
(2020)
The article analyses the responsibility for ensuring compliance with the General Data Protection Regulation (GDPR) in research settings. As a general rule, organisations are considered the data controller (responsible party for the GDPR compliance). Research constitutes a unique setting influenced by academic freedom. This raises the question of whether academics could be considered the controller as well. However, there are some court cases and policy documents on this issue. It is not settled yet. The analysis serves a preliminary analytical background for redesigning CLARIN contractual framework for sharing data.
We present web services which implement a workflow for transcripts of spoken language following the TEI guidelines, in particular ISO 24624:2016 “Language resource management – Transcription of spoken language”. The web services are available at our website and will be available via the CLARIN infrastructure, including the Virtual Language Observatory and WebLicht.
In this Paper, we describe a schema and models which have been developed for the representation of corpora of computer-mediated communicatin (CMC corpora) using the representation framework provided by the Text Encoding Initiative (TEI). We characterise CMC discourse as dialogic, sequentially organised interchange between humans and point out that many features of CMC are not adequately handled by current corpus encoding schemas and tools. We formulate desiderata for a representation of CMC in encoding schemes and argue why the TEI is a suitable framework for the encoding of CMC corpora. We propose a model of basic CMC units (utterances, posts, and nonverbal activities) and the macro- and micro-level structures of interactions in CMC environments. Based on these models, we introduce CMC-core, a TEI customisation for the encoding of CMC corpora, which defines CMC-specific encoding features on the four levels of elements, model classes, attribute classes, and modules of the TEI infrastructure. The description of our customisation is illustrated by encoding examples from corpora by researchers of the TEI SIG CMC, representing a variety of CMC genres, i.e. chat, wiki talk, twitter, blog, and Second Life interactions. The material described, i.e. schemata, encoding examples, and documentation, is available from the of the TEI CMC SIG Wiki and will accompany a feature request to the TEI council in late 2019.
Using video-recordings from one day of a theater project for young adults, this paper investigates how the meaning of novel verbal expressions is interactionally constituted and elaborated over the interactional history of a series of activities. We examine how the theater director introduces and instructs the group in the Chekhovian technique of acting, which is based on “imagining with the body,” and how the imaginary elements of the technique are “brought into existence” in the language of the instructions. By tracking shifts in the instructor’s use of the key expressions invisible/imaginary/inner body or movement through a series of exercises, we demonstrate how they are increasingly treated as real and perceivable bodily conduct. The analyses focus on the instructor’s attribution of factual and agentive properties to these expressions, and the changes that these properties undergo over the series of instructions. This case demonstrates the significance of longitudinal processes for the establishment of shared meaning in social interaction. The study thereby contributes to the field of interactional semantics and to longitudinal studies of social interaction.
Corona- und andere Partys
(2020)
The annual microcensus provides Germany’s most important official statistics. Unlike a census it does not cover the whole population, but a representative 1%-sample of it. In 2017, the German microcensus asked a question on the language of the population, i.e. ‘Which language is mainly spoken in your household?’ Unfortunately, the question, its design and its position within the whole microcensus’ questionnaire feature several shortcomings. The main shortcoming is that multilingual repertoires cannot be captured by it. Recommendations for the improvement of the microcensus’ language question: first and foremost the question (i.e. its wording, design, and answer options) should make it possible to count multilingual repertoires.
cOWIDplus
(2020)
Die Corona-Krise hat Einfluss auf die Sprache in deutschsprachigen Online-Medien. Wir haben die Hypothese, dass sich die Vielfältigkeit des verwendeten Vokabulars einschränkt. Wir glauben zudem, dass sich die Diversität des Vokabulars nach "überstandener" Krise wieder auf ein "Prä-Pandemie-Niveau" einpendeln wird. Diese zweite Hypothese lässt sich erst im Laufe der Zeit überprüfen.
cOWIDplus Analyse ist eine kontinuierlich aktualisierte Ressource zu der Frage, ob und wie stark sich der Wortschatz ausgewählter deutscher Online-Pressemeldungen während der Corona-Pandemie systematisch einschränkt und ob bzw. wann sich das Vokabular nach der Krise wieder ausweitet. In diesem Artikel erläutern die Autor*innen die hinter der Ressource stehende Forschungsfrage, die zugrunde gelegten Daten, die Methode sowie die bisherigen Ergebnisse.
cOWIDplus Viewer
(2020)
Das "Verzeichnis grundlegender grammatischer Fachbegriffe" 2019. Anliegen, Konzeption, Perspektiven
(2020)
Sprachkämpfe gibt es so manche, aber wer hätte gedacht, dass ausgerechnet das Erscheinen der 28. Auflage des Rechtschreibdudens die Gemüter so in Wallung versetzen würde, dass gleich mehrere davon in die nächste Runde gehen. Verlag und Redaktion werden auf die sprachpolitische Bühne gezerrt, weil man die deutsche Sprache so gut für Zwecke identitärer Politik instrumentalisieren kann.
Der Einfluss extremistischer Gewaltereignisse auf das Framing von Extremismus in Online-Medien
(2020)
In diesem Beitrag untersuchen wir die Darstellung von Rechtsextremismus, Linksextremismus und Islamismus im medialen Diskurs am Beispiel von SPIEGEL Online, einem der deutschen Leitmedien. Wir leiten vier zentrale Dimensionen für die Konzeptualisierung von Extremismen ab: Ideologie und Organisation, Herkunft der Akteure, Stellung zur Gesellschaft und Typische Handlungen. Wir beobachten die Entwicklung der Darstellung der drei Extremismen an möglichen Bruchpunkten: Wir untersuchen das assoziative Framing der drei Extremismen vor und nach prominenten extremismusbezogenen Gewaltereignissen, namentlich die Anschläge des 11. September, die Veröffentlichung des NSU-Skandals und linksextremistische Aktivitäten während des G20-Gipfels in Hamburg. Mittels einer Kollokationsanalyse identifizieren wir mit den Extremismen assoziierte Aspekte und ordnen diese den Konzeptualisierungsdimensionen zu. Wir beobachten Veränderungen im Framing, die durch die ausgewählten Ereignisse bedingt sind, und vergleichen das resultierende Framing mit den Kerndefinitionen des Verfassungsschutzes aus dem Bericht des Jahres 2017, um mögliche Unterschiede in der Konzeptualisierung von Extremismen mit möglicherweise unterschiedlichen Handlungslogiken als Resultat divergierender Konzeptualisierungen herauszuarbeiten.
Im Beitrag steht das LeGeDe-Drittmittelprojekt und der im Laufe der Projektzeit entwickelte korpusbasierte lexikografische Prototyp zu Besonderheiten des gesprochenen Deutsch in der Interaktion im Zentrum der Betrachtung. Die Entwicklung einer lexikografischen Ressource dieser Art knüpft an die vielfältigen Erfahrungen in der Erstellung von korpusbasierten Onlinewörterbüchern (insbesondere am Leibniz-Institut für Deutsche Sprache, Mannheim) und an aktuelle Methoden der korpusbasierten Lexikologie sowie der Interaktionsanalyse an und nimmt als multimedialer Prototyp für die korpusbasierte lexikografische Behandlung von gesprochensprachlichen Phänomenen eine innovative Position in der modernen Onlinelexikografie ein. Der Beitrag befasst sich im Abschnitt zur LeGeDe-Projektpräsentation ausführlich mit projektrelevanten Forschungsfragen, Projektzielen, der empirischen Datengrundlage und empirisch erhobenen Erwartungshaltungen an eine Ressource zum gesprochenen Deutsch. Die Darstellung der komplexen Struktur des LeGeDe-Prototyps wird mit zahlreichen Beispielen illustriert. In Verbindung mit der zentralen Information zur Makro- und Mikrostruktur und den lexikografischen Umtexten werden die vielfältigen Vernetzungs- und Zugriffsstrukturen aufgezeigt. Ergänzend zum abschließenden Fazit liefert der Beitrag in einem Ausblick umfangreiche Vorschläge für die zukünftige lexikografische Arbeit mit gesprochensprachlichen Korpusdaten.
Die Sprachpolitik der AfD
(2020)
Sprachpolitik hat sich in den letzten Jahren als ein lohnendes Politikfeld etabliert. Im Umfeld der AfD und in der parlamentarischen Repräsentanz der Partei werden durch Aufrufe, Anträge, Anfragen und Gesetzesinitiativen verschiedene Themen adressiert, die schon im AfD-Grundsatzprogramm von 2016 gesetzt wurden. Um was für sprachpolitische Positionen handelt es sich, und was ist der Grund für das Interesse an diesen Themen?
Editorial
(2020)
Das Kommunizieren in Sozialen Medien und der Umgang mit Hypertexten ist im Jahr 2020 kein Randphänomen mehr. Die sprachlichen Besonderheiten internetbasierter Kommunikation und Sozialer Medien sind mittlerweile auch gut erforscht und beschrieben, allerdings werden diese bislang in deutschen Grammatiken, mit Ausnahme von Hoffmann (2014), allenfalls am Rande behandelt. Selbst neuere Ansätze zur Textanalyse, z. B. Ágel (2017), konzentrieren sich auf gestaltstabile, linear organisierte Schrifttexte. Dasselbe gilt für Ansätze, die primär für die Bewertung von Schreibprodukten in Bildungskontexten entwickelt wurden.
Einleitung
(2020)
A corpus-based academic grammar of German is an enormous undertaking, especially if it aims at using state-of-the-art methodology while ensuring that its study results are verifiable. The Bausteine-series, which is being developed at the Leibniz Institute for the German Language (IDS), presents individual “building blocks” for such a grammar. In addition to the peer-reviewed texts, the series publishes the results of statistical analyses and, for selected topics, the underlying data sets.
Esipuhe/Preface
(2020)
We evaluate a graph-based dependency parser on DeReKo, a large corpus of contemporary German. The dependency parser is trained on the German dataset from the SPMRL 2014 Shared Task which contains text from the news domain, whereas DeReKo also covers other domains including fiction, science, and technology. To avoid the need for costly manual annotation of the corpus, we use the parser’s probability estimates for unlabeled and labeled attachment as main evaluation criterion. We show that these probability estimates are highly correlated with the actual attachment scores on a manually annotated test set. On this basis, we compare estimated parsing scores for the individual domains in DeReKo, and show that the scores decrease with increasing distance of a domain to the training corpus.
This paper presents the QUEST project and describes concepts and tools that are being developed within its framework. The goal of the project is to establish quality criteria and curation criteria for annotated audiovisual language data. Building on existing resources developed by the participating institutions earlier, QUEST develops tools that could be used to facilitate and verify adherence to these criteria. An important focus of the project is making these tools accessible for researchers without substantial technical background and helping them produce high-quality data. The main tools we intend to provide are the depositors’ questionnaire and automatic quality assurance, both developed as web applications. They are accompanied by a Knowledge base, which will contain recommendations and descriptions of best practices established in the course of the project. Conceptually, we split linguistic data into three resource classes (data deposits, collections and corpora). The class of a resource defines the strictness of the quality assurance it should undergo. This division is introduced so that too strict quality criteria do not prevent researchers from depositing their data.
The present article shows an experimental subject investigation on elements of video telephony in relation to experiencing and feeling connectedness and intimacy within private interpersonal communication. Particular interests are questions about possible relationships between image detail, angle of view or perspective as well as image format or the foreign and personal perception of the communicators. Central to this is the question of whether the practices and interactions of users in dealing with communication technology can be used to derive possible conclusions on negotiation measures or even adaptation services. The obtained results are presented on the basis of an introductory theoretical discussion. It is followed by a summary and analysis as well as an outlook on the further use and significance of the results.
Designed as a contribution to contrastive linguistics, the present volume brings up-to-date the comparison of German with its closest neighbour, Dutch, and other Germanic relatives like English, Afrikaans, and the Scandinavian languages. It takes its inspiration from the idea of a "Germanic Sandwich", i.e. the hypothesis that sets of genetically related languages diverge in systematic ways in diverse domains of the linguistic system. Its contributions set out to test this approach against new phenomena or data from synchronic, diachronic and, for the first time in a Sandwich-related volume, psycholinguistic perspectives. With topics ranging from nickname formation to the IPP (aka 'Ersatzinfinitiv'), from the grammaticalisation of the definite article to /s/-retraction, and from the role of verb-second order in the acquisition of L2 English to the psycholinguistics of gender, the volume appeals to students and specialists in modern and historical linguistics, psycholinguistics, translation studies, language pedagogy and cognitive science, providing a wealth of fresh insights into the relationships of German with its closest relatives while highlighting the potential inherent in the integration of different methodological traditions.
This chapter begins with a sketch of the specifics of our approach, an overview of the contents of the chapters on word formation and some methodological notes. It then discusses the general characteristics of word formations and of their overall inventory, comparing word formations to primary words. Furthermore, the chapter explores the relative frequencies of word formations in different vocabulary areas and traces the word formation profiles of individual parts of speech. Finally, it compiles the characteristic word formation rules for different parts of speech.
Coaching outcome research convincingly argues that coaching is effective and facilitates change in clients. While coaching practice literature depicts questions as key vehicle for such change, empirical findings as regards the local and global change potential of questions are so far largely missing in both (psychological) outcome research and (linguistic and psychological) process research on coaching. The local change potential of questions refers to a turn-by-turn transformation as a result of their sequentiality, the global change potential is related to the power of questions to initiate, process and finalize established phases of change. This programmatic article on questions, or rather questioning sequences, in executive coaching pursues two goals: firstly, it takes stock of available insights into questions in coaching and advocates for Conversation Analysis as a fruitful methodological framework to assess the local change potential of questioning sequences. Secondly, it points to the limitations of a local turn-by-turn approach to unravel the overall change potential of questions and calls for an interdisciplinary approach to bring both local and global effectiveness into relation. Such an approach is premised on conversational sequentiality and psychological theories of change and facilitates research on questioning sequences as both local and global agents of change across the continuum of coaching sessions. We present the TSPP Model as a first result of such an interdisciplinary cooperation.
In diesem Beitrag stellen wir die Ergebnisse einer Studie über die Intonation von Frageaktivitäten in deutschen Alltagsgesprächen vor. Unsere Untersuchung erforscht, inwieweit die Intonation zur Kontextualisierung von konversationellen Fragen beiträgt. In der Analyse stützen wir uns auf das autosegmental-metrische Modell von Peters und das taxonomische Modell der interaktionalen Prosodieforschung von Selting. Diese Modelle beschreiben jeweils phonologische oder pragmatische Aspekte der Frageintonation, zwei Dimensionen, die für sich genommen, keine vollständige Beschreibung liefern können. Auf der Grundlage authentischer Gesprächsdaten aus dem Korpus FOLK argumentieren wir für die Kompatibilität des autosegmental-metrischen Modells von Peters und des taxonomischen Modells der Frageintonation von Selting. Die Merkmale aus beiden Modellen lassen sich zu Bündeln kombinieren, die es erlauben, die Intonation von Fragen zu erfassen.
I’ve got a construction looks funny – representing and recovering non-standard constructions in UD
(2020)
The UD framework defines guidelines for a crosslingual syntactic analysis in the framework of dependency grammar, with the aim of providing a consistent treatment across languages that not only supports multilingual NLP applications but also facilitates typological studies. Until now, the UD framework has mostly focussed on bilexical grammatical relations. In the paper, we propose to add a constructional perspective and discuss several examples of spoken-language constructions that occur in multiple languages and challenge the current use of basic and enhanced UD relations. The examples include cases where the surface relations are deceptive, and syntactic amalgams that either involve unconnected subtrees or structures with multiply-headed dependents. We argue that a unified treatment of constructions across languages will increase the consistency of the UD annotations and thus the quality of the treebanks for linguistic analysis.
Journal for language technology and computational linguistics. Special Issue on offensive language
(2020)
Recent years have seen a sharp increase in studies of offensive language (and related notions such as abusive language, hate speech, verbal aggression etc.) as well as of patterns of online behavior such as cyberbullying and trolling. Multiple efforts have been launched for the exploration of computational approaches and the establishment of benchmark datasets for various languages (Basile et al. (2019), Wiegand et al. (2018), Zampieri et al. (2019)).
Heute wird mehr geschrieben als je zuvor und die digitale Kommunikation trägt wesentlich dazu bei; ein großer Teil des heutigen Schreibens ist dialogisches Schreiben im Alltag. Konsequenterweise wird die Online-Kommunikation zunehmend Thema in Bildungskontexten und in der Deutschdidaktik. Offen ist aber weiterhin, wie Texte des interaktionsorientierten Schreibens bewertet werden sollen, die sich von solchen des textorientierten Schreibens in vielerlei Hinsicht unterscheiden können. Während es für textorientiertes Schreiben Normen gibt, die in Sprachkodizes erfasst sind, ist es nicht klar, was der Bezugspunkt für interaktionsorientierte Texte sein könnte. In diesem Beitrag analysieren wir die Verwendung von Konnektoren in der Online-Kommunikation und die Repräsentation von online-spezifischen Besonderheiten in Sprachressourcen. Die Ergebnisse zeigen, dass spezifische Online-Verwendungsweisen von Konnektoren in Sprachkodizes kaum berücksichtigt und beschrieben werden.
EFNIL, the European Federation of National Institutions for Language, promotes the standard languages and the linguistic diversity of the European countries as an essential characteristic of their cultural diversity and wealth. The 17th annual conference of EFNIL in Tallinn dealt with the relation between language and economy.
• Language politics often have economic intentions, the language use of the individual is embedded in economic conditions, languages seem to differ in their economic value. In recent years, economists and sociolinguists have developed models of describing these interdependencies.
• The interaction in multilingual settings needs professional handling. There are traditional instances such as language teaching or translation and new professional fields of the digital age such as multilingual databases. Lots of economic needs and opportunities appear in this field.
• Digitization and societal diversity are two elements leading to more successful interaction, assisted by the use of automatic everyday translation, the development of plain language etc.
This volume presents an extensive overview of the interplay of language and economy.
This article examines the language contact situation as well as the language attitudes of the Caucasian Germans, descendants of German-born inhabitants of the Russian Empire and the Soviet Union who emigrated in 1816/17 to areas of Transcaucasia. After deportations and migrations, the group of Caucasian Germans now consists of those who have since emigrated to Germany and those who still live in the South Caucasus. It’s the first time that sociolinguistic methods have been used to record data from the generation who experienced living in the South Caucasus and in Germany as well as from two succeeding generations. Initial results will be presented below with a focus on the language contact constellations of German varieties as well as on consequences of language contact and language repression, which both affect language attitudes.
Das vorliegende "Verzeichnis grundlegender grammatischer Fachausdrücke" beruht auf einem Konsens, den das "Gremium für Schulgrammatische Terminologie" unter Berücksichtigung fachwissenschaftlicher, fachdidaktischer und unterrichtspraktischer Gesichtspunkte hergestellt hat. Ziel dieses Verzeichnisses ist es, Anhaltspunkte zu geben für die Konzeption von Lehrplänen und Schulbüchern für das Fach Deutsch. Das Verzeichnis bietet eine Grundlage zur Vereinheitlichung der Termini sowie des mit einem Terminus verbundenen Begriffsverständnisses.
In this paper we investigate the problem of grammar inference from a different perspective. The common approach is to try to infer a grammar directly from example sentences, which either requires a large training set or suffers from bad accuracy. We instead view it as a problem of grammar restriction or sub-grammar extraction. We start from a large-scale resource grammar and a small number of examples, and find a sub-grammar that still covers all the examples. To do this we formulate the problem as a constraint satisfaction problem, and use an existing constraint solver to find the optimal grammar. We have made experiments with English, Finnish, German, Swedish and Spanish, which show that 10–20 examples are often sufficient to learn an interesting domain grammar. Possible applications include computer-assisted language learning, domain-specific dialogue systems, computer games, Q/A-systems, and others.
This thesis describes work in three areas: grammar engineering, computer-assisted language learning and grammar learning. These three parts are connected by the concept of a grammar-based language learning application. Two types of grammars are of concern. The first we call resource grammars, extensive descriptions a natural languages. Part I focuses on this kind of grammars. The other are domain-specific or application-specific grammars. These grammars only describe a fragment of natural language that is determined by the domain of a certain application. Domain-specific grammars are relevant for Part II and Part III. Another important distinction is between humans learning a new natural language using computational grammars (Part II) and computers learning grammars from example sentences (Part III). Part I of this thesis focuses on grammar engineering and grammar testing. It describes the development and evaluation of a computational resource grammar for Latin. Latin is known for its rich morphology and free word order, both have to be handled in a computationally efficient way. A special focus is on methods how computational grammars can be evaluated using corpus data. Such an evaluation is presented for the Latin resource grammar. Part II, the central part, describes a computer-assisted language learning application based on domain-specific grammars. The language learning application demonstrates how computational grammars can be used to guide the user input and how language learning exercises can be modeled as grammars. This allows us to put computational grammars in the center of the design of language learning exercises used to help humans learn new languages. Part III, the final part, is dedicated to a method to learn domain- or application-specific grammars based on a wide-coverage grammar and small sets of example sentences. Here a computer is learning a grammar for a fragment of a natural language from example sentences, potentially without any additional human intervention. These learned grammars can be based e.g. on the Latin resource grammar described in Part II and used as domain-specific lesson grammars in the language learning application described Part II.
Providing online repositories for language resources is one of the main activities of CLARIN centres. The legal framework regarding liability of Service Providers for content uploaded by their users has recently been modified by the new Directive on Copyright in the Digital Single Market. A new category of Service Providers, Online Content-Sharing Service Providers (OCSSPs), was added. It is subject to a complex and strict framework, including the requirement to obtain licenses from rightholders for the hosted content. This paper provides the background and effect of these changes to law and aims to initiate a debate on how CLARIN repositories should navigate this new legal landscape.
Linguistic Variation and Change in 250 Years of English Scientific Writing: A Data-Driven Approach
(2020)
We trace the evolution of Scientific English through the Late Modern period to modern time on the basis of a comprehensive corpus composed of the Transactions and Proceedings of the Royal Society of London, the first and longest-running English scientific journal established in 1665. Specifically, we explore the linguistic imprints of specialization and diversification in the science domain which accumulate in the formation of “scientific language” and field-specific sublanguages/registers (chemistry, biology etc.). We pursue an exploratory, data-driven approach using state-of-the-art computational language models and combine them with selected information-theoretic measures (entropy, relative entropy) for comparing models along relevant dimensions of variation (time, register). Focusing on selected linguistic variables (lexis, grammar), we show how we deploy computational language models for capturing linguistic variation and change and discuss benefits and limitations.
Maske oder Mundschutz?
(2020)
In the present article we argue that all communication is medial in the sense that every human sign-based interaction is shaped by medial aspects from the outset. We propose a dynamic, semiotic concept of media that focuses on the process-related aspect of mediality, and we test the applicability of this concept using as an example the second presidential debate between Clinton and Trump in 2016. The analysis shows in detail how the sign processing during the debate is continuously shaped by structural aspects of television and specific traits of political communication in television. This includes how the camerawork creates meaning and how the protagonists both use the affordances of this special mediality. Therefore, it is not adequate in our view to separate the technical aspects of the medium, the ‘hardware’, from the processual aspects and the structural conditions of communication. While some aspects of the interaction are directly constituted by the medium, others are more indirectly shaped and influenced by it, especially by its institutional dimension – we understand them as second-order media effects. The whole medial procedure with its specific mediality is a necessary, but not a sufficient condition of meaning-making. We distinguish the medial procedure from the semiotic modes employed, the language games played and the competence of the players involved.
The theme of the AFinLA 2020 Yearbook Methodological turns in applied language studies is discussed in this introductory article from three interrelated perspectives, variously addressed in the three plenary presentations at the AFinLA Autumn Symposium 2019 as well as in the thirteen contributions to the yearbook. In the first set of articles presented, the authors examine the role and impact of technological development on the study of multimodal digital and non-digital contexts and discourses and ensuing new methods. The second set of studies in the yearbook revisits issues of language proficiency, critically discussing relevant concepts and approaches. The third set of articles explores participation and participatory research approaches, reflecting on the roles of the researcher and the researched community.
Im vorliegenden Beitrag gehen wir von der Prämisse aus, dass die Angemessenheit sprachlicher Formen nicht pauschal, sondern anhand des jeweiligen Kontexts zu beurteilen ist. Anhand einer Online-Fragebogenstudie mit durch weil eingeleiteten Nebensätzen untersuchen wir die Hypothese, dass Varianten, die nicht dem Schriftstandard entsprechen, in Kommunikationsformen, die sich weniger an standard- und schriftsprachlichen Normen orientieren, als (mindestens) ebenso angemessen oder zumindest unterschiedlich wahrgenommen werden wie eine schriftstandardsprachliche Variante. Wir untersuchen dies anhand von drei Aufgaben: Rezeption, Produktion und Assoziation zu bestimmten Medien und Textsorten. Wir können zeigen, dass die schriftnormgerechte Variante durchweg als am akzeptabelsten eingeschätzt wird. In allen drei Aufgaben finden sich aber auch eindeutige und übereinstimmende Effekte, die nahelegen, dass die verschiedenen Varianten in Abhängigkeit der Textsorte doch unterschiedlich eingeschätzt, produziert und assoziiert werden.
In diesem Beitrag werden exemplarisch verschiedene potenzielle Gebrauchsmuster mit dem deutschen Lemma wissen gesammelt und ihre in der Fachliteratur vorgelegten interaktionslinguistisch-funktionalen Beschreibungen für einen Strukturierungsversuch genutzt. Im Zentrum steht ein multifunktionaler handlungsorientierter Ansatz zur Beschreibung von Interaktion im Gespräch. Der Beitrag greift dabei Überlegungen auf, die im Rahmen des Forschungsprojekts Lexik des gesprochenen Deutsch (= LeGeDe) zur Erstellung einer korpusbasierten lexikogra- fischen Ressource lexikalischer Besonderheiten des gesprochenen Deutsch in der Interaktion thematisiert wurden.
Schlüsselwörter: Muster, Lexik des gesprochenen Deutsch, Interaktion, Internetlexikografie
Nachruf auf Helmut Frosch
(2020)
Nachruf auf Ulrich Engel
(2020)
The majority of new words in dictionaries are included following a certain period of time during which they have become more frequent in use and established morphosyntactic and orthographic features consistent with the language system they are borrowed into. In case of borrowed new words, inclusion often takes place at a transitional state of assimilation to the language system, where delayed orthographic or phonetic change cannot be ruled out and the differentiation between standard-conforming and non-standard orthographic word forms of a lemma oftentimes depends on the proximity between the writing systems of the donor and the recipient language. Following a brief overview of loan words and their lexicographical description in the Neologismenwörterbuch, a specialized online dictionary for neologisms in contemporary German, this paper presents findings of an investigative case study on dictionary entries for a neologism borrowed from a logographic language system and discusses the potential of a corpus-based description of new loan words.
To ensure short gaps between turns in conversation, next speakers regularly start planning their utterance in overlap with the incoming turn. Three experiments investigate which stages of utterance planning are executed in overlap. E1 establishes effects of associative and phonological relatedness of pictures and words in a switch-task from picture naming to lexical decision. E2 focuses on effects of phonological relatedness and investigates potential shifts in the time-course of production planning during background speech. E3 required participants to verbally answer questions as a base task. In critical trials, however, participants switched to visual lexical decision just after they began planning their answer. The task-switch was time-locked to participants' gaze for response planning. Results show that word form encoding is done as early as possible and not postponed until the end of the incoming turn. Hence, planning a response during the incoming turn is executed at least until word form activation.
Politiker und Parteien sehen sich heutzutage oft mit dem Vorwurf konfrontiert, sie heben sich kaum mehr voneinander ab, seien gar „austauschbar“. Umso größer scheint das Bedürfnis nach Abgrenzung. Diese wird kommunikativ hergestellt und ist am besten von den diskursiven Zusammenhängen und Akteurskonstellationen her, in denen sie sich aktualisiert, nachzuvollziehen.
Das Vorgehen in dieser Arbeit gliedert sich im Wesentlichen in drei Schritte: Zunächst wird eine Theorieskizze der Abgrenzung als Sprechhandlung entworfen. Hierbei geht es vor allem darum, verschiedene Lesarten zu erschließen und die Abgrenzung in einem Panorama verwandter Konzepte wie etwa Ausgrenzung, Distinktion und Distanzierung zu verorten (Teil 1). Daraufhin wird die Plenardebatte als Textsorte erschlossen und in ihren kommunikativen Spezifika erfasst, wobei besonders die Stichworte Inszeniertheit, Mehrfachadressierung und die Frage nach dem Verhältnis zwischen Mündlichkeit und Schriftlichkeit in den Blickpunkt rücken (Teil 2). Sodann wird mithilfe der pragma-semiotischen Textarbeit als Methode ganz konkret sprachliches Datenmaterial aus Plenardebatten analysiert und interpretativ ausgewertet (Teile 3 und 4). Dabei kommen auch korpuslinguistische Verfahren zum Einsatz, die jedoch letztlich im Dienste einer qualitativ orientierten Analyse stehen.
Die Analyse berücksichtigt sowohl explizite als auch implizite Formen sprachlicher Abgrenzung. Sie zeigt unter anderem, dass politische Abgrenzungshandlungen keineswegs parteispezifisch sind, sondern von allen Parteien und Akteuren mehr oder weniger konstant praktiziert werden. Dabei wird Abgrenzung hauptsächlich als Selbstpositionierung realisiert; bisweilen finden sich aber durchaus auch Fremdpositionierungen – etwa als Aufforderungen an andere Akteure, sich gegenüber Dritten abzugrenzen. Auf der Ebene der sprachlichen Formen lässt sich schließlich durch eine Art experimentelle Annäherung mit korpuslinguistischen Verfahren eine Reihe von Mehrworteinheiten ausmachen, die als Indikatoren für implizite Abgrenzung gelten können.
Der Band leistet eine theoretisch begründete und empirisch validierte Entwicklung einer automatisierten Wortartenannotation (Part-of-Speech-Tagging) für Transkripte spontansprachlicher Daten des Forschungs- und Lehrkorpus Gesprochenes Deutsch (FOLK), das über die Datenbank für Gesprochenes Deutsch der Forschungsgemeinschaft öffentlich zugänglich ist. Dabei setzt er zwei Schwerpunkte: erstens die theoretische Aufarbeitung von Unterschieden von Transkripten gesprochener Sprache zu schriftsprachlichen Daten in Hinblick auf die Entwicklung eines Tagsets für das gesprochene Deutsch; zweitens die Darstellung der empirischen Arbeitsschritte zur Erstellung des automatisierten Part-of-Speech-Taggings, d. h. die Implementierung und Evaluierung für die Annotation des FOLK-Korpus. Der Band ist eine kritische Reflexion der Wortartentheorien im Spannungsfeld zwischen Theorie und datengeleiteter Arbeit. Er gibt Einblicke über die Korpusaufbereitung von Transkripten gesprochener Sprache und stellt diese in Bezug zu Theorien über die Eigenheiten gesprochener Sprache.
According to Positioning Theory, participants in narrative interaction can position themselves on a representational level concerning the autobiographical, told self, and a performative level concerning the interactive and emotional self of the tellers. The performative self is usually much harder to pin down, because it is a non-propositional, enacted self. In contrast to everyday interaction, psychotherapists regularly topicalize the performative self explicitly. In our paper, we study how therapists respond to clients' narratives by interpretations of the client's conduct, shifting from the autobiographical identity of the told self, which is the focus of the client's story, to the present performative self of the client. Drawing on video recordings from three psychodynamic therapies (tiefenpsychologisch fundierte Psychotherapie) with 25 sessions each, we will analyze in detail five extracts of therapists' shifts from the representational to the performative self. We highlight four findings:
• Whereas, clients' narratives often serve to support identity claims in terms of personal psychological and moral characteristics, therapists rather tend to focus on clients' feelings, motives, current behavior, and ways of interacting.
• In response to clients' stories, therapists first show empathy and confirm clients' accounts, before shifting to clients' performative self.
• Therapists ground the shift to clients' performative self by references to clients' observable behavior.
• Therapists do not simply expect affiliation with their views on clients' performative self. Rather, they use such shifts to promote the clients' self-exploration. Yet, if clients resist to explore their selves in more detail, therapists more explicitly ascribe motives and feelings that clients do not seem to be aware of. The shift in positioning levels thus seems to have a preparatory function for engendering therapeutic insights.