Verbs may be attributed higher agency than other grammatical categories. In Study 1, we confirmed this hypothesis with archival datasets comprising verbs (N = 950) and adjectives (N = 2115). We then investigated whether verbs (vs. adjectives) increase message effectiveness. In three experiments presenting potential NGOs (Studies 2 and 3) or corporate campaigns (Study 4) in verb or adjective form, we demonstrate the hypothesized relationship. Across studies (overall N = 721), grammatical agency consistently increased message effectiveness. The effect of semantic agency varied across contexts: it increased (Study 2), did not affect (Study 3), or decreased (Study 4) the effectiveness of the message. Overall, the experiments provide insights into the meta-semantic effects of verbs, demonstrating how grammar may influence communication outcomes.
With recourse to a broader understanding of the concept of translation, the transfer of source texts in one variety into another variety of the same language can also be called translation. This paper focuses on the target language, or rather the target variety, “easy-to-read language”, which is meant to make texts comprehensible for people with communication limitations. Given its origins in the disability rights movement, the aim is to inform affected persons about their rights and democratic processes, i.e. to translate legal texts in particular into easy-to-read language. Although there is a whole range of rules and guidelines for formulating in easy-to-read language, “none offers a sufficient approach for translation into easy-to-read language” (Bredel & Maaß, 2016a, p. 109). Standardization of the variety is also still a long way off. First, the contribution takes stock of legal regulations in easy-to-read language. Second, four versions of the Federal Participation Law in easy-to-read language are analysed with regard to their external features and the constructions used to explain technical terminology. The analysis shows that legal texts in easy-to-read language are (still) quite limited in number and are also difficult to find. In the second part, the constructions used exhibit great structural variance, both within and across texts. It is therefore questionable whether the addressees can access the texts independently. The rules, their formulations and their implementations also still need to be made clearer so that the translations fulfil their function.
Zum Geleit
(2021)
In addition to the research articles, which were selected according to the quality criteria of the double-blind peer review that is standard today, the issue contains three reports: on a conference on multilingualism in Tartu, on an interdisciplinary German-as-a-foreign-language (DaF) project in Tallinn, and on a research group studying the language skills and motivations for learning German of students in the Baltic and Nordic countries. The issue is rounded off by two reviews.
We describe a simple procedure for the automatic creation of word-level alignments between printed documents and their respective full-text versions. The procedure is unsupervised, uses standard, off-the-shelf components only, and reaches an F-score of 85.01 in the basic setup and up to 86.63 when using pre- and post-processing. Potential areas of application are manual database curation (incl. document triage) and biomedical expression OCR.
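As an illustration of how an F-score for word-level alignments can be computed, here is a minimal sketch. It assumes that an alignment is represented as a set of (print token index, full-text token index) links and that a gold-standard set of links is available. The function name and the data representation are hypothetical and are not taken from the paper.

```python
from typing import Iterable, Set, Tuple

# Hypothetical representation: (token index on the printed page, token index in the full text)
Link = Tuple[int, int]


def alignment_f_score(predicted: Iterable[Link], gold: Iterable[Link]) -> float:
    """Return the balanced F-score (F1) of predicted links against gold links, in percent."""
    predicted_set: Set[Link] = set(predicted)
    gold_set: Set[Link] = set(gold)
    true_positives = len(predicted_set & gold_set)
    if true_positives == 0:
        # No correct link at all (also covers empty inputs): define the score as 0.
        return 0.0
    precision = true_positives / len(predicted_set)
    recall = true_positives / len(gold_set)
    return 100.0 * 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    gold_links = [(0, 0), (1, 1), (2, 2), (3, 3)]
    predicted_links = [(0, 0), (1, 1), (2, 2), (3, 4)]  # last link is wrong
    print(round(alignment_f_score(predicted_links, gold_links), 2))
```

In this toy input three of four predicted links are correct and three of four gold links are found, so precision and recall are both 0.75.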
This paper addresses the challenge of creating a knowledge graph from a corpus of historical encyclopedias, with a special focus on word sense alignment (WSA) and disambiguation (WSD). More precisely, we examine WSA and WSD approaches based on article similarity to link messy historical data, utilizing Wikipedia as a ground-truth component, because the lack of a critical overlap in content, paired with the amount of variation between and within the encyclopedias, does not allow for choosing a “baseline” encyclopedia to align the others to. Additionally, we compare the disambiguation performance of conservative methods like the Lesk algorithm to more recent approaches, i.e. using language models to disambiguate senses.
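For readers unfamiliar with the Lesk algorithm mentioned above, the following is a minimal sketch of its simplified variant: the sense whose gloss shares the most words with the context wins. The sense inventory, the glosses, and the stopword list are invented for illustration and do not come from the paper or from the encyclopedias it studies.

```python
import re
from typing import FrozenSet, Mapping, Set

# Tiny illustrative stopword list (an assumption, not a standard resource).
STOPWORDS: FrozenSet[str] = frozenset(
    {"a", "an", "and", "as", "at", "of", "on", "such", "that", "the"}
)


def content_words(text: str) -> Set[str]:
    """Lowercase the text, split it into word tokens, and drop stopwords."""
    return set(re.findall(r"\w+", text.lower())) - STOPWORDS


def simplified_lesk(context: str, sense_glosses: Mapping[str, str]) -> str:
    """Return the sense whose gloss has the largest word overlap with the context.

    Ties are resolved in favour of the sense listed first.
    """
    context_set = content_words(context)
    return max(
        sense_glosses,
        key=lambda sense: len(context_set & content_words(sense_glosses[sense])),
    )


if __name__ == "__main__":
    # Hypothetical two-sense inventory for the English word "bank".
    glosses = {
        "financial": "an institution that accepts deposits and lends money",
        "river": "sloping land beside a body of water such as a river",
    }
    print(simplified_lesk("She sat on the bank of the river and watched the water", glosses))
```

A language-model-based approach would replace the word-overlap count with a similarity score between contextual embeddings; the selection step stays the same.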
This article explores the relation between word order and response latency, focusing on responses to question-word questions. Qualitative (multimodal) and quantitative analyses of naturally occurring conversations in French—where question-words can occur in initial, medial, or final position within the question—show that variation in word order affects the timing of responses. It is argued that this is so because word order provides a differential basis for action ascription, creating different temporal opportunities for projecting the recipient’s next relevant action. The frequent occurrence of early responses to questions with an initial question-word, in particular, stresses the importance of the recognition point of an action under way for response timing and shows respondents’ pervasive orientation to sequential progressivity. Findings highlight how lexico-syntactic trajectories of emergent turns, prior talk and actions, material and bodily features of interaction, and participants’ shared expectations conspire in shaping the time-courses of action ascription and action projection.
The Duden editorial team is not having an easy time at present. Only last summer it had to face the unjustified accusation that, by adding new words to the 28th edition of the Duden spelling dictionary, it was pursuing a left-green agenda. More recently, it was claimed that the online Duden is covertly driving a language change that will lead to the disappearance of the generic masculine. For this reason the “Verein Deutsche Sprache”, that controversial association of conservative language guardians, recently even launched a public appeal against the Duden publishing house. So what is there to this accusation?
Who is we? Disambiguating the referents of first person plural pronouns in parliamentary debates
(2021)
This paper investigates the use of first person plural pronouns as a rhetorical device in political speeches. We present an annotation schema for disambiguating pronoun references and use our schema to create an annotated corpus of debates from the German Bundestag. We then use our corpus to learn to automatically resolve pronoun referents in parliamentary debates. We explore the use of data augmentation with weak supervision to further expand our corpus and report preliminary results.
Research on multimodal interaction has shown that simultaneity of embodied behavior and talk is constitutive for social action. In this study, we demonstrate different temporal relationships between verbal and embodied actions. We focus on uses of German darf/kann ich? (“may/can I?”) in which speakers initiate, or even complete the embodied action that is addressed by the turn before the recipient’s response. We argue that through such embodied conduct, the speaker bodily enacts high agency, which is at odds with the low deontic stance they express through their darf/kann ich?-TCUs. In doing so, speakers presuppose that the intersubjective permissibility of the action is highly probable or even certain. Moreover, we demonstrate how the speaker’s embodied action, joint perceptual salience of referents, and the projectability of the action addressed with darf/kann ich? allow for a lean syntactic design of darf/kann ich?-TCUs (i.e., pronominalization, object omission, and main verb omission). Our findings underscore the reflexive relationship between lean syntax, sequential organization and multimodal conduct.
N-grams are of utmost importance for modern linguistics and language technology. The legal status of n-grams, however, raises many practical questions. Traditionally, text snippets are considered copyrightable if they meet the originality criterion, but no clear indicators as to the minimum length of original snippets exist; moreover, the solutions adopted in some EU Member States (the paper cites German and French law as examples) are considerably different. Furthermore, recent developments in EU law (the CJEU's Pelham decision and the new right of press publishers) also provide interesting arguments in this debate. The paper presents the existing approaches to the legal protection of n-grams and tries to formulate some clear guidelines as to the length of n-grams that can be freely used and shared.
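To make the object of the legal discussion concrete, here is a minimal sketch of how n-grams of a given length are typically extracted and counted from a tokenized text. The function name and the sample sentence are illustrative assumptions, and the code makes no claim about which n-gram lengths are legally safe to share.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def ngrams(tokens: Sequence[str], n: int) -> List[Tuple[str, ...]]:
    """Return all contiguous n-grams (as tuples) of the token sequence."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # range() is empty when the sequence is shorter than n, so the result is [].
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


if __name__ == "__main__":
    tokens = "to be or not to be".split()
    bigram_counts = Counter(ngrams(tokens, 2))
    for bigram, count in bigram_counts.most_common():
        print(" ".join(bigram), count)
```

The longer the n-gram, the more of the original wording it preserves, which is why the minimum length of a protectable snippet matters for sharing such lists.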
When I look back at the end of this year on the discussions about the German language that I have had in media appearances and at events, one particular question was asked again and again: Who actually decides how we speak and write, and what we may and may not say? Who has the authority to decide on the admission of new words into German, on gender-fair language, or on spelling rules?
Weniger ist mehr! Die IDS-Goethe-Studie in den Integrationskursen und Vorschläge für die Praxis
(2021)
Public discourses about language, such as those that take place in the media, are typically conducted from a language-critical stance. To what extent this published opinion actually reflects the majority opinion of speakers is very much an open question. In this contribution we report on a recent survey of language attitudes in Germany. We show that the way questions are worded has a strong influence on the results, and we report which linguistic changes respondents say they have noticed in recent times.
To date there are no accurate, representative statistics on which languages are spoken in Germany. Various surveys do ask about native languages or languages spoken at home; however, owing to several flaws in survey design, the results of the existing surveys do not adequately reflect the linguistic reality of the population living in Germany. Drawing on three surveys, the contribution shows that the very instruments used to collect language data are shaped by language attitudes and that this severely limits the validity of the results. These shortcomings apply to language statistics for the entire population of Germany, children and adolescents included.
Controversies such as the one over gender-fair language use clearly have a political dimension. But that is only one side of the coin. Beyond the political dispute, the question arises of how the various positions are anchored in society and why the controversies arise in the first place. The sociologist Andreas Reckwitz's analysis of post-industrial society offers a possible explanation.
Vorwort
(2021)
Developments in the world also give rise to new vocabulary, especially in times of major social upheaval or as a result of crises, because new things, new circumstances, and “new normalities” have to be named so that they can be talked about. At the same time, older words are used more frequently because they become particularly relevant for communication at that moment. The terms presented in this glossary address such linguistic effects of the coronavirus crisis.
Vom ZISW zum ZAS
(2021)
The Leibniz-Zentrum Allgemeine Sprachwissenschaft (ZAS) is now 25 years old, in its prime, so to speak. It has gathered experience and made a name for itself worldwide with theoretical research on phonetics and phonology, morphology, syntax, semantics, and pragmatics. On the occasion of its anniversary, one wonders where its origins lie and under what circumstances it ‘grew up’. In this contribution I attempt to pursue these two questions. I do so because I am the contemporary witness who has accompanied the ZAS the longest.
The German e-dictionary documenting confusables Paronyme – Dynamisch im Kontrast contains lexemes which are similar in sound, spelling and/or meaning, e.g. autoritär/autoritativ, innovativ/innovatorisch. These can cause uncertainty as to their appropriate use. The monolingual guide could be easily expanded to become a multilingual platform for commonly confused items by incorporating language modules. The value of this visionary resource is manifold. Firstly, e-dictionaries of confusables have not yet been compiled for most European languages; consequently, the German resource could serve as a model of practice. Secondly, it would be able to explain the usage of false friends. Thirdly, cognates and loan word equivalents would be offered for simultaneous consultation. Fourthly, users could find out whether, for example, a German pair is semantically equivalent to a pair in another language. Finally, it would inform users about cases where a pair of semantically similar words in one language has only one lexical counterpart in another language. This paper is an appeal for visionary projects and collaborative enterprises. I will outline the dictionary’s layout and contents as shown by its contrastive entries. I will demonstrate potential additions, which would make it possible to build up a large platform for easily misused words in different languages.
Validating the Performativity Hypothesis to Neg-Raising using corpus data: Evidence from Polish
(2021)
Ungoliant: An optimized pipeline for the generation of a very large-scale multilingual web corpus
(2021)
Since the introduction of large language models in Natural Language Processing, large raw corpora have played a crucial role in Computational Linguistics. However, most of these large raw corpora are either available only for English or not available to the general public due to copyright issues. Nevertheless, there are some examples of freely available multilingual corpora for training Deep Learning NLP models, such as the OSCAR and Paracrawl corpora. However, they have quality issues, especially for low-resource languages. Moreover, recreating or updating these corpora is very complex. In this work, we try to reproduce and improve the goclassy pipeline used to create the OSCAR corpus. We propose a new pipeline that is faster, modular, parameterizable, and well documented. We use it to create a corpus similar to OSCAR but larger and based on recent data. Also, unlike OSCAR, the metadata information is at the document level. We release our pipeline under an open source license and publish the corpus under a research-only license.
Towards comprehensive definitions of data quality for audiovisual annotated language resources
(2021)
Though digital infrastructures such as CLARIN have been successfully established and now provide large collections of digital resources, the lack of widely accepted standards for data quality and documentation still makes re-use of research data a difficult endeavour, especially for more complex resource types. The article gives a detailed overview of relevant characteristics of audiovisual annotated language resources and reviews possible approaches to data quality in terms of their suitability for the current context. In conclusion, various strategies are suggested in order to arrive at comprehensive and adequate definitions of data quality for this specific resource type and possibly for digital language resources in general.
On 24 February 2020 the first coronavirus infection was confirmed in Switzerland. At that point probably no one could have foreseen what far-reaching consequences the coronavirus pandemic would have for society. From today's perspective it no longer surprises us that the pandemic has also had, and continues to have, strong effects on language, because language use always adapts to social change. At the Leibniz-Institut für Deutsche Sprache in Mannheim we document and research the unusually strong and rapid effects of the pandemic on the German language and summarise our results in, among other things, numerous articles.
This study offers a contribution to the reception analysis of TV documentaries by focusing on viewer opinions expressed on social media. It analyses German and English comments from YouTube and Facebook in order to find out what aspects of documentaries the audience comments on. More specifically, it describes how the viewers evaluate strategies that the producers use for simplifying complex content while still creating an appealing and entertaining media product. The results imply that most viewers appreciate informative shows that are entertaining at the same time. They also show that viewers tend to focus on the music and image, rather than on the spoken text, and that documentaries where nature plays an important role are judged more positively than science and history documentaries.
The study examines therapeutic strategies for dealing with and managing patient resistance that follows solution-oriented questions in psychotherapy. Patients regularly give dispreferred responses to solution-oriented questions. The therapists, in turn, are expected to elicit therapeutically relevant material.
Using methods of linguistic conversation analysis, the study examines how therapists deal with dispreferred responses following solution-oriented questions. The therapists' resistance management comprises reactions that initiate expansion or repair as well as topic changes.
The data consist of 15 initial psychodiagnostic interviews conducted according to the extended version of Operationalized Psychodynamic Diagnosis (OPD-2), a standardised and manualised diagnostic inventory designed to capture the psychodynamic forces behind patients' illnesses.
This paper describes the TEI-based ISO standard 24624:2016 “Transcription of spoken language” and other formats used within CLARIN for spoken language resources. It assesses the current state of support for the standard and the interoperability between these formats and with relevant tools and services. The main idea behind the paper is that a digital infrastructure providing language resources and services to researchers should also allow the combined use of resources and/or services from different contexts. This requires syntactic and semantic interoperability. We propose a solution based on the ISO/TEI format and describe the necessary steps for this format to work as an exchange format with basic semantic interoperability for spoken language resources across the CLARIN infrastructure and beyond.
In this article, we provide longitudinal evidence for the progressive routinization of a grammatical construction used for social coordination purposes in a highly specialized activity context: task-oriented video-mediated interactions. We focus on the methodic ways in which, over the course of 4 years, a second language speaker and initially novice to such interactions coordinates the transition between interacting with her coparticipants and consulting her own screen, which suspends talk, without creating trouble due to halts in progressivity. Initially drawing on diverse resources, she increasingly resorts to the use of a prospective alert constructed around the verb to check (e.g., “I will check”), which eventually routinizes in the lexically specific form “let me check” as a highly context- and activity-bound social action format. We discuss how such change over the participant’s video-mediated interactional history contributes to our understanding of social coordination in video-mediated interaction and of participants’ recalibrating their grammar-for-interaction while adapting to new situations, languages, or media. Data are in English.
Sometimes legal scholars get relevant but baffling questions from laypersons like: “The reference to a work is personal data, so does the GDPR actually require me to anonymise it? Or, as my voice data is personal data, does the GDPR automatically give me access to a speech recognizer using my voice sample? Or, can I say anything about myself without the GDPR requiring the web host to anonymise or remove the post? What can I say about others like politicians? And, what can researchers say about patients in a research report?” Based on these questions, the authors address the interaction of intellectual property and data protection law in the context of data minimisation and attribution rights, access rights, trade secret protection, and freedom of expression.
This paper reports on the efforts of twelve national teams in building the International Comparable Corpus (ICC; https://korpus.cz/icc) that will contain highly comparable datasets of spoken, written and electronic registers. The languages currently covered are Czech, Finnish, French, German, Irish, Italian, Norwegian, Polish, Slovak, Swedish and, more recently, Chinese, as well as English, which is considered to be the pivot language. The goal of the project is to provide much-needed data for contrastive corpus-based linguistics. The ICC corpus is committed to the idea of re-using existing multilingual resources as much as possible and the design is modelled, with various adjustments, on the International Corpus of English (ICE). As such, ICC will contain approximately the same balance of 40 percent written language and 60 percent spoken language, distributed across 27 different text types and contexts. A number of issues encountered by the project teams are discussed, ranging from copyright and data sustainability to technical advances in data distribution.
The paper explores factors that influence the distribution of constituent words of compounds over the head and modifier position. The empirical basis for the study is a large database of German compounds, annotated with respect to the morphological structure of the compound and the semantic category of the constituents. The study shows that the polysemy of the constituent word, its constituent family size, and its semantic category account for tendencies of the constituent word to occur in either modifier or head position. Furthermore, the paper explores the degree to which the semantic category combination of head and modifier word, e.g., x=substance and y=artifact, indicates the semantic relation between the constituents, e.g., y_consists_of_x.
In psychotherapy, therapists often formulate interpretations of clients' prior talk which are ‘unilateral’ in the sense that therapists index that they are themselves the author of an interpretive inference which may not be acceptable to the client. Based on 100 German-language recordings of brief psychodynamic psychotherapy (4 clients with 25 sessions each), we describe a multimodal practice of constructing extended multi-unit turns of delivering therapeutic interpretations. The practice includes gaze aversion until the main point of the interpretation is reached, perceptive and cognitive formulae, epistemic hedges, inserted accounts, parenthesis, self-repair, and self-reformulations. These design-features work together to index that the therapist produces an interpretation that can be heard as being tentative. The design of the therapists' turns reflexively indexes the expectation that the client might resist the interpretation; at the same time they are constructed to avoid resistance and to invite the client's self-exploration into new directions, often with a focus on emotions.
The teaching slides accompany the following textbook:
Svenja Völkel & Franziska Kretzschmar (2021): Introducing linguistic research. Cambridge: Cambridge University Press.
The slides follow the structure of the book chapters and can be used for teaching in class. They include the basic information per chapter and exercises to work on in class or as homework. More detailed information, additional exercises, suggestions for research projects and recommendations for further reading can be found in the textbook.
Are German studies as a whole, and not just German philology, ‘on the way down’ in the Nordic and Baltic countries? As far as the number of students, which has been falling for some time in many parts of the region, and the number of German institutes and departments at universities are concerned, this can largely hardly be disputed. But does it also hold for the quality of teaching and the language level of first-year students? And might German studies in the region, by offering programmes that are not appealing enough, be sawing off the very branch they are sitting on? These questions are addressed by the project UniStart Deutsch@NBL, which is presented in this contribution.
This chapter examines the syntactic functions of full (non-pronominal) noun phrases (NPs) and the functions of the four cases of German from a quantitative perspective. It proposes decomposing the concept of syntactic function into more basic features. These include the type of element to which the NP is subordinate and the kind of relation between the NP and the superordinate element (in the most general terms: complementation vs. modification).
Streit um Sprache
(2021)
This chapter examines the position of adnominal genitives in German. Positional variation is almost entirely restricted to proper names without an article, which are therefore the focus of the analysis. Corpus data show that the factors animacy and length of the attribute, together with the case of the overall phrase, explain a large part of the variation.