Refine
Year of publication
- 2023 (100) (remove)
Document Type
- Article (37)
- Part of a Book (35)
- Book (8)
- Other (7)
- Conference Proceeding (6)
- Working Paper (3)
- Preprint (2)
- Image (1)
- Part of Periodical (1)
Language
- English (100) (remove)
Keywords
- Deutsch (30)
- Interaktion (25)
- Korpus <Linguistik> (19)
- Konversationsanalyse (14)
- German (9)
- Multimodalität (9)
- Syntax (9)
- Englisch (8)
- Infrastruktur (8)
- Kommunikation (8)
Publicationstate
- Veröffentlichungsversion (75)
- Zweitveröffentlichung (21)
- Postprint (13)
- Ahead of Print (2)
Reviewstate
- Peer-Review (71)
- (Verlags)-Lektorat (26)
Publisher
- IDS-Verlag (16)
- Springer Nature (6)
- Springer (5)
- Zenodo (5)
- Narr Francke Attempto (4)
- Benjamins (3)
- Buske (3)
- Equinox (3)
- Leibniz-Institut für Deutsche Sprache (IDS) (3)
- Taylor & Francis (3)
This conference booklet provides information about 10th International Contrastive Linguistics Conference (ICLC-10) that took place in Mannheim, Germany, from 18 to 21 July 2023. It contains
– a description of the conference aims,
– details on the conference venue,
– information on committees,
– the conference program,
– the abstracts of the keynotes, oral and poster presentations, and
– an author index.
This manual introduces a conversation analytically informed coding scheme for episodes involving the direct social sanctioning of problem behavior in informal social interaction which was developed in the project Norms, Rules, and Morality across Languages (NoRM-aL) at the Leibniz-Institute for the German Language. It outlines the background for its development, delimits the phenomena to which the coding scheme can be applied and provides instructions for its use.
The scheme asks for basic information about the recording and the participants involved in the episode, before taking stock of different features of the sanctioning episode as a whole. This is followed by sets of specific coding questions about the sanctioning move itself (such as its timing and composition) and the reaction it engenders. The coding enables researchers to get a bird’s eye view on recurrent features of such episodes in larger quantities of data and allows for comparisons across different languages and informal settings.
One of the fundamental questions about human language is whether all languages are equally complex. Here, we approach this question from an information-theoretic perspective. We present a large scale quantitative cross-linguistic analysis of written language by training a language model on more than 6500 different documents as represented in 41 multilingual text collections consisting of ~ 3.5 billion words or ~ 9.0 billion characters and covering 2069 different languages that are spoken as a native language by more than 90% of the world population. We statistically infer the entropy of each language model as an index of what we call average prediction complexity. We compare complexity rankings across corpora and show that a language that tends to be more complex than another language in one corpus also tends to be more complex in another corpus. In addition, we show that speaker population size predicts entropy. We argue that both results constitute evidence against the equi-complexity hypothesis from an information-theoretic perspective.
Besides English, Afrikaans is considered “the [Germanic] language which deviates grammatically the farthest from the others” (Harbert 2007: 17). But how exactly do we measure “grammatical deviation”, and how deviant is Afrikaans really if we compare it not just to other standard languages but also to non-standard varieties? The present contribution aims to address those questions combining functional-typological and dialectometric perspectives. We first select data for 28 Germanic varieties showing vastly different speaker numbers, grades of standardisation and amounts of language contact. Based on 48 (micro)typological variables from syntax, morphology and phonology, we perform cluster analysis and multidimensional scaling and present ways of visualizing and interpreting the results. Inter alia, the analyses show a major divide between Continental West Germanic and North Germanic (as might be expected) and they also identify a number of outliers, including English and pidgin and creole languages such as Russenorsk or Rabaul Creole German. Afrikaans appears to cluster with the other West Germanic languages rather than the outliers. Within West Germanic, however, it does indeed emerge as rather deviant and, according to our metric, it is, for example, typologically closer to other high-contact varieties such as Yiddish than it is to Dutch.
Allusion
(2023)
It is well known that the distribution of lexical and grammatical patterns is size- and register-sensitive (Biber 1986, and later publications). This fact alone presents a challenge to many corpus-oriented linguistic studies focusing on a single language. When it comes to cross-linguistic studies using corpora, the challenge becomes even greater due to the lack of high-quality multilingual corpora (Kupietz et al. 2020; Kupietz/Trawiński 2022), which are comparable with respect to the size and the register. That was the motivation for the creation of the European Reference Corpus EuReCo, an initiative started in 2013 at the Leibniz Institute for the German Language (IDS) together with several European partners (Kupietz et al. 2020). EuReCo is an emerging federated corpus, with large virtual comparable corpora across various languages and with an infrastructure supporting contrastive research. The core of the infrastructure is KorAP (Diewald et al. 2016), a scalable open-source platform supporting the analysis and visualisation of properties of texts annotated by multiple and potentially conflicting information layers, and supporting several corpus query languages. Until recently, EuReCo consisted of three monolingual subparts: the German Reference Corpus DeReKo (Kupietz et al. 2018), the Reference Corpus of Contemporary Romanian Language (Barbu Mititelu/Tufiş/Irimia 2018), and the Hungarian National Corpus (Váradi 2002). The goal of the present submission is twofold. On the one hand, it reports about the new component of EuReCo: a sample of the National Corpus of Polish (Przepiórkowski et al. 2010). On the other hand, it presents the results of a new pilot study using the newly extended EuReCo. This pilot study investigates selected Polish collocations involving light verbs and their prepositional / nominal complements (Fig. 1) and extends the collocation analyses of German, Romanian and Hungarian (Fig. 2) discussed in Kupietz/Trawiński (2022).
Assessment
(2023)
Most broadly, an assessment is a type of social action by which an interactant expresses an evaluative stance towards someone or something (e.g., an object, an event, an action, an experience, a state of affairs, a place, a circumstance, etc.). The target of an assessment is typically called the ‘assessable’.
Collaborative work in NFDI
(2023)
The non-profit association National Research Data Infrastructure (NFDI) promotes science and research through a National Research Data Infrastructure. Its aim is to develop and establish an overarching research data management (RDM) for Germany and to increase the efficiency of the entire German science system. After a two-and-a-half year build up phase, the process of adding new consortia, each representing a different data domain, has ended in March 2023. NFDI now has 26 disciplinary consortia (and one additional basic service collaboration). Now the full extent of cross-consortial interaction is beginning to show.
The ubiquity of smartphones has been recognised within conversation analysis as having an impact on conversational structures and on the participants’ interactional involvement. However, most of the previous studies have relied exclusively on video recordings of overall encounters and have not systematically considered what is taking place on the device. Due to the personal nature of smartphones and their small displays, onscreen activities are of limited visibility and are thus potentially opaque for both the co-present participants (“participant opacity”) and the researchers (“analytical opacity”). While opacity can be an inherent feature of smartphones in general, analytical opacity might not be desirable for research purposes. This chapter discusses how a recording set-up consisting of static cameras, wearable cameras and dynamic screen captures allowed us to address the analytical opacity of mobile devices. Excerpts from multi-source video data of everyday encounters will illustrate how the combination of multiple perspectives can increase the visibility of interactional phenomena, reveal new analytical objects and improve analytical granularity. More specifically, these examples will emphasise the analytical advantages and challenges of a combined recording set-up with regard to smartphone use as multiactivity, the role of the affordances of the mobile device, and the prototypicality and “naturalness” of the recorded practices.
Contrastive analysis of climate-related neologisms registered in GermanN and French Wikipedia
(2023)
Neologisms represent new social norms, tendencies, controversies and attitudes. They denote new or changed concepts which are constantly being negotiated between different members of the discourse community (Wodak 2022 and Catalano/Waugh (eds.) 2020). Neologisms help to identify new communicative patterns and narratives which illustrate different strings of discourse in everyday life. In recent years, many neologisms relating to the subject of the environment and climate have been emerging around the world mainly due to dominant discussions on climate change and the movement “Fridays for Future”. In German, for example, neologisms such as Klimakleber, klimaresilient and globaler Streik and in French neologisms such as éco-anxiété, justice climatique and écocitoyen could be observed. These neologisms occur in many domains of life, for example in politics, media and also in advertising, which means that “l’importance croissante des enjeux environnementaux dans les discours politiques, médiatiques et publicitaires” (Balnat/Gérard 2022, p. 22) can be identified. However, it is not only the occurrence of environment- or climate-related topics that is increasing, but also the rising polarisation of the public debate. The polarisation within public discourse is based on the fact that there are opposing positions which are represented by new or recently relevant terms such as activistes du climat (or Klimaaktivisten) and climatosceptiques (or Klimaskeptiker) (Balnat/Gérard 2022, p. 22). Due to different identifications with one or the other side, one can also speak of an “affrontement idéologique” (Balnat/Gérard 2022, p. 23). 1 The explosive nature and the high complexity of the debate on climate and the environmental issues mean that many words are naturally unfamiliar to people. This is especially true with regard to neologisms. In addition, it is often not only the new word itself but also the signified concept that is initially unknown. When people then look up words, they often do so on the Internet. Wikipedia as a “free encyclopedia” (Wikipedia 2023) is particularly well suited as an object of study with regard to neologisms, since factual knowledge is given special attention there. Furthermore, this reference guide is perceived as a regular source of agreed and common knowledge on all sorts of subjects. Hence, the descriptions found here represent social agreement on controversial terms and discussions to some degree. In this paper, German and French neologisms from the subject area of climate and environment will be examined primarily in Wikipedia, but also in the neighbouring resource Wiktionary,2 which is “a collaborative project to produce a free-content multilingual dictionary” (Wiktionary 2023). Since Wikipedia and Wiktionary are available in French and in German, 21010. International Contrastive Linguistics Conference (ICLC) both are equally suitable for the contrastive analysis. Thus, Wikipedia articles which are accessible in both languages (e.g. Klimanotstand and État d›urgence climatique) or Wikipedia articles about similar events and phenomena (e.g. Letzte Generation and Dernière Rénovation) will be compared. For example, we will have a closer look at other new terms specifying different thematic aspects of the discourse of climate and environment. We will mainly refer to those lexical items which can be found in the respective articles in both languages. Special emphasis will be on overlaps and differences, thematic foci, speaker’s positions and evaluative terms.
Telephone-based remote interpreting has come into widespread use in multilingual encounters, all the more so in times of refugee crises and the large influx of asylum-seekers into Europe. Nevertheless, the linguistic practices in this mode of communication have not yet been examined comprehensively. This article therefore investigates selected aspects of turn-taking and clarification sequences during semi-authentic telephone-interpreted counselling sessions for refugees (Arabic–German). A quantitative analysis reveals that limited audibility makes it more difficult for interpreters to claim their turn successfully; in most cases, however, turn-taking occurs smoothly. The trouble sources that trigger queries are mainly content-related and interpreters vary greatly in the ways they deal with such difficulties. Contrary to what one might expect, the study shows that coordination fails only rarely during telephone-based remote interpreting.
"Reproducibility crisis" and "empirical turn" are only two keywords when it comes to providing reasons for research data management. Research data is omnipresent and with the more and more automatic data processing procedures, they become even more important. However, just because new methods require data and produce data, this does not mean that data are easily accessible, reusable or even make a difference in the CV of a researcher, even if a large portion of research goes into data creation, acquisition, preparation, and analysis. In this talk I will present where we find data in the research process, where we may find appropriate support for data management and advocate for a procedure for including it in research publications and resumes.
This presentation relies on work within the BMBF-funded project CLARIN-D. It also builds on work within the German National Research Data Infrastructure (NFDI) consortium Text+, DFG project number 460033370.
A constructicon, i.e., a structured inventory of constructions, essentially aims at documenting functions of lexical and grammatical constructions. Among other parameters, so-called constructional collo-profiles, as introduced by Herbst (2018, 2020), are conclusive for determining constructional meanings. They provide information on how relevant individual words are for construction slots, they hint at usage preferences of constructions and serve as a helpful indicator for semantic peculiarities of constructions. However, even though collo-profiles constitute an indispensable component of constructicon entries, they pose major challengers for constructicographers: For a constructicographic enterprise it is not feasible to conduct collostructional analyses for hundreds or even thousands of constructions. In this article, we introduce a procedure based on the large language model BERT that allows to predict collo-profiles without having to extensively annotate instances of constructions in a given corpus. Specifically, by discussing the constructions X macht Y ADJP (‘x makes Y ADJ’, e.g. he drives him crazy) and N1 PREP N1 (e.g., bumper to bumper, constructions over constructions), we show how the developed automated system generates collo-profiles based on a limited number of annotated instances. Finally, we place collo-profiles alongside other dimensions of constructional meanings included in the German Constructicon.
Using multimodal conversation analysis, we investigate how novices learning the “inner body” acting technique in the context of a community theater project share their experiences of the bodily exercises through verbal and embodied conduct. We focus on how verbal description and bodily enactment of the experience mutually elaborate each other, and how the experienced sensorimotor and affective qualities are made to be witnessed and recognized by the others. Participants describe their experiences without naming qualities. Instead, a display of the experienced qualities is made accessible to others through coordinating the unfolding talk and bodily conduct. In particular, we show how grammatical and action projection is fulfilled by interconnected verbal and embodied conduct, with body movement and posture giving off ineffable experiential qualities. The moving body appears both as a source of the experience and as a resource for depicting perceived qualities to others; additional resources (non-specific person reference and gaze aversion) contribute to organizing the subjective and intersubjective layers of the reflection of the experiences. The study contributes to and extends recent research on sensoriality in interaction by focusing on phenomena of proprioception and interoception. The data are two cases drawn from 60 h of video-recordings made in the context of a devised community theater project. The data are in Finnish with English translations.
The idea of this article is to take the immaterial and somehow ethereal nature of aesthetic concepts seriously by asking how aesthetic concepts are negotiated and thus formed in communication. My examples come from theatrical production where aesthetic decisions naturally play a major role. In the given case, an aesthetic concept is introduced with which only the director, but none of the actors is familiar in the beginning of the rehearsals. The concept, Wabi Sabi, comes from Japanese culture. As the whole rehearsal process was video recorded, it is possible to track the process of how the concept is negotiated and acquired over time. So, instead of defining criteria what Wabi Sabi as an aesthetic concept “consists of,” this article seeks to show how the concept is introduced, explained and “used” within a practical context, in this case a theater rehearsal. In contrast to conventional models of aesthetic experience, I am interested in the ways in which an aesthetic concept is configured in and through socially organized interaction, and — vice versa — how that interaction contributes to the situational accomplishment of the same concept. In short: I am interested in the “doing” of aesthetic concepts, especially in “doing Wabi Sabi.”
The Encyclopedia of Terminology for Conversation Analysis and Interactional Linguistics is an online resource for students and scholars of CA/IL, publicly available on the EMCA Wiki page. Encyclopedias and glossaries are widespread across various fields and methods, and serve as immensely valuable resources. Given the extent to which the EMCA/IL community has expanded over the years—both terminologically as well as geographically—we hope that this encyclopedia of terminology will be well received by students and practitioners of CA and IL across the globe.
Rules of behavior are fundamental to human sociality. Whether on the road, at the dinner table, or during a game, people monitor one another’s behavior for conformity to rules and may take action to rectify violations. In this study, we examine two ways in which rules are enforced during games: instructions and reminders. Building on prior research, we identify instructions as actions produced to rectify violations based on another’s lack of knowledge of the relevant rule; knowledge that the instruction is designed to impart. In contrast to this, the actions we refer to as reminders are designed to enforce rules presupposing the transgressor’s competence and treating the violation as the result of forgetfulness or oversight. We show that instructing and reminding actions differ in turn design, sequential development, the epistemic stances taken by transgressors and enforcers, and in how the action affects the progressivity of the interaction. Data are in German and Italian from the Parallel European Corpus of Informal Interaction (PECII).
The Data Governance Act was proposed in late 2020 as part of the European Strategy for Data, and adopted on 30 May 2022 (as Regulation 2022/868). It will enter into application on 24 September 2023. The Data governance Act is a major development in the legal framework affecting CLARIN and the whole language community. With its new rules on the re-use of data held by the public sector bodies and on the provision of data sharing services, and especially its encouragement of data altruism, the Data Governance Act creates new opportunities and new challenges for CLARIN ERIC. This paper analyses the provisions of the Data Governance Act, and aims at initiating the debate on how they will impact CLARIN and the whole language community.
The landscape of digital lexical resources is often characterized by dedicated local portals and proprietary interfaces as primary access points for scholars and the interested public. In addition, legal and technical restrictions are potential issues that can make it difficult to efficiently query and use these valuable resources. As part of the research data consortium Text+, solutions for the storage and provision of digital language resources are being developed and provided in the context of the unified cross-domain German research data infrastructure NFDI. The specific topic of accessing lexical resources in a diverse and heterogenous landscape with a variety of participating institutions and established technical solutions is met with the development of the federated search and query framework LexFCS. The LexFCS extends the established CLARIN Federated Content Search that already allows accessing spatially distributed text corpora using a common specification of technical interfaces, data formats, and query languages. This paper describes the current state of development of the LexFCS, gives an insight into its technical details, and provides an outlook on its future development.
We present a collection of (currently) about 5.500 commands directed to voice-controlled virtual assistants (VAs) by sixteen initial users of a VA system in their homes. The collection comprises recordings captured by the VA itself and with a conditional voice recorder (CVR) selectively capturing recordings including the VA-directed commands plus some surrounding context. Next to a description of the collection, we present initial findings on the patterns of use of the VA systems during the first weeks after installation, including usage timing, the development of usage frequency, distributions of sentence structures across commands, and (the development of) command success rates. We discuss the advantages and disadvantages of the applied collection-specific recording approach and describe potential research questions that can be investigated in the future, based on the collection, as well as the merit of combining quantitative corpus linguistic approaches with qualitative in-depth analyses of single cases.
Interactants who encounter co-participant conduct which they find to be socio-normatively problematic or troublesome are faced with a range of choices. First and foremost, this includes the issue of whether to directly address it, or to simply ‘let it pass’ (at least for now) (Emerson/Messinger 1977). In the case of the former, the issue then becomes how to address it. Across the various ways in which participants can pragmatically engage with what they perceive to be transgressive or untoward behavior (e.g., Pomerantz 1978; Schegloff 1988b; Dersley/Wootton 2000; Günthner 2000; Bolden/Robinson 2011; Potter/Hepburn 2020; see also Rodriguez 2022), they sometimes meta-pragmatically formulate the co-participant’s doings in terms of specific actions. Such action descriptions are necessarily selective (Sacks 1963; Schegloff 1972, 1988a; Sidnell/Barnes 2013): They foreground certain aspects of the co-participant’s conduct, while backgrounding others, and thus contribute to publically construeing the formulated conduct in particular ways (Jayyusi 1993), viz. as socio-normatively problematic, transgressive or untoward, and interactionally accountable (Robinson 2016; Sidnell 2017).
Recent years have seen a growing interest in grammatical variation, a core explanandum of grammatical theory. The present volume explores questions that are fundamental to this line of research: First, the question of whether variation can always and completely be explained by intra- or extra-linguistic predictors, or whether there is a certain amount of unpredictable – or ‘free’ – grammatical variation. Second, the question of what implications the (in-)existence of free variation would hold for our theoretical models and the empirical study of grammar. The volume provides the first dedicated book-length treatment of this long-standing topic. Following an introductory chapter by the editors, it contains ten case studies on potentially free variation in morphology and syntax drawn from Germanic, Romance, Uralic and Mayan.
In many countries of the world, perspectives on gender equality and racism have changed in recent decades. One result has been more attention being devoted to traces of androcentric and racist language in society. This also affects dictionaries. In lexicography there are discussions about whether or to what extent social asymmetries are inscribed in dictionaries and if this is still acceptable. The issue of the nature of description plays an important role in this discussion. If sexist usages are often found in language use, i.e. in the corpus data on which the dictionary is based, does the dictionary also have to show them? How is this, in turn, compatible with the normative power of dictionaries? Do dictionaries contribute to the perpetuation of gender stereotypes by showcasing them under the banner of descriptive principles? And what roles do lexicographers play in this process? The article deals with these questions on the basis of individual lexicographical examples and current discussions in the lexicographic and public community.
In this chapter, we will investigate smartphone-based showing sequences in everyday social encounters, that is, moments in which a personal mobile device is used for presenting (audio-)visual content to co-present participants. Despite a growing interest in object-centred sequences and mundane technology use, detailed accounts of the sequential, multimodal, and material dimensions of showing sequences are lacking. Based on video data of social interactions in different languages and on the framework of multimodal interaction analysis, this chapter will explore the link between mobile device use and social practices. We will analyse how smartphone showers and their recipients coordinate the manipulation of a technological object with multiple courses of action, and reflect upon the fundamental complexity of this by-now routine joint activity.
Our everyday lives in any social community are shaped by rules (e.g., Roughley 2019; Schmidt/Rakoczy 2019). Rules (in a broad sense) are interactionally negotiated, monitored, enforced, and serve as an ‘orientation value‘ in social life. If someone‘s behavior is treated as norm-violating or problematic in certain way, it may be therefore confronted. Confronting interlocutors can immediately stop, modify, or retrospectively reprimand the misconduct of others in a moralizing manner. Such confrontations of a problem behavior occur commonly in informal interactions. On the basis of our corpus, specifically in informal interactions at the table, I observed that, for example, in Polish, German and British English, direct confrontations occur on average at least once every three minutes. Participants design these actions in a variety of ways, but like everything in interaction, the design is not arbitrary (Sacks 1984; Enfield/Sidnell 2019). A recurrent feature of such turns is connecting misconduct to some more general concepts. It is evident from the data that e.g. speakers of German and Polish use ‘generally valid statements’ in problematic moments (cf. Küttner/Vatanen/Zinken 2022) to reach the closure of the problem sequence, also specifically dealing there with distribution of deontic and epistemic rights (Rogowska in prep.). I ask, when and for what purpose generality, that is, abstracting from a concrete behaviour, is used as a tool while confronting others. The focus is on sequential and linguistic features of abstracting in confronting moments in language comparison. What are the methods to achieve abstraction: i) defocusing the confronted, specific agent (cf. Zinken et al. 2021; Siewierska 2008), e.g. nur derjenige der dran ist der darf die bedingungen für den handel stellen (only the one whose turn it is may set the conditions for the trade); using ii) extreme case formulations (Pomerantz 1986), e.g. na siostrę zawsze można liczyć (you can always count on a sister); iii) referring to stable character traits, e.g. Matylda bardzo chetne by podala. (.) Ona jest taka skora do pomocy (Matylda would be very happy to pass (it to you). (.) She is so eager to help); or iv) broader categorizing of the given referent, e.g. do not build (.) do do not build do not build swastikas (when a) German guy is filming us? Sometimes, even several locus of abstraction are combined in the same turn. Can we identify language-specific and cross-linguistic patterns? What are the interactional consequences: enforcing a compliant behavior in the future, eliciting an apology or cognitively simplifying complex problems? From a comparative perspective, I ask whether going beyond the here-and-now while confronting others is a practice that unites speakers across languages and is thus a human cognitive strategy to display normativity. This ongoing study is based on new comparable data from four European languages from informal interaction during activities around the table (Kornfeld/Küttner/Zinken 2023; Küttner et al. in prep.). The phenomenon was coded systematically in each of the four languages as part of a larger, quantitatively oriented study with different questions (Küttner et al. submitted). In the talk, I will show exemplarily Polish and German evidence. I use the methods of Conversation Analysis (Sidnell/Stivers (eds.) 2012) and Interactional Linguistics (Imo/Lanwer 2019).
A central goal of linguistics is to understand the diverse ways in which human language can be organized (Gibson et al. 2019; Lupyan/Dale 2016). In our contribution, we present results of a large scale cross-linguistic analysis of the statistical structure of written language (Koplenig/Wolfer/Meyer 2023) we approach this question from an information-theoretic perspective. To this end, we conduct a large scale quantitative cross-linguistic analysis of written language by training a language model on more than 6,500 different documents as represented in 41 multilingual text collections, so-called corpora, consisting of ~3.5 billion words or ~9.0 billion characters and covering 2,069 different languages that are spoken as a native language by more than 90% of the world population. We statistically infer the entropy of each language model as an index of un. To this end, we have trained a language model on more than 6,500 different documents as represented in 41 parallel/multilingual corpora consisting of ~3.5 billion words or ~9.0 billion characters and covering 2,069 different languages that are spoken as a native language by more than 90% of the world population or ~46% of all languages that have a standardized written representation. Figure 1 shows that our database covers a large variety of different text types, e.g. religious texts, legalese texts, subtitles for various movies and talks, newspaper texts, web crawls, Wikipedia articles, or translated example sentences from a free collaborative online database. Furthermore, we use word frequency information from the Crúbadán project that aims at creating text corpora for a large number of (especially under-resourced) languages (Scannell 2007). We statistically infer the entropy rate of each language model as an information-theoretic index of (un)predictability/complexity (Schürmann/Grassberger 1996; Takahira/Tanaka-Ishii/Dębowski 2016). Equipped with this database and information-theoretic estimation framework, we first evaluate the so-called ‘equi-complexity hypothesis’, the idea that all languages are equally complex (Sampson 2009). We compare complexity rankings across corpora and show that a language that tends to be more complex than another language in one corpus also tends to be more complex in another corpus. This constitutes evidence against the equi-complexity hypothesis from an information-theoretic perspective. We then present, discuss and evaluate evidence for a complexity-efficiency trade-off that unexpectedly emerged when we analysed our database: high-entropy languages tend to need fewer symbols to encode messages and vice versa. Given that, from an information theoretic point of view, the message length quantifies efficiency – the shorter the encoded message the higher the efficiency (Gibson et al. 2019) – this indicates that human languages trade off efficiency against complexity. More explicitly, a higher average amount of choice/uncertainty per produced/received symbol is compensated by a shorter average message length. Finally, we present results that could point toward the idea that the absolute amount of information in parallel texts is invariant across different languages.
We introduce DeReKoGram, a novel frequency dataset containing lemma and part-of-speech (POS) information for 1-, 2-, and 3-grams from the German Reference Corpus. The dataset contains information based on a corpus of 43.2 billion tokens and is divided into 16 parts based on 16 corpus folds. We describe how the dataset was created and structured. By evaluating the distribution over the 16 folds, we show that it is possible to work with a subset of the folds in many use cases (e.g., to save computational resources). In a case study, we investigate the growth of vocabulary (as well as the number of hapax legomena) as an increasing number of folds are included in the analysis. We cross-combine this with the various cleaning stages of the dataset. We also give some guidance in the form of Python, R, and Stata markdown scripts on how to work with the resource.
Introducing Interactive Grammar: How to Develop Language Competence with Research-based Learning
(2023)
We present the implementation of an interactive e-learning platform for both classroom study and self-study, that helps developing German language competence – vocabulary, spelling, and grammar – on various levels and for everyday life applications. The LernGrammis portal addresses school and highschool students, (prospective) teachers, and L2 learners of German equally, each with appropriate educational content and interactive components. It thus offers the digital networking infrastructure for education a unique, freely available and scientifically based learning resource. Applying the innovative concept of „Research-based Learning (RBL)“, LernGrammis provides teachers with ideas for lesson planning, and learners with dedicated modules to develop new skills through exploring authentic language resources and by this means answering customised low-threshold research questions. Using proven practical examples, we demonstrate the approach, its strengths and possibilities, as well as initial user feedback evaluation results.
In this article, we provide an insight into the development and application of a corpus-lexicographic tool for finding neologisms that are not yet listed in German dictionaries. As a starting point, we used the words listed in a glossary of German neologisms surrounding the COVID-19 pandemic. These words are lemma candidates for a new dictionary on COVID-19 discourse in German. They also provided the database used to develop and test the NeoRate tool. We report on the lexicographic work in our dictionary project, the design and functionalities of NeoRate, and describe the first test results with the tool, in particular with regard to previously unregistered words. Finally, we discuss further development of the tool and its possible applications.
Introduction
(2023)
This replication study aims to investigate a potential bias toward addition in the German language, building upon previous findings of Winter and colleagues who identified a similar bias in English. Our results confirm a bias in word frequencies and binomial expressions, aligning with these previous findings. However, the analysis of distributional semantics based on word vectors did not yield consistent results for German. Furthermore, our study emphasizes the crucial role of selecting appropriate translational equivalents, highlighting the significance of considering language-specific factors when testing for such biases for languages other than English.
This presentation deals with collaborative turn-sequences (Lerner 2004), a syntactically coherent unit of talk that is jointly formulated by at least two speakers, in Czech and German everyday conversations. Based on conversation analysis (e.g., Schegloff 2007) and a multimodal approach to social interaction (e.g., Deppermann/Streeck 2018), we aim at comparing recurrent patterns and action types within co-constructional sequences in both languages. The practice of co-constructing turns-at-talk has been described for typologically different languages, especially for English (e.g., Lerner 1996, 2004), but also for languages such as Japanese (Hayashi 2003) or Finnish (Helasvuo 2004). For German, various forms and functions of co-constructions have already been investigated (e.g., Brenning 2015); for Czech, a detailed, interactionally based description is still pending (but see some initial observations in, e.g., Hoffmannová/Homoláč/Mrázková (eds.) 2019). Although the existence of co-constructions in different languages points to a cross-linguistic conversational practice, few explicitly comparative studies exist (see, e.g., Lerner/Takagi 1999, for English and Japanese). The language pair Czech-German has mainly been studied with respect to language contact and without specifically considering spoken language or complex conversational sequences (e.g., Nekula/Šichová/Valdrová 2013). Therefore, our second aim is to sketch out a first comparison of co-constructional sequences in German and Czech, thereby contributing to the growing field of comparative and cross-linguistic studies within conversation analysis (e.g., Betz et al. (eds.) 2021; Dingemanse/Enfield 2015; Sidnell (ed.) 2009). More specifically, we will present three main sequential patterns of co-constructional sequences, focusing on the type of action a second speaker carries out by completing a first speaker’s possibly incomplete turn-at-talk, and on how the initial speaker then responds to
this suggested completion (Lerner 2004). Excerpts from video recordings of Czech and German ordinary conversations will illustrate these recurrent co-constructional sequence types, i.e., offering help during word searches (see example 1 above), displaying understanding, or claiming independent knowledge. The third objective of this paper is to underline the participants’ orientation to similar interactional problems, solved by specific syntactic and/or lexical formats in Czech and German. Considering the more recent focus on the embodied dimension of co-constructional practices (e.g., Dressel 2020), we will also investigate the multimodal formatting of a started utterance as more or less “permeable” (Lerner 1996) for co-participant completion, the participants’ mutual embodied orientation, and possible embodied responses to others’ turn-completions (such as head nods or eyebrow flashes, cf. De Stefani 2021). More generally, this contribution reflects on the possibilities and challenges of a cross-linguistic comparison of complex multimodal sequences.
Theater rehearsals are (usually) confronted with the problem of having to transform a written text into an audio-visual, situated and temporal performance. Our contribution focuses on the emergence and stabilization of a gestural form as a solution for embodying a certain aesthetic concept which is derived from the script. This process involves instructions and negotiations, making the process of stabilization publicly and thus intersubjectively accessible. As scenes are repeatedly rehearsed, rehearsals are perspicuous settings for tracking interactional histories. Based on videotaped professional theatre interactions in Germany, we focus on consecutive instances of rehearsing the same scene and trace the interactional history of a particular gesture. This gesture is used by the director to instruct the actors to play a particular aspect of a scene adopting a certain aesthetic concept. Stabilization requires the emergence of shared knowledge. We will show the practices by which shared knowledge is established over time during the rehearsal process and, in turn, how the accumulation of knowledge contributes to a change in the interactional practices themselves. Specifically, we show how a gesture emerges in the process of developing and embodying an aesthetic concept, and how this gesture eventually becomes a sign that refers to and evokes accumulated knowledge. At the same time, we show how this accumulated knowledge changes the instructional activities in the rehearsal process. Our study contributes to the overall understanding of knowledge accumulation in interaction in general and in theater rehearsals in particular. At the same time, it is devoted to the central importance of gestures in theater, which are both a means and a product of theatrical staging.
Computational language models (LMs), most notably exemplified by the widespread success of OpenAI's ChatGPT chatbot, show impressive performance on a wide range of linguistic tasks, thus providing cognitive science and linguistics with a computational working model to empirically study different aspects of human language. Here, we use LMs to test the hypothesis that languages with more speakers tend to be easier to learn. In two experiments, we train several LMs—ranging from very simple n-gram models to state-of-the-art deep neural networks—on written cross-linguistic corpus data covering 1293 different languages and statistically estimate learning difficulty. Using a variety of quantitative methods and machine learning techniques to account for phylogenetic relatedness and geographical proximity of languages, we show that there is robust evidence for a relationship between learning difficulty and speaker population size. However, contrary to expectations derived from previous research, our results suggest that languages with more speakers tend to be harder to learn.
The present article proposes a syntactic and semantic analysis of assertive clauses that comprises their truth-conditional aspects and their speech act potential in communication. What is commonly called “illocutionary force” is differentiated into three structurally and functionally distinct layers: a judgement phrase, representing subjective epistemic and evidential attitudes; a commitment phrase, representing the social commitment related to assertions; and an act phrase, representing the relation to the common ground of the conversation. The article provides several pieces of evidence for this structure: from the interpretation and syntactic position of various classes of epistemic, evidential, affirmative and speech act-related operators, from clausal complements embedded by different types of predicates, from embedded root clauses, and from anaphora referring to different clausal projections. The syntactic assumptions are phrased within X-bar theory, and the semantic interpretation makes use of dynamic update of common ground, differentiating between informative and performative updates. The object language is German, with particular reference to verb final and verb second structure.
Despite being an official language of several countries in Central and Western Europe, German is not formally recognised as the official language of the Federal Republic of Germany. However, in certain situations the use of the German language, including the spelling rules, is subject to state regulation (by acts of Federal Parliament orby administrative decisions). This article presents the content of this regulation, its scope, and the historical context in which it was adopted.
This paper discusses contemporary societal roles of German in the Baltic states (Latvia, Estonia, Lithuania). Speaker and learner statistics and a summary of sociolinguistic research (Linguistic Landscapes, language learning motivation, language policies, international roles of languages) suggest that German has by far fewer speakers and functions than the national languages, English, and Russian, and it is not a dominant language in the contemporary Baltics anymore. However, German is ahead of ‘any other language’ in terms of users and societal roles as a frequent language in education, of economic relations, as a historical lingua franca, and a language of traditional and new minorities. Highly diverse groups of users and language policy actors form a ‘coalition of interested parties’ which creates niches which guarantee German a frequent use. In the light of the abundance of its functions, the paper suggests the concept ‘additional language of society’ for a variety such as German in the Baltics – since there seems to be no adequate alternative labelling which would do justice to all societal roles. The paper argues that this concept may also be used for languages in similar societal situations and, not least, be useful in language marketing and the promotion of multilingualism.
This contribution summarizes the lessons learned from the organization of a joint conference on text analytics research by the Business, Economic, and Related Data (BERD@NFDI) and Text+ consortia within the National Research Data Infrastructure (NFDI) in Germany. The collaboration aimed to identify common ground and foster interdisciplinary dialogue between scholars in the humanities and in the business domain. The lessons learned include the importance of presenting research questions using textual data to establish common ground, similarities in methodology for processing textual data between the consortia, similarities in research data management, and the need for regular interconsortial discussions on textual analysis methods and data. The collaboration proved valuable for interdisciplinary dialogue within the NFDI, and further collaboration between the consortia is planned.
Conventional terminology resources reach their limits when it comes to automatic content classification of texts in the domain of expertlayperson communication. This can be attributed to the fact that (non-normalized) language usage does not necessarily reflect the terminological elements stored in such resources. We present several strategies to extend a terminological resource with term-related elements in order to optimize automatic content classification of expert-layperson texts.
This conversation analytic study compares the use of negation particles in spoken German and Persian, namely nein/nee and na. While these particles have a range of functions in both languages (Ghaderi 2022; Imo 2017), their use in response to news remains understudied. We focus on nein/nee and na in two sequential contexts: (i) after prior disconfirmations (Extract (a)) and (ii) in response to either solicited or unsolicited informings (see Extracts (b) and (c), respectively). In both contexts, nein/nee and na mark unexpectedness and open up an opportunity space for more, but they do so in different ways and with different outcomes. Nein/nee- and na-turns after disconfirming, often minimal responses to first-position confirmable turns mark the prior as unexpected (or even contrasting with the nein/nee/na-speaker’s expectations) and thus as expandable/accountable (cf. Ford 2001; Gubina/Betz 2021). Nein/nee/na-turns after informings (e.g., announcements that display a story teller’s negative emotional stance) differ not only in sequential position but also in prosodic realization. They can be either falling or rising, but all are characterized by marked prosody, i.e., lengthening, very low onset, smiling or breathy voice, or high overall pitch. Through position and turn design features, such nein/nee- and na-turns not only mark a prior turn as counter to (normative) expectations, but may also display the speaker’s affective stance and affiliate with the affective stance of the prior interactant. By comparing the use of nein/nee and na in German and Persian in the two functions illustrated in Extracts (a) and (b/c), we will show (i) how nein/nee- and na-turns shape interactional trajectories after responsive actions and (ii) what role the particles play in managing news and stance-taking as well as epistemic and affective positioning. Apart from revealing similarities in the use of German and Persian negation particles, the results of our crosslinguistic comparison will demonstrate that even if different languages have similar practices for specific actions, the use of these practices is language- and culture-specific. This means that even similar practices in different languages have their own “collateral effects” (Sidnell/Enfield 2012), linguistic and prosodic characteristic features, and, at least sometimes, consequences for social actions accomplished in the specific language (e.g., Dingemanse/Blythe/Dirksmeyer 2014; Evans/Levinson 2009; Floyd/Rossi/Enfield (eds.) 2020; Fox et al. 2009). Our study uses the method of Conversation Analysis (Sidnell/Stivers (eds.) 2013) and draws on more than 80 hours of audio and video recordings of spontaneous interactions (co-present, via video link, and on the telephone) in everyday and institutional contexts.
This paper analyses intensification in German digitally-mediated communication (DMC) using a corpus of YouTube comments written by young people (the NottDeuYTSch corpus). Research on intensification in written language has traditionally focused on two grammatical aspects: syntactic intensification, i.e. the use of particles and other lexical items and morphological intensification, i.e. the use of compounding. Using a wide variety og examples from the corpus, the paper identifies novel ways that have been used for intensification in DMC, and suggests a new taxonomy of classification for future analysis of intensification.
Modular pivot
(2023)
A modular pivot is a type of turn-constructional pivot. It is built from syntactically entirely optional items (i.e. linguistic adjuncts) that can occur in both turn-initial and turn-final position and can therefore be used to patch a wide range of otherwise discrete turn-constructional units (TCUs) together (Clayman & Raymond 2015). A prime example of an item that lends itself to be deployed as a modular pivot are address terms (Clayman 2012).
Morphophonological asymmetries in affixation concern systematic correlations between morphological properties of affixes (e.g. combination with bound versus free stems, position relative to stem (suffixes versus prefixes)) and their phonological properties (e.g. stress behaviour). The arguably most insightful approach to capturing relevant asymmetries invokes a notion of affix coherence, first introduced by Dixon in connection with his work on Yidiɲ, a nearly extinct language spoken in Northern Australia. This notion is based on a categorical division of affixes into ones that integrate into the phonological word of the stem and ones that do not. The integration of affixes is envisioned as being fully determined by phonological and morphological structure in a given language and verifiable by diagnostics relevant to phonological word domains (primarily the syllable and the foot structure). The assumption of two types of prosodic domains characterized by integrated versus non-integrated affixes is manifest in consistent asymmetries that pertain to morphophonological, phonological, and phonetic rules. This consistency constitutes compelling evidence for the structure-based analysis of the impact of various affixes on derived words, as opposed to alternative approaches to capturing these effects by associating affixes with diacritics (morpheme versus word boundary, class 1 versus class 2, stratum 1 versus stratum 2). The present entry aims to demonstrate, mostly on the basis of data from Germanic languages, the breadth of the empirical evidence in support of a fundamental role of affix coherence. Moreover, it aims to draw attention to the various implications of affix coherence for modeling relevant generalizations, in particular the necessary reference to a level of phonological representation characterized by a specific degree of abstractness (‘phonemic’).
This article investigates mundane photo taking practices with personal mobile devices in the co-presence of others, as well as “divergent” self-initiated smartphone use, thereby exploring the impact of everyday technologies on social interaction. Utilizing multimodal conversation analysis, we examined sequences in which young adults take pictures of food and drinks in restaurants and cafés. Although everyday interactions are abundant in opportunities for accomplishing food photography as a side activity, our data show that taking pictures is also often prioritized over other activities. Through a detailed sequential analysis of video recordings and dynamic screen captures of mobile devices, we illustrate how photographers orient to the momentary opportunities for and relevance of photo taking, that is, how they systematically organize their photographing with respect to the ongoing social encounter and the (projected) changes in the material environment. We investigate how the participants multimodally negotiate the “mainness” and “sideness” (Mondada, 2014) of situated food photography and describe some particular features of participants’ conduct in moments of mundane multiactivity.
National Socialism, one could argue, was all about belonging: belonging to the ‘Volk’ or the ‘Volksgemeinschaft’, belonging to the ‘Aryan’ or ‘Non-Aryan race’, belonging to the National Socialist ‘movement’, and so on. These categories of belonging worked both inclusionary and exclusionary and they were constituted, proclaimed and enacted to a great part through language. What is more, they had to be performed through communicative acts. For the normative side of National Socialist propaganda and legislation, this seems rather obvious and one-directional. On the side of the general population, however, this entailed a mixture of communicative need to position oneself vis-à-vis National Socialism (mostly in affirmative ways), but also the urge to do so willingly. When we look at the language use of ‘ordinary people’ in different communicative situations and texts during National Socialism, we have to focus on these dimensions of discursive collusion, co-constitution and appropriation. People during National Socialism, such is our hypothesis, navigated through discourses of belonging and by that made them real and effective. Besides diaries, war letters and autobiographical writings, one way to grasp this phenomenon is to analyse petitions, i.e., letters of complaint and request sent in large numbers by ‘ordinary people’ to public authorities of the party and the state. As I will show by some examples, letter-writers tried to inscribe themselves within (what they took for) National Socialist discourses of belonging in order to legitimate their claims. By doing so, they co-constituted and co-created the discursive realm of National Socialism.
This article details the process of creating the Nottinghamer Korpus deutscher YouTube-Sprache ('The Nottingham German YouTube Language Corpus' - or NottDeuYTSch corpus) and outlines potential research opportunities. The corpus was compiled to analyse the online language produced by young German-speakers and offers significant opportunity for in-depth research across several linguistic fields including lexis, morphology, syntax, orthography, and conversational and discursive analysis. The NottDeuYTSch corpus contains over 33 million words taken from approximately 3 million YouTube comments from videos published between 2008 to 2018 targeted at a young, German-speaking demographic and represent an authentic language snapshot of young German speakers. The corpus was proportionally sampled based on video category and year from a database of 112 popular German-speaking YouTube channels in the DACH region for optimal representativeness and balance and contains a considerable amount of associated metadata for each comment that enable further longitudinal cross-sectional analyses. The NottDeuYTSch corpus is available for analysis as part of the German Reference Corpus (DeReKo).