Gestures can be brief and compact in their execution, but also elaborate and extended. One way to utilise this kinetic flexibility is to extend one’s gesture in time by holding it in its stroke position. This study explores the interactional function of gestural holds by investigating pointing gestures that are sustained beyond a sequence-initiating turn and into the responsive space following it. The study draws on video data from naturally occurring conversations in German and focuses on held pointing gestures after instructions and questions. It is shown that in both action environments, participants delay gestural closure to indicate that they still consider the addressee’s response to be insufficient.
Drawing upon the transformative power of questions, the paper investigates questioning sequences from authentic coaching data to examine the systematic use of a particular succession of formulation and question and its impact on inviting self-reflection processes in the client and eliciting change. The objects of investigation in this paper are therefore questioning sequences in which a coach asks a question immediately after a rephrasing or relocating action, prompting the client to respond in an explicit or implicit way. The coach hereby shifts the focus to a hypothetical scenario, prompting the client to change her perspective on the matter and reflect on her own statements, ideas and attitudes from an outside perspective. The paper aims to help close the research gap concerning the change potential of reflection-stimulating action techniques used by coaches, by investigating one of the many ways in which questions can be powerful tools to invite a change of perspective for the client. The study focuses on one coaching process consisting of three sessions between a female coach and a female client, utilizing a single case study approach. The data collection was part of the interdisciplinary project “Questioning Sequences in Coaching”, comprising 14 authentic coaching processes. The analysis follows Peräkylä’s Transformative Sequences model, examining the first position including the formulation and the subsequent question, the client’s response, and the coach’s reaction to the response. On a practical level, the main purpose of this paper is not to add to the many recommendations in the practical literature on how coaches should do their work and ask questions, but rather to show in what ways the elicitation of self-reflection processes in clients has been achieved by other coaches in authentic coaching sessions.
Poetic diction routinely involves two complementary classes of features: (i) parallelisms, i.e. repetitive patterns (rhyme, metre, alliteration, etc.) that enhance the predictability of upcoming words, and (ii) poetic deviations that challenge standard expectations/predictions regarding regular word form and order. The present study investigated how these two prediction-modulating fundamentals of poetic diction affect the cognitive processing and aesthetic evaluation of poems, humoristic couplets and proverbs. We developed quantitative measures of these two groups of text features. Across the three text genres, higher deviation scores reduced both comprehensibility and aesthetic liking whereas higher parallelism scores enhanced these. The positive effects of parallelism are significantly stronger than the concurrent negative effects of the features of deviation. These results are in accord with the hypothesis that art reception involves an interplay of prediction errors and prediction error minimization, with the latter paving the way for processing fluency and aesthetic liking.
Rejecting the validity of inferred attributions of incompetence in German talk-in-interaction
(2024)
This paper deals with pragmatic inference from the perspective of Conversation Analysis. In particular, we examine a specific variety of inferences: the attribution of incompetence which Self constructs on the basis of Other's prior action, hearable as positioning Self as incompetent (e.g., instructions, offers of assistance, advice); this attribution of incompetence concerns Self's execution of some practical task. This inference is indexed in Self's response, which highlights Self's expertise or competence concerning the task at hand. We focus on two recurrent types of such responses in our data: (i) accounting for competence through formulations of prior experience with carrying out a practical action and (ii) explicit claims of competence for accomplishing this action. We analyze the interactional environments in which these responses occur, the ways in which the two practices index Self's understanding of being positioned as incompetent and the interactional work they do. Finally, we discuss how, through rejecting an inferred attribution of incompetence, Self implicitly seeks to restore their face and defend their autonomy as an agent, yet without entering an explicit identity-negotiation. Findings rest on the analysis of 20 cases found in video-recordings of naturally occurring talk-in-interaction in German from the corpus FOLK.
This contribution explores the relationship between the English CEFR (Common European Framework of Reference for Languages) vocabulary levels and user interest in English Wiktionary entries. User interest was operationalized through the number of views of these entries in Wikimedia server logs covering a period of four years (2019–2022). Our findings reveal a significant relationship between CEFR levels and user interest: entries classified at lower CEFR levels tend to attract more views, which suggests a greater user interest in more basic vocabulary. A multiple regression model controlling for other known or potential factors affecting interest (corpus frequency, polysemy, word prevalence, and age of acquisition) confirmed that lower CEFR levels attract significantly more views even after taking into account the other predictors. These findings highlight the importance of CEFR levels in predicting which words users are likely to look up, with implications for lexicography and the development of language learning materials.
This presentation deals with collaborative turn-sequences (Lerner 2004), i.e., syntactically coherent units of talk that are jointly formulated by at least two speakers, in Czech and German everyday conversations. Based on conversation analysis (e.g., Schegloff 2007) and a multimodal approach to social interaction (e.g., Deppermann/Streeck 2018), we aim at comparing recurrent patterns and action types within co-constructional sequences in both languages. The practice of co-constructing turns-at-talk has been described for typologically different languages, especially for English (e.g., Lerner 1996, 2004), but also for languages such as Japanese (Hayashi 2003) or Finnish (Helasvuo 2004). For German, various forms and functions of co-constructions have already been investigated (e.g., Brenning 2015); for Czech, a detailed, interactionally based description is still pending (but see some initial observations in, e.g., Hoffmannová/Homoláč/Mrázková (eds.) 2019). Although the existence of co-constructions in different languages points to a cross-linguistic conversational practice, few explicitly comparative studies exist (see, e.g., Lerner/Takagi 1999, for English and Japanese). The language pair Czech-German has mainly been studied with respect to language contact and without specifically considering spoken language or complex conversational sequences (e.g., Nekula/Šichová/Valdrová 2013). Therefore, our second aim is to sketch out a first comparison of co-constructional sequences in German and Czech, thereby contributing to the growing field of comparative and cross-linguistic studies within conversation analysis (e.g., Betz et al. (eds.) 2021; Dingemanse/Enfield 2015; Sidnell (ed.) 2009). More specifically, we will present three main sequential patterns of co-constructional sequences, focusing on the type of action a second speaker carries out by completing a first speaker’s possibly incomplete turn-at-talk, and on how the initial speaker then responds to this suggested completion (Lerner 2004). Excerpts from video recordings of Czech and German ordinary conversations will illustrate these recurrent co-constructional sequence types, i.e., offering help during word searches (see example 1 above), displaying understanding, or claiming independent knowledge. The third objective of this paper is to underline the participants’ orientation to similar interactional problems, solved by specific syntactic and/or lexical formats in Czech and German. Considering the more recent focus on the embodied dimension of co-constructional practices (e.g., Dressel 2020), we will also investigate the multimodal formatting of a started utterance as more or less “permeable” (Lerner 1996) for co-participant completion, the participants’ mutual embodied orientation, and possible embodied responses to others’ turn-completions (such as head nods or eyebrow flashes, cf. De Stefani 2021). More generally, this contribution reflects on the possibilities and challenges of a cross-linguistic comparison of complex multimodal sequences.
In a previous study, Aceves and Evans present a large-scale quantitative information-theoretic analysis of parallel corpus data in ~1,000 languages to show that there are apparently strong associations between the way languages encode information into words and patterns of communication, e.g. the configuration of semantic information. During the peer review process, one reviewer raised the question of the extent to which the presented results depend on different corpus sizes (see the Peer Review File). This is a very important question given that most, if not all, of the quantities associated with word frequency distributions vary systematically with corpus size. While Aceves and Evans claim that corpus size does not affect the results presented, I challenge this view by presenting reanalyses of the data that clearly suggest that it does.
Besides English, Afrikaans is considered “the [Germanic] language which deviates grammatically the farthest from the others” (Harbert 2007: 17). But how exactly do we measure “grammatical deviation”, and how deviant is Afrikaans really if we compare it not just to other standard languages but also to non-standard varieties? The present contribution aims to address those questions combining functional-typological and dialectometric perspectives. We first select data for 28 Germanic varieties showing vastly different speaker numbers, grades of standardisation and amounts of language contact. Based on 48 (micro)typological variables from syntax, morphology and phonology, we perform cluster analysis and multidimensional scaling and present ways of visualizing and interpreting the results. Inter alia, the analyses show a major divide between Continental West Germanic and North Germanic (as might be expected) and they also identify a number of outliers, including English and pidgin and creole languages such as Russenorsk or Rabaul Creole German. Afrikaans appears to cluster with the other West Germanic languages rather than the outliers. Within West Germanic, however, it does indeed emerge as rather deviant and, according to our metric, it is, for example, typologically closer to other high-contact varieties such as Yiddish than it is to Dutch.
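The dialectometric step described in this abstract (comparing varieties over a set of typological variables before clustering or scaling them) can be illustrated with a minimal sketch. It is not the authors' procedure: the distance measure (normalised Hamming distance over binary features), the function names, and the toy feature vectors for `variety_A`, `variety_B` and `variety_C` are all assumptions made purely for illustration and do not describe any real Germanic variety.

```python
from itertools import combinations


def hamming_distance(a, b):
    """Share of features on which two equally long feature vectors differ."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def distance_matrix(features):
    """Pairwise distances, keyed by the unordered pair of variety names."""
    return {
        frozenset((p, q)): hamming_distance(features[p], features[q])
        for p, q in combinations(features, 2)
    }


def nearest_neighbour(name, features):
    """Name of the variety with the smallest distance to `name`."""
    others = (v for v in features if v != name)
    return min(others, key=lambda v: hamming_distance(features[name], features[v]))


# Hypothetical binary feature values (1 = feature present, 0 = absent).
toy = {
    "variety_A": [1, 1, 0, 0, 1, 0],
    "variety_B": [1, 1, 0, 1, 1, 0],
    "variety_C": [0, 0, 1, 1, 0, 1],
}

if __name__ == "__main__":
    for pair, dist in distance_matrix(toy).items():
        print(sorted(pair), round(dist, 3))
    print(nearest_neighbour("variety_A", toy))
```

A distance matrix of this kind is the usual input to hierarchical clustering or multidimensional scaling; which distance measure and which clustering method the study actually used is not stated in the abstract.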
Less than one percent of words would be affected by gender-inclusive language in German press texts
(2024)
Research on gender and language is tightly knitted to social debates on gender equality and non-discriminatory language use. Psycholinguistic scholars have made significant contributions in this field. However, corpus-based studies that investigate these matters within the context of language use are still rare. In our study, we address the question of how much textual material would actually have to be changed if non-gender-inclusive texts were rewritten to be gender-inclusive. This quantitative measure is an important empirical insight, as a recurring argument against the use of gender-inclusive German is that it supposedly makes written texts too long and complicated. It is also argued that gender-inclusive language has negative effects on language learners. However, such effects are only likely if gender-inclusive texts are very different from those that are not gender-inclusive. In our corpus-linguistic study, we manually annotated German press texts to identify the parts that would have to be changed. Our results show that, on average, less than 1% of all tokens would be affected by gender-inclusive language. This small proportion calls into question whether gender-inclusive German presents a substantial barrier to understanding and learning the language, particularly when we take into account the potential complexities of interpreting masculine generics.
This paper focuses on language change based on shifting social norms, in particular with regard to the debate on language and gender. It is a recurring argument in this debate that language develops "naturally" and that "severe interventions" - such as gender-inclusive language is often claimed to be - in the allegedly "organic" language system are inappropriate and even "dangerous". Such interventions are, however, not unprecedented. Socially motivated processes of language change are neither unusual nor new. We focus in our contribution on one important political-social space in Germany, the German Bundestag. Taking other struggles about language and gender in the plenaries of the Bundestag as a starting point, our article illustrates that language and gender has been a recurring issue in the German Bundestag since the 1980s. We demonstrate how this is reflected in linguistic practices of the Bundestag, by the use of a) designations for gays and lesbians; b) pair forms such as Bürgerinnen und Bürger (female and male citizens); and c) female forms of addresses and personal nouns ('Präsidentin' in addition to 'Präsident'). Lastly, we will discuss implications of these earlier language battles for the currently very heated debate about gender-inclusive language, especially regarding new forms with gender symbols like the asterisk or the colon (Lehrer*innen, Lehrer:innen; male*female teachers) which are intended to encompass all gender identities.
‘Can’ and ‘must’-type modal verbs in the direct sanctioning of misconduct across European languages
(2023)
Deontic meanings of obligation and permissibility have mostly been studied in relation to modal verbs, even though researchers are aware that such meanings can be conveyed in other ways (consider, for example, the contributions to Nuyts/van der Auwera (eds.) 2016). This presentation reports on an ongoing project that examines deontic meaning but takes as its starting point not a type of linguistic structure but a particular kind of social moment that presumably attracts deontic talk: the management of potentially ‘unacceptable’ or untoward actions (taking the last bread roll at breakfast, making a disallowed move during a board game, etc.). Data come from a multi-language parallel video corpus of everyday social interaction in English, German, Italian, and Polish. Here, we focus on moments in which one person sanctions another’s behavior as unacceptable. Using interactional-linguistic methods (Couper-Kuhlen/Selting 2018), we examine similarities and differences across these four languages in the use of modal verbs as part of such sanctioning attempts. First results suggest that modal verbs are not as common in the sanctioning of misconduct as one might expect. Across the four languages, only 10–20% of relevant sequences involve a modal verb. Most of the time, in this context, speakers achieve deontic meaning in other ways (e.g., infinitives such as German nicht so schmatzen, ‘no smacking’). This raises the question of what exactly modal verbs, on those relatively rare occasions when they are used, contribute to the accomplishment of deontic meaning. The reported study pursues this question in two ways: 1) by considering similarities across languages in the ways that modal verbs interact with other (verbal) means in the sanctioning of misconduct; 2) by considering differences across languages in the use of modal verbs. Here, we find that the relevant modal verbs are used similarly in some activity contexts (enforcing rules during board games), but less so in other activity contexts (mundane situations with no codified rules). In sum, the presented study adds to cross-linguistically grounded knowledge about deontic meaning and its relationships to linguistic structures.
It is well known that the distribution of lexical and grammatical patterns is size- and register-sensitive (Biber 1986, and later publications). This fact alone presents a challenge to many corpus-oriented linguistic studies focusing on a single language. When it comes to cross-linguistic studies using corpora, the challenge becomes even greater due to the lack of high-quality multilingual corpora (Kupietz et al. 2020; Kupietz/Trawiński 2022), which are comparable with respect to the size and the register. That was the motivation for the creation of the European Reference Corpus EuReCo, an initiative started in 2013 at the Leibniz Institute for the German Language (IDS) together with several European partners (Kupietz et al. 2020). EuReCo is an emerging federated corpus, with large virtual comparable corpora across various languages and with an infrastructure supporting contrastive research. The core of the infrastructure is KorAP (Diewald et al. 2016), a scalable open-source platform supporting the analysis and visualisation of properties of texts annotated by multiple and potentially conflicting information layers, and supporting several corpus query languages. Until recently, EuReCo consisted of three monolingual subparts: the German Reference Corpus DeReKo (Kupietz et al. 2018), the Reference Corpus of Contemporary Romanian Language (Barbu Mititelu/Tufiş/Irimia 2018), and the Hungarian National Corpus (Váradi 2002). The goal of the present submission is twofold. On the one hand, it reports about the new component of EuReCo: a sample of the National Corpus of Polish (Przepiórkowski et al. 2010). On the other hand, it presents the results of a new pilot study using the newly extended EuReCo. This pilot study investigates selected Polish collocations involving light verbs and their prepositional / nominal complements (Fig. 1) and extends the collocation analyses of German, Romanian and Hungarian (Fig. 2) discussed in Kupietz/Trawiński (2022).
Our everyday lives in any social community are shaped by rules (e.g., Roughley 2019; Schmidt/Rakoczy 2019). Rules (in a broad sense) are interactionally negotiated, monitored, enforced, and serve as an ‘orientation value’ in social life. If someone’s behavior is treated as norm-violating or problematic in a certain way, it may therefore be confronted. Confronting interlocutors can immediately stop, modify, or retrospectively reprimand the misconduct of others in a moralizing manner. Such confrontations of problem behavior occur commonly in informal interactions. On the basis of our corpus, specifically in informal interactions at the table, I observed that, for example, in Polish, German and British English, direct confrontations occur on average at least once every three minutes. Participants design these actions in a variety of ways, but like everything in interaction, the design is not arbitrary (Sacks 1984; Enfield/Sidnell 2019). A recurrent feature of such turns is connecting misconduct to some more general concepts. It is evident from the data that, e.g., speakers of German and Polish use ‘generally valid statements’ in problematic moments (cf. Küttner/Vatanen/Zinken 2022) to reach the closure of the problem sequence, while also dealing with the distribution of deontic and epistemic rights (Rogowska in prep.). I ask when and for what purpose generality, that is, abstracting from a concrete behaviour, is used as a tool while confronting others. The focus is on sequential and linguistic features of abstracting in confronting moments in language comparison. What are the methods to achieve abstraction: i) defocusing the confronted, specific agent (cf. Zinken et al. 2021; Siewierska 2008), e.g. nur derjenige der dran ist der darf die bedingungen für den handel stellen (only the one whose turn it is may set the conditions for the trade); ii) using extreme case formulations (Pomerantz 1986), e.g. na siostrę zawsze można liczyć (you can always count on a sister); iii) referring to stable character traits, e.g. Matylda bardzo chetne by podala. (.) Ona jest taka skora do pomocy (Matylda would be very happy to pass (it to you). (.) She is so eager to help); or iv) broader categorizing of the given referent, e.g. do not build (.) do do not build do not build swastikas (when a) German guy is filming us? Sometimes, even several loci of abstraction are combined in the same turn. Can we identify language-specific and cross-linguistic patterns? What are the interactional consequences: enforcing compliant behavior in the future, eliciting an apology or cognitively simplifying complex problems? From a comparative perspective, I ask whether going beyond the here-and-now while confronting others is a practice that unites speakers across languages and is thus a human cognitive strategy to display normativity. This ongoing study is based on new comparable data from four European languages from informal interaction during activities around the table (Kornfeld/Küttner/Zinken 2023; Küttner et al. in prep.). The phenomenon was coded systematically in each of the four languages as part of a larger, quantitatively oriented study with different questions (Küttner et al. submitted). In the talk, I will present exemplary Polish and German evidence. I use the methods of Conversation Analysis (Sidnell/Stivers (eds.) 2012) and Interactional Linguistics (Imo/Lanwer 2019).
It is a ubiquitous phenomenon of everyday interaction that participants confront their co-participants for behaviour that they assess as undesirable or in some other way untoward. In a set of video data of informal interaction from the PECII corpus (Parallel European Corpus of Informal Interaction), cases of such sanctions have been collected in English, German, Italian and Polish data. This study presents work in progress and focuses on interrogatively formatted sanctions, in particular on non-polar interrogatives. It has already been shown that interrogatives can do much more than ask questions (Huddleston 1994). They can also function as directives (Lindström et al. 2017) or, more specifically, as requests (Curl/Drew 2008), as invitations (Margutti/Galatolo 2018) or reproaches (Klattenberg 2021), among others. What makes them interesting for cross-linguistic comparison is that the four languages that are considered provide different morphological and (morpho-)syntactical resources for the realization of interrogative phrases. For example, German provides the option of building in the modal particle denn that reveals a previous lack of clarity and obliges the co-participant(s) to deliver the missing information (Deppermann 2009). Of course, the other three languages have modal particles, too (e.g. allora in Italian or though in English), but they do not seem to convey the same semantic and interactional qualities as denn. From an interactional point of view, one could think that interrogatives are a typical and effective way of soliciting accounts, since formally they open up a conditionally relevant space for an answer or a reaction. But as the data shows, this does not guarantee that they are actually responded to. Another relevant aspect in the context of sanctions is that the interrogative format seems to carry a certain ‘openness’ that might be seen as a mitigating effect and thus provides an interesting point of comparison with other mitigating devices. This study uses the methods of conversation analysis and interactional linguistics. It is based on a collection of 148 interrogative sanctions (out of which 84 are non-polar interrogatives) covering the four languages. I draw on coded data from roughly 1000 cases to get a first overall idea of how the interrogative format might differ from other formats, and how it might interrelate with specific features – for example, whether an account is subsequently delivered. Going more into depth, the interrogative sanctions will then be analyzed with respect to their formal design (e.g. polar questions vs. content questions vs. tag questions, Rossano 2010; Hayano 2013) and to their pragmatic implications. I also analyze reactions to such sanctions – both formally (cf. Enfield et al. 2019, 279) and, again, from an interactional perspective (e.g. acceptance/compliance vs. challenging/defiance; Kent 2012; Cekaite 2020). A more detailed zooming in on the sequential unfolding of some particularly interesting instances of sanctioning interrogatives will make the picture complete.
Contrastive analysis of climate-related neologisms registered in German and French Wikipedia
(2023)
Neologisms represent new social norms, tendencies, controversies and attitudes. They denote new or changed concepts which are constantly being negotiated between different members of the discourse community (Wodak 2022 and Catalano/Waugh (eds.) 2020). Neologisms help to identify new communicative patterns and narratives which illustrate different strings of discourse in everyday life. In recent years, many neologisms relating to the subject of the environment and climate have been emerging around the world mainly due to dominant discussions on climate change and the movement “Fridays for Future”. In German, for example, neologisms such as Klimakleber, klimaresilient and globaler Streik and in French neologisms such as éco-anxiété, justice climatique and écocitoyen could be observed. These neologisms occur in many domains of life, for example in politics, media and also in advertising, which means that “l’importance croissante des enjeux environnementaux dans les discours politiques, médiatiques et publicitaires” (Balnat/Gérard 2022, p. 22) can be identified. However, it is not only the occurrence of environment- or climate-related topics that is increasing, but also the rising polarisation of the public debate. The polarisation within public discourse is based on the fact that there are opposing positions which are represented by new or recently relevant terms such as activistes du climat (or Klimaaktivisten) and climatosceptiques (or Klimaskeptiker) (Balnat/Gérard 2022, p. 22). Due to different identifications with one or the other side, one can also speak of an “affrontement idéologique” (Balnat/Gérard 2022, p. 23). The explosive nature and the high complexity of the debate on climate and environmental issues mean that many words are naturally unfamiliar to people. This is especially true with regard to neologisms. In addition, it is often not only the new word itself but also the signified concept that is initially unknown. 
When people then look up words, they often do so on the Internet. Wikipedia as a “free encyclopedia” (Wikipedia 2023) is particularly well suited as an object of study with regard to neologisms, since factual knowledge is given special attention there. Furthermore, this reference guide is perceived as a regular source of agreed and common knowledge on all sorts of subjects. Hence, the descriptions found here represent social agreement on controversial terms and discussions to some degree. In this paper, German and French neologisms from the subject area of climate and environment will be examined primarily in Wikipedia, but also in the neighbouring resource Wiktionary, which is “a collaborative project to produce a free-content multilingual dictionary” (Wiktionary 2023). Since Wikipedia and Wiktionary are available in French and in German, both are equally suitable for the contrastive analysis. Thus, Wikipedia articles which are accessible in both languages (e.g. Klimanotstand and État d’urgence climatique) or Wikipedia articles about similar events and phenomena (e.g. Letzte Generation and Dernière Rénovation) will be compared. For example, we will have a closer look at other new terms specifying different thematic aspects of the discourse of climate and environment. We will mainly refer to those lexical items which can be found in the respective articles in both languages. Special emphasis will be on overlaps and differences, thematic foci, speaker’s positions and evaluative terms.
A central goal of linguistics is to understand the diverse ways in which human language can be organized (Gibson et al. 2019; Lupyan/Dale 2016). In our contribution, we present results of a large-scale cross-linguistic analysis of the statistical structure of written language (Koplenig/Wolfer/Meyer 2023) that approaches this question from an information-theoretic perspective. To this end, we have trained a language model on more than 6,500 different documents as represented in 41 parallel/multilingual text collections, so-called corpora, consisting of ~3.5 billion words or ~9.0 billion characters and covering 2,069 different languages that are spoken as a native language by more than 90% of the world population and that represent ~46% of all languages that have a standardized written representation. Figure 1 shows that our database covers a large variety of different text types, e.g. religious texts, legalese texts, subtitles for various movies and talks, newspaper texts, web crawls, Wikipedia articles, or translated example sentences from a free collaborative online database. Furthermore, we use word frequency information from the Crúbadán project that aims at creating text corpora for a large number of (especially under-resourced) languages (Scannell 2007). We statistically infer the entropy rate of each language model as an information-theoretic index of (un)predictability/complexity (Schürmann/Grassberger 1996; Takahira/Tanaka-Ishii/Dębowski 2016). 
Equipped with this database and information-theoretic estimation framework, we first evaluate the so-called ‘equi-complexity hypothesis’, the idea that all languages are equally complex (Sampson 2009). We compare complexity rankings across corpora and show that a language that tends to be more complex than another language in one corpus also tends to be more complex in another corpus. This constitutes evidence against the equi-complexity hypothesis from an information-theoretic perspective. We then present, discuss and evaluate evidence for a complexity-efficiency trade-off that unexpectedly emerged when we analysed our database: high-entropy languages tend to need fewer symbols to encode messages and vice versa. Given that, from an information theoretic point of view, the message length quantifies efficiency – the shorter the encoded message the higher the efficiency (Gibson et al. 2019) – this indicates that human languages trade off efficiency against complexity. More explicitly, a higher average amount of choice/uncertainty per produced/received symbol is compensated by a shorter average message length. Finally, we present results that could point toward the idea that the absolute amount of information in parallel texts is invariant across different languages.
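The relation between per-symbol uncertainty and message length described in this abstract can be made concrete with a minimal sketch. It is only an illustration under strong assumptions: the study infers entropy rates from trained language models, whereas the code below uses a plain unigram character entropy as a crude stand-in, and the function names `unigram_entropy` and `total_information` are hypothetical.

```python
import math
from collections import Counter


def unigram_entropy(text):
    """Shannon entropy in bits per symbol, estimated from symbol frequencies.

    Crude stand-in for an entropy rate: context between symbols is ignored.
    """
    counts = Counter(text)
    total = len(text)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def total_information(text):
    """Bits per symbol multiplied by message length in symbols."""
    return unigram_entropy(text) * len(text)


if __name__ == "__main__":
    # Two hypothetical encodings of the "same message": a short one over a
    # larger alphabet and a longer one over a smaller alphabet.
    short_rich = "abcd"
    long_poor = "abababab"
    for message in (short_rich, long_poor):
        print(message, unigram_entropy(message), len(message),
              total_information(message))
```

In this toy case the shorter string has the higher entropy per symbol and the longer string the lower one, while the product of the two is the same for both, which mirrors the kind of compensation between complexity and message length that the abstract reports. The toy strings are constructed to show the arithmetic and are not evidence for the empirical claim.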
In German (G), English (E), Italian (I), and Hungarian (H), there are constructions with accusative NPs being the external argument of an infinitival, (1) to (4). In Polish (P), these accusative NPs can only co-occur with an adjectival participle, (5), a construction also occurring in E, (6). The talk compares the syntactic and semantic structure of these constructions focussing on the syntactic category of the nonfinite clause, the status of the accusative NP, the status of the infinitive, restructuring effects, and embedding predicates (including aspect).
i. As to G, E, I, and H, the infinitival clause is regarded as a TP, i.e., a small clause. Its accusative NP and infinitival predicate form a unit – [4], [12], [8]. The AcI denotes, according to [4], an eventuality, which prevents it from being negated. Its subject is case marked by the matrix predicate, either by ECM or subject-to-object raising – [9] and [10]. AcI-constructions can show clause union effects, (7). H additionally allows Dative subjects in infinitive clauses, the latter only being licensed by impersonal predicates and co-occurring with an agreeing infinitive, (8a), – [3]. In case there is no agreeing infinitive, the Dative NP is the experiencer of the matrix clause, (8b). As for Italian, it allows Nominative subject NPs in the infinitive clause, (9a, b).
ii. As to P, small clause constructions differ structurally from E, G, I and H ones – [6], [7]. P small clauses are realizable by copula constructions with verbal być ‘be’, pronominal to ‘it’, (10), or “dual” copula elements (co-occurrence of a pronominal and a verbal element, [1]), varying with respect to selectional restrictions (part of speech or case within complement phrases, extraction possibilities, [1]). The P counterpart to the AcI-constructions is the secondary predication over an accusative object via an adjectival present participle, (5), (11) and (12). The adjectival participle construction is systematically paraphrasable via clauses introduced by jak ‘how’ (11’) and (12’). In Polish, adjectival phrases like recytującego wiersz ‘reciting’, (11), and wracającego z podróży ‘returning’, (12), clearly function as adjuncts of the accusative object go ‘him’. In our talk, we will compare this P view to languages with typical AcI-constructions, where the AcI-clause is standardly analyzed as a complement of a matrix verb.
Interactants who encounter co-participant conduct which they find to be socio-normatively problematic or troublesome are faced with a range of choices. First and foremost, this includes the issue of whether to directly address it, or to simply ‘let it pass’ (at least for now) (Emerson/Messinger 1977). In the case of the former, the issue then becomes how to address it. Across the various ways in which participants can pragmatically engage with what they perceive to be transgressive or untoward behavior (e.g., Pomerantz 1978; Schegloff 1988b; Dersley/Wootton 2000; Günthner 2000; Bolden/Robinson 2011; Potter/Hepburn 2020; see also Rodriguez 2022), they sometimes meta-pragmatically formulate the co-participant’s doings in terms of specific actions. Such action descriptions are necessarily selective (Sacks 1963; Schegloff 1972, 1988a; Sidnell/Barnes 2013): They foreground certain aspects of the co-participant’s conduct, while backgrounding others, and thus contribute to publicly construing the formulated conduct in particular ways (Jayyusi 1993), viz. as socio-normatively problematic, transgressive or untoward, and interactionally accountable (Robinson 2016; Sidnell 2017).
This conversation analytic study compares the use of negation particles in spoken German and Persian, namely nein/nee and na. While these particles have a range of functions in both languages (Ghaderi 2022; Imo 2017), their use in response to news remains understudied. We focus on nein/nee and na in two sequential contexts: (i) after prior disconfirmations (Extract (a)) and (ii) in response to either solicited or unsolicited informings (see Extracts (b) and (c), respectively). In both contexts, nein/nee and na mark unexpectedness and open up an opportunity space for more, but they do so in different ways and with different outcomes. Nein/nee- and na-turns after disconfirming, often minimal responses to first-position confirmable turns mark the prior as unexpected (or even contrasting with the nein/nee/na-speaker’s expectations) and thus as expandable/accountable (cf. Ford 2001; Gubina/Betz 2021). Nein/nee/na-turns after informings (e.g., announcements that display a story teller’s negative emotional stance) differ not only in sequential position but also in prosodic realization. They can be either falling or rising, but all are characterized by marked prosody, i.e., lengthening, very low onset, smiling or breathy voice, or high overall pitch. Through position and turn design features, such nein/nee- and na-turns not only mark a prior turn as counter to (normative) expectations, but may also display the speaker’s affective stance and affiliate with the affective stance of the prior interactant. By comparing the use of nein/nee and na in German and Persian in the two functions illustrated in Extracts (a) and (b/c), we will show (i) how nein/nee- and na-turns shape interactional trajectories after responsive actions and (ii) what role the particles play in managing news and stance-taking as well as epistemic and affective positioning. 
Apart from revealing similarities in the use of German and Persian negation particles, the results of our crosslinguistic comparison will demonstrate that even if different languages have similar practices for specific actions, the use of these practices is language- and culture-specific. This means that even similar practices in different languages have their own “collateral effects” (Sidnell/Enfield 2012), linguistic and prosodic characteristic features, and, at least sometimes, consequences for social actions accomplished in the specific language (e.g., Dingemanse/Blythe/Dirksmeyer 2014; Evans/Levinson 2009; Floyd/Rossi/Enfield (eds.) 2020; Fox et al. 2009). Our study uses the method of Conversation Analysis (Sidnell/Stivers (eds.) 2013) and draws on more than 80 hours of audio and video recordings of spontaneous interactions (co-present, via video link, and on the telephone) in everyday and institutional contexts.
The issue: We discuss (declarative) prepositional object clauses (PO-clauses) in the West Germanic languages Dutch (NL), German (DE), and English (EN). In Dutch and German, PO-clauses occur with a prepositional proform (=PPF, Dutch: ervan, erover, etc.; German: drauf/darauf, drüber/darüber, etc.). This proform is optional with some verbs (1). In English, by contrast, P embeds a clausal complement in the case of gerunds or indirect questions (2), but P is obligatorily absent when the embedded CP is a that-clause in its base position (3a). When the that-clause is passivized or topicalized, however, the stranded P is obligatory (3b). Given this scenario, we will address the following questions: i) Are there structural differences between PO-clauses with a P/PPF and those in which the P/PPF is optionally or obligatorily omitted? ii) In particular, do PO-clauses without P/PPF structurally coincide with direct object (=DO) clauses? iii) To what extent are case and nominal properties of clauses relevant? We use wh-extraction as a relevant test for such differences.
Previous research: Based on pronominalization and topicalization data in German and Dutch, PO-clauses are different from DO-clauses independent of the presence of the PPF (see, e.g., Breindl 1989; Zifonun/Hoffmann/Strecker 1997; Berman 2003; Broekhuis/Corver 2015 and references therein) (4,5). English pronominalization and topicalization data (3b) appear to point in the same direction (Fischer 1997; Berman 2003; Delicado Cantero 2013). However, the obligatory absence of P before that-clauses in base position indicates a convergence with DO-clauses.
Experimental evidence: To provide further evidence bearing on these questions, we tested PO-clauses in all three languages for long wh-extraction, which is usually possible for DO-clauses in English and Dutch, and in German for southern regional varieties. For German and Dutch we conducted rating studies using the thermometer method (Featherston 2008). Each study contained two sets of sentences: the first set tested long wh-extraction with regular DO-clauses (6). The second set tested wh-extraction from PO-clauses with and without PPFs (7), respectively. The results show no significant difference in extraction with PO-clauses whether or not the PPF was present, even for those speakers who otherwise accept long-distance extraction in German. This supports a uniform analysis of PO-clauses with and without the PPF in contrast to DO-clauses. For English we tested extraction with verbs that select for PP-objects in two configurations: V+that-clause and V+P-gerund (8) in comparison to sentences without extraction. Participants rated sentences on a scale of 1 (unnatural) to 7 (natural). We included the gerund for English as this is a regular alternative for such objects. The results show that extraction is licit in both configurations. This suggests that English PO-clauses are different from German and Dutch PO-clauses: They rather behave as DO-clauses allowing for extraction. Note, though, that the availability of extraction from P+gerund also shows that PPs are not islands for extraction in English. Overall, this shows that there is a split between English vs. German/Dutch PO-clauses when the P/PPF is absent. While these clauses behave like PO-clauses in the latter languages, extraction does not show a difference between DO- and PO-clauses in English. We will discuss the results in relation to the questions i)–iii) above.
Any bilingual dictionary is contrastive by nature, as it documents linguistic information between language pairs. However, most bilingual dictionaries are designed and compiled as no more than mere lists of lexical or semantic equivalents. In internet forums, one can observe a huge interest in acquiring relevant knowledge about specific lexical items or pairs that are prone to comparison in a more comprehensive manner as they may pose lexical semantic challenges. In particular, these often concern easily confused pairs (e.g. false friends or paronyms) and new terms increasingly travelling between languages in news and social media (Šetka-Čilić/Ilić Plauc 2021). With regard to English and German, the fundamental comparative principles upon which contrastive guides should be built are either absent, or specialised contrastive dictionaries simply do not exist, e.g. comprehensive descriptive resources for false friends, paronyms, protologisms or neologisms (see Gouws/Prinsloo/de Schryver 2004). As a result, users turn to electronic resources such as Google Translate, blogs and language forums for help. For example, English words such as muscular have two German translation options.
These are the two confusables muskulär and muskulös, each of which exhibits a different semantic profile. German sensitiv/sensibel and their English formal counterparts sensitive/sensible are false friends. However, these terms are highly polysemous in both languages and have semantic features in common. Their full meaning spectrum is hardly captured in bilingual dictionaries to allow for a full comparison. Translating protologisms such as German Doppelwumms as well as more established new words is one of the most challenging problems. Currently, German neologisms such as Klimakleber are translated as climate glue (instead of climate activist glueing him-/herself onto objects) by online tools, simply causing mistakes and contextual distortion. Most challenges users face today are well-known (e.g. Rets 2016). New terms are often unregistered in dictionaries and it is often impossible to make appropriate choices between two or more (commonly misused) words between two languages (e.g. Benzehra 2007). These are all relevant problems to translators and language learners alike (e.g. González Ribao 2019).
This paper calls for the integration of insights from contrastive lexicology into modern bilingual lexicography. To turn dictionaries into valuable resources and in order to create productive strategies in a learning environment, the practice of writing dictionaries requires a critical re-assessment. Furthermore, the full potential of electronic contrastive resources needs to be recognised and put into practice. After all, monolingual German lexicography has started to reflect on how users’ needs can be accounted for in specific comparative linguistic situations. Some of these ideas can be comfortably extended to bilingual reference guides. On the one hand, this paper will deliver a critical account of some English-German/German-English dictionaries and touch on the shortcomings of contemporary bilingual lexicography. On the other hand, with the help of fictitious resources I will demonstrate contrastive structures as focal points of consultations which answer some of the more frequent language questions more reliably. Among others, I will explain how we need to build user-friendly dictionaries to allow for translating false friends or easily confusable words from the source language into its target language efficiently. With regard to neologisms, I will show how more elaborate discursive descriptions and definitions can help language learners acquire the necessary extra-linguistic knowledge. Overall, this could improve the role of specialised dictionaries in the teaching or translating process (cf. Miliç/Sadri/Glušac 2019).
The International Comparable Corpus (ICC) (Kirk/Čermáková 2017; Čermáková et al. 2021) is an open initiative which aims to improve the empirical basis for contrastive linguistics by compiling comparable corpora for many languages and making them as freely available as possible as well as providing tools with which they can easily be queried and analysed. In this contribution we present the first release of written language parts of the ICC which includes corpora for Chinese, Czech, English, German, Irish (partly), and Norwegian. Each of the released corpora contains 400k words distributed over 14 different text categories according to the ICC specifications. Our poster covers the design basics of the ICC, its TEI encoding, a demonstration of using the ICC via different query tools, and an outlook on future plans.
Similar to the European Reference Corpus EuReCo (Kupietz et al. 2020), ICC follows the approach of reusing existing linguistic resources wherever possible in order to cover as many languages as possible with realistic effort in as short a time as possible. In contrast to EuReCo, however, comparable corpus pairs are not defined dynamically in the usage phase, but the compositions of the corpora are fixed in the ICC design. The approaches are thus complementary in this respect. The design principles and composition of the ICC are based on those of the International Corpus of English (ICE) (Greenbaum (ed.) 1996), with the deviation that the ICC includes the additional text category blog post and excludes spoken legal texts (see Čermáková et al. 2021 for details). ICC’s fixed-design approach has the advantage that all single-language corpora in the ICC have the same composition with respect to the selected text types and that this guarantees that the selected broad spectrum of potential influencing variables for linguistic variation is always represented. The disadvantage, however, is that this can only be achieved for quite small corpora and that the generalisability of comparative findings based on the ICC corpora will often need to be checked on larger monolingual corpora or translation corpora (Čermáková/Ebeling/Oksefjell Ebeling forthcoming). Arguing that such issues with comparability and representativeness are inevitable, in one way or the other, and need to be dealt with, our poster will discuss and exemplify the text selections in more detail.
In this presentation I show first results from an ongoing study on the syntactic complexity of sanctioning turns in spoken language. This study is part of a larger project on sanctioning of misconduct in social interaction in different European languages (English, German, Italian and Polish). For the study I use video recordings of different everyday settings (family breakfasts, board game interactions and car rides) with three or four participants. These data come from the Parallel European Corpus of Informal Interaction (Kornfeld/Küttner/Zinken 2023; Küttner et al. submitted). I focus on sanctioning turns with more than one turn-constructional unit (on TCUs see, among others, Sacks/Schegloff/Jefferson 1974; Clayman 2013). The study asks how often TCUs are linked to each other in the different languages, for what function, and how language diversity enters into this. Note that complex sanctioning turns do not always come as complex sentences.
Ways out of the dictionary: hyperlinks to other sources in German and African online dictionaries
(2023)
This study examines a number of German and African online dictionaries to see how they make use of the possibility of linking to external sources (e.g. other dictionaries, encyclopaedias, or even corpus data). The article investigates which hyperlinks occur at which places in the word articles and how these are presented to the dictionary users. This is done against the background of metalexicographic considerations on the planning of outer features and the mediostructure in online dictionaries as well as different categorizations of hyperlinks in online reference works. The results show that retro-digitized dictionaries make virtually no use of hyperlinks to external sources. Genuine online dictionaries, on the other hand, do, but often in a form that needs improvement, since, for example, explanations of dictionary-external links are not always found in the user guide and their design is different even within a dictionary.
From June 26th to July 2nd 2023 the International Conference on Conversation Analysis (ICCA) took place in Brisbane/Meanjin, Australia – after a long pause due to the Covid pandemic and for the first time in the southern hemisphere. About 350 participants from about 50 different countries attended the conference. This year’s ICCA featured 36 panels and about 300 papers. Four plenary speakers were invited and 24 pre-conference workshops took place. On Wednesday evening Ilana Mushin, in her role as conference chair, officially opened ICCA. The President of the International Society of Conversation Analysis (ISCA), Tanya Stivers, also welcomed all participants. To get acquainted with the indigenous culture of Queensland, the opening ceremony was enriched with a highly impressive dance performance by First Nations people. After the official inauguration the international community met at the Welcome Reception to look forward together to the days ahead with many opportunities for exchange and networking.
As will become clear throughout this report, the research topics revolved not only around classic CA concepts, but also, importantly, around embodiment, which continued the line of past conferences (Dix 2019). Another aspect that was highlighted was conflict and social norms. Due to limited capacities, we can only present a selection of presentations within the scope of this conference report. The selection was influenced by the personal interests of the authors and should not be understood as a rating in any sense.
In many countries of the world, perspectives on gender equality and racism have changed in recent decades. One result has been more attention being devoted to traces of androcentric and racist language in society. This also affects dictionaries. In lexicography there are discussions about whether or to what extent social asymmetries are inscribed in dictionaries and if this is still acceptable. The issue of the nature of description plays an important role in this discussion. If sexist usages are often found in language use, i.e. in the corpus data on which the dictionary is based, does the dictionary also have to show them? How is this, in turn, compatible with the normative power of dictionaries? Do dictionaries contribute to the perpetuation of gender stereotypes by showcasing them under the banner of descriptive principles? And what roles do lexicographers play in this process? The article deals with these questions on the basis of individual lexicographical examples and current discussions in the lexicographic and public community.
Introducing Interactive Grammar: How to Develop Language Competence with Research-based Learning
(2023)
We present the implementation of an interactive e-learning platform for both classroom study and self-study that helps develop German language competence – vocabulary, spelling, and grammar – on various levels and for everyday life applications. The LernGrammis portal addresses school and high school students, (prospective) teachers, and L2 learners of German equally, each with appropriate educational content and interactive components. It thus offers the digital networking infrastructure for education a unique, freely available and scientifically based learning resource. Applying the innovative concept of “Research-based Learning (RBL)”, LernGrammis provides teachers with ideas for lesson planning, and learners with dedicated modules to develop new skills through exploring authentic language resources and by this means answering customised low-threshold research questions. Using proven practical examples, we demonstrate the approach, its strengths and possibilities, as well as initial user feedback evaluation results.
This study investigates other-initiated repair and its embodied dimension in casual English as lingua franca (ELF) conversations, thereby contributing to the further understanding of multimodal repair practices in social interaction. Using multimodal conversation analysis, we focus on two types of restricted other-initiation of repair (OIR): partial repeats preceded or followed by the question word what (i.e., what X?/X what?) and copular interrogative clauses (i.e., what is X). Partial repeats with what produced with rising final intonation are consistently accompanied by a head poke and treated as relating to troubles in hearing, with the repair usually consisting of a repeat. In contrast to these partial repeats, copular interrogative clauses are produced with downward final intonation and accompanied by face-related embodied conduct. The what is X OIRs primarily target code-switched lexical items, the understanding of which is critical for maintaining the repair initiator’s involvement in the ongoing sequence. This study also contributes some general reflections on the possible complexity of OIR and repair practices from a multimodal perspective.
Pseudo-coordinated sitzen and stehen in spoken German: a case of emergent progressive aspect?
(2023)
This paper investigates the aspectual potential of posture verb pseudocoordination in spoken German. In a corpus study of sitzen ‘sit’ and stehen ‘stand’, it is shown that despite a preference for activity verbs, verbs of all aspectual classes occur in the second conjunct. The posture verb imposes its durative meaning component on the second verb, thus making a progressive interpretation of the construction possible. Apart from this emergent aspectual function, German posture verb pseudocoordination has a subjective function (conveying the speaker’s beliefs about the subject referent’s stance), and a discourse pragmatic function (information packaging).
In social interaction, different kinds of word-meaning can become problematic for participants. This study analyzes two meta-semantic practices, definitions and specifications, which are used in response to clarification requests in German implemented by the format Was heißt X (‘What does X mean?’). In the data studied, definitions are used to convey generalizable lexical meanings of mostly technical terms. These terms are either unknown to requesters, or, in pedagogical contexts, requesters ask in order to check the addressee’s knowledge. Specifications, in contrast, clarify aspects of local speaker meanings of ordinary expressions (e.g., reference, participants in an event, standards applied to scalar expressions). Both definitions and specifications are recipient-designed with respect to the (presumed) knowledge of the addressee and tailored to the topical and practical relevancies of the current interaction. Both practices attest to the flexibility and situatedness of speakers’ semantic understandings and to the systematicity of using meta-semantic practices differentially for different kinds of semantic problems. Data come from mundane and institutional interaction in German from the public corpus FOLK.
The NottDeuYTSch corpus is a freely available collection of YouTube comments written under German-language videos by young people between 2008 and 2018. The article uses the NottDeuYTSch corpus to investigate how YouTube comments can be used to produce learning materials and how corpora of Digitally-Mediated Communication can benefit intermediate learners of German. The article details the effects of authentic communication within YouTube comments on teenage learners, examining how they can influence the psycholinguistic factors of motivation, foreign language anxiety, and willingness to communicate. The article also discusses the benefits and limitations of using authentic corpus material for the development of teaching material.
This paper introduces the Nottinghamer Korpus deutscher YouTube-Sprache (‘The Nottingham German YouTube Language Corpus’ - or NottDeuYTSch corpus). The corpus comprises over 33 million words, taken from roughly 3 million YouTube comments published between 2008 and 2018, written by a young, German-speaking demographic. The NottDeuYTSch corpus provides an authentic and representative linguistic snapshot of young German speakers and offers significant opportunities for in-depth research in several linguistic fields, such as lexis, morphology, syntax, orthography, multilingualism, and conversational and discursive analysis.
We present a simple tool for extracting text and markup information from printouts of (not only) scientific documents. While the heavy-lifting OCR is done by off-the-shelf tesseract, our focus is on detection, extraction, and basic categorization of color-highlighted text sections, as well as on providing a framework for downstream processing of extraction results. The tool can be useful for document analysis tasks that must, or benefit from being able to, use printed paper.
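The categorization step mentioned above can be illustrated with a minimal sketch. It is not the tool's actual method: it assumes that highlighted regions are already available as lists of RGB pixels, it leaves OCR (done by tesseract in the tool) entirely aside, and the names `classify_pixel`, `dominant_highlight` and the hue bands in `HUE_BANDS` are hypothetical choices that a real pipeline would need to calibrate against scanned material.

```python
import colorsys
from collections import Counter
from typing import Iterable, Optional, Tuple

RGB = Tuple[int, int, int]

# Hypothetical hue bands in degrees; real scans would need calibration.
HUE_BANDS = {
    "yellow": (40, 75),
    "green": (75, 170),
    "blue": (170, 260),
    "pink": (280, 345),
}


def classify_pixel(rgb: RGB, min_sat: float = 0.25, min_val: float = 0.5) -> Optional[str]:
    """Return a highlight colour label, or None for paper and ink pixels."""
    r, g, b = (channel / 255 for channel in rgb)
    h, s, v = colorsys.rgb_to_hsv(r, g, b)
    # White paper has low saturation, black ink has low value: neither is a highlight.
    if s < min_sat or v < min_val:
        return None
    hue = h * 360
    for label, (low, high) in HUE_BANDS.items():
        if low <= hue < high:
            return label
    return None


def dominant_highlight(pixels: Iterable[RGB]) -> Optional[str]:
    """Most frequent highlight label in a region, or None if nothing is highlighted."""
    counts = Counter(label for label in map(classify_pixel, pixels) if label)
    return counts.most_common(1)[0][0] if counts else None
```

A region label obtained this way could then be attached to the OCR text of the same region for downstream processing.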
This study aims to establish what lexical factors make it more likely for dictionary users to consult specific articles in a dictionary using the English Wiktionary log files, which include records of user visits over the course of 6 years. Recent findings suggest that lexical frequency is a significant factor predicting look-up behavior, with the more frequent words being more likely to be consulted. Three further lexical factors are brought into focus: (1) age of acquisition; (2) lexical prevalence; and (3) degree of polysemy operationalized as the number of dictionary senses. Age of acquisition and lexical prevalence data were obtained from recent published studies and linked to the list of visited Wiktionary lemmas, whereas polysemy status was derived from Wiktionary entries themselves. Regression modeling confirms the significance of corpus frequency in explaining user interest in looking up words in the dictionary. However, the remaining three factors also make a contribution whose nature is discussed and interpreted. Knowing what makes dictionary users look up words is both theoretically interesting and practically useful to lexicographers, telling them which lexical items should be prioritized in lexicographic work.
In many European languages, propositional arguments (PAs) can be realized as different types of structures. Cross-linguistically, complex structures with PAs show a systematic correlation between the strength of the semantic bond and the syntactic union (cf. Givón 2001; Wurmbrand/Lohninger 2023). Also, different languages show similarities with respect to the (lexical) licensing of different PAs (cf. Noonan 1985; Givón 2001; Cristofaro 2003 on different predicate types). However, on a more fine-grained level, a variation across languages can be observed both with respect to the syntactic-semantic properties of PAs as well as to their licensing and usage. This presentation takes a multi-contrastive view of different types of PAs as syntactic subjects and objects by looking at five European languages: EN, DE, IT, PL and HU. Our goal is to identify the parameters of variation in the clausal domain with PAs and by this to contribute to a better understanding of the individual language systems on the one hand and the nature of the linguistic variation in the clausal domain on the other hand. Phenomena and Methodology: We investigate the following types of PAs: direct object (DO) clauses (1), prepositional object (PO) clauses (2), subject clauses (3), and nominalizations (4, 5). Additionally, we discuss clause union phenomena (6, 7). The analyzed parameters include among others finiteness, linear position of the PA, (non) presence of a correlative element, (non) presence of a complementizer, lexical-semantic class of the embedding verb. The phenomena are analyzed based on corpus data (using mono- and multilingual corpora), experimental data (acceptability judgement surveys) or introspective data.
Despite being an official language of several countries in Central and Western Europe, German is not formally recognised as the official language of the Federal Republic of Germany. However, in certain situations the use of the German language, including the spelling rules, is subject to state regulation (by acts of Federal Parliament or by administrative decisions). This article presents the content of this regulation, its scope, and the historical context in which it was adopted.
Our current era of globalization is characterized above all by increased mobility, namely by the increasing mobility of people and the development of new communication technologies, including the mobility of linguistic signs and resources. This process raises new theoretical and methodological questions in linguistics, which has resulted in the development of a new sociolinguistics of globalization (Blommaert 2010) in recent years. One of the most obvious ways to trace this new and dynamic development is to analyze individual language repertoires, especially those of migrants. In this essay, I examine aspects of the communicative repertoire of a refugee who fled to Germany in 2015 to escape the civil war in Syria. I draw on two interviews I conducted with him (in the following I refer to him by the pseudonym “Baran”). The first interview with Baran was recorded in 2016, a few months after his arrival in Germany. The second interview is from 2023, seven years later. In both recordings, German was the dominant language of interaction. I will analyze and show the characteristics of his German at the beginning of his immigration, how he resorts to practices of language mixing between German, Turkish and English (which has recently also been referred to as translanguaging) and how his German has developed over the course of the past seven years.
This conference booklet provides information about the 10th International Contrastive Linguistics Conference (ICLC-10) that took place in Mannheim, Germany, from 18 to 21 July 2023. It contains
– a description of the conference aims,
– details on the conference venue,
– information on committees,
– the conference program,
– the abstracts of the keynotes, oral and poster presentations, and
– an author index.
This manual introduces a conversation analytically informed coding scheme for episodes involving the direct social sanctioning of problem behavior in informal social interaction which was developed in the project Norms, Rules, and Morality across Languages (NoRM-aL) at the Leibniz-Institute for the German Language. It outlines the background for its development, delimits the phenomena to which the coding scheme can be applied and provides instructions for its use.
The scheme asks for basic information about the recording and the participants involved in the episode, before taking stock of different features of the sanctioning episode as a whole. This is followed by sets of specific coding questions about the sanctioning move itself (such as its timing and composition) and the reaction it engenders. The coding enables researchers to get a bird’s eye view on recurrent features of such episodes in larger quantities of data and allows for comparisons across different languages and informal settings.
“Die Sprach-Checker” (Eng. “Language Checkers”) are young citizen scientists from Mannheim’s highly diverse district Neckarstadt-West. Together with linguists, they investigate a tremendous treasure: their own multilingualism. They are exploring and (re)discovering their own languages and the other languages used in their environment while documenting and reflecting on their everyday experiences in and with different linguistic practices. Our aim is to raise awareness of their strengths and to promote appreciation for their language biographies, thus fostering a sense of identification with one’s own linguistic surroundings. Such a joint research endeavour offers empirical opportunities to address (linguistic) issues of societal relevance by collecting authentic data from the multicultural district and involving its residents and local stakeholders. In this paper, we will provide insights regarding the project’s background, conception, and outcomes. We address everyone who is planning or conducting a citizen science project with young people, especially children and adolescents, or who works at the interface between science and society.
This paper examines multi-unit turns that allow speakers to retrospectively close the prior sequence while prospectively launching a new sequence, which Schegloff (1986) referred to as interlocking organization. Using English telephone conversations as data, we focus on how multi-unit turns are used for topic shifts, and show that interlocking organization operates in conjunction with other phonetic and lexical features, such as increased pitch and overt markers of disjunction (e.g., “listen”). In addition, speakers utilize an audible inbreath that is placed between the first and the second units as a central interactional resource to project further talk, thereby suppressing speaker transition and possibly highlighting the action delivered in the second unit as being distinctly new. We propose that interlocking multi-unit turns, when used to make topically disjunctive moves, promote progressivity by avoiding a possible lapse in turn transition.
This contribution summarizes the lessons learned from the organization of a joint conference on text analytics research by the Business, Economic, and Related Data (BERD@NFDI) and Text+ consortia within the National Research Data Infrastructure (NFDI) in Germany. The collaboration aimed to identify common ground and foster interdisciplinary dialogue between scholars in the humanities and in the business domain. The lessons learned include the importance of presenting research questions using textual data to establish common ground, similarities in methodology for processing textual data between the consortia, similarities in research data management, and the need for regular interconsortial discussions on textual analysis methods and data. The collaboration proved valuable for interdisciplinary dialogue within the NFDI, and further collaboration between the consortia is planned.
"Reproducibility crisis" and "empirical turn" are only two keywords when it comes to providing reasons for research data management. Research data are omnipresent, and with increasingly automated data processing procedures they become even more important. However, just because new methods require data and produce data, this does not mean that data are easily accessible, reusable or even make a difference in the CV of a researcher, even if a large portion of research goes into data creation, acquisition, preparation, and analysis. In this talk I will present where we find data in the research process, where we may find appropriate support for data management and advocate for a procedure for including it in research publications and resumes.
This presentation relies on work within the BMBF-funded project CLARIN-D. It also builds on work within the German National Research Data Infrastructure (NFDI) consortium Text+, DFG project number 460033370.
Prediction is a central mechanism in the human language processing architecture. The psycholinguistic and neurolinguistic literature has seen a lively debate about what form prediction may take and what status it has for language processing in the human mind and brain. While predictions are a ubiquitous finding, the implications of these results for models of language processing differ. For instance, eyetracking data suggest that predictions may rely on sublexical orthographic information in natural reading, while electrophysiological data provide mixed evidence for form-based predictions during reading. Other research has revealed that humans rapidly adapt to text specifics and that their predictive capacity varies, broadly speaking, in accordance with inter- and intra-individual language proficiency, which cuts across the speaker groups (e.g. L1 vs. L2 speakers, skilled vs. untrained readers) traditionally used for experimental contrasts. There is therefore evidence that the kind and strength of linguistic predictions depend on (at least) three sources of variability in language processing: speaker, text genre and experimental method.
The aim of this Research Topic is to develop a better understanding of prediction in light of the three sources of variability in language processing, by providing an overview of state-of-the-art research on predictive language processing and by bringing together research from various disciplines.
First, intra- and inter-individual differences and their influence on predictive processes remain underrepresented in experimental research on predictive processing. How do language users differ in their predictive abilities and strategies, and how are these differences shaped by e.g. biological, social and cultural factors?
Second, while language users experience great stylistic diversity in their daily language exposure and use, the majority of language processing research still focuses on a very constrained register of well-controlled sentences composed in the standard language. How are predictions shaped by extra- and meta-linguistic context, such as register/genre or accent/speaker identity, and how may this influence the processing of experimental items in another language or text variety?
Third, the Research Topic invites contributions that make use of a multi-method approach, such as combined behavioral and electrophysiological measures or experimental methods combined with measures extracted from corpus data. What opportunities and challenges do we face when integrating multiple approaches to examine linguistic, experimental and individual differences in human predictive capacity?
We welcome contributions from all areas of empirical psycho- and neurolinguistics, but contributions must explicitly address variability and variation in language and language processing. Relevant topics include individual differences and the impact of genre, modality, register and language variety. Contributions that go beyond single word and single sentence paradigms are especially desirable. Experimental, corpus-based, meta-analytic and review papers, as well as theoretical/opinion pieces are welcome; however, papers of the latter type should support their arguments with substantial empirical evidence from the literature. Particularly desirable are contributions which combine topics and/or methods, such as the impact of an individual's native dialect on processing of constructions that show variability in the standard language (e.g. choice of auxiliary, agreement of mass nouns, etc.) or experimental methods combined with measures extracted from corpus data such as information-theoretic surprisal.
We introduce DeReKoGram, a novel frequency dataset containing lemma and part-of-speech (POS) information for 1-, 2-, and 3-grams from the German Reference Corpus. The dataset contains information based on a corpus of 43.2 billion tokens and is divided into 16 parts based on 16 corpus folds. We describe how the dataset was created and structured. By evaluating the distribution over the 16 folds, we show that it is possible to work with a subset of the folds in many use cases (e.g., to save computational resources). In a case study, we investigate the growth of vocabulary (as well as the number of hapax legomena) as an increasing number of folds are included in the analysis. We cross-combine this with the various cleaning stages of the dataset. We also give some guidance in the form of Python, R, and Stata markdown scripts on how to work with the resource.
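The kind of fold-wise vocabulary growth analysis described above can be illustrated with a minimal Python sketch. It is not the authors' script and does not read the DeReKoGram files: the function name `vocabulary_growth` and the toy folds are hypothetical, and each fold is assumed to be already available as a mapping from lemma to frequency.

```python
from collections import Counter

def vocabulary_growth(folds):
    """Return (folds included, vocabulary size, hapax count) after each fold.

    `folds` is an iterable of Counter objects mapping a unit (e.g. a lemma)
    to its frequency in one corpus fold. The input format is an assumption
    made for illustration only.
    """
    total = Counter()
    growth = []
    for n_folds, fold in enumerate(folds, start=1):
        total.update(fold)  # add this fold's frequencies to the running total
        # hapax legomena: units seen exactly once in the folds included so far
        hapaxes = sum(1 for freq in total.values() if freq == 1)
        growth.append((n_folds, len(total), hapaxes))
    return growth

# Hypothetical toy data standing in for three corpus folds.
toy_folds = [
    Counter({"haus": 3, "baum": 1}),
    Counter({"haus": 2, "katze": 1}),
    Counter({"baum": 1, "hund": 1}),
]

for n_folds, vocab_size, hapaxes in vocabulary_growth(toy_folds):
    print(n_folds, vocab_size, hapaxes)
```

Because the counts are accumulated fold by fold, the same loop can be stopped early to work with a subset of the folds.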
Computational language models (LMs), most notably exemplified by the widespread success of OpenAI's ChatGPT chatbot, show impressive performance on a wide range of linguistic tasks, thus providing cognitive science and linguistics with a computational working model to empirically study different aspects of human language. Here, we use LMs to test the hypothesis that languages with more speakers tend to be easier to learn. In two experiments, we train several LMs—ranging from very simple n-gram models to state-of-the-art deep neural networks—on written cross-linguistic corpus data covering 1293 different languages and statistically estimate learning difficulty. Using a variety of quantitative methods and machine learning techniques to account for phylogenetic relatedness and geographical proximity of languages, we show that there is robust evidence for a relationship between learning difficulty and speaker population size. However, contrary to expectations derived from previous research, our results suggest that languages with more speakers tend to be harder to learn.
Recent years have seen a growing interest in grammatical variation, a core explanandum of grammatical theory. The present volume explores questions that are fundamental to this line of research: First, the question of whether variation can always and completely be explained by intra- or extra-linguistic predictors, or whether there is a certain amount of unpredictable – or ‘free’ – grammatical variation. Second, the question of what implications the (in-)existence of free variation would hold for our theoretical models and the empirical study of grammar. The volume provides the first dedicated book-length treatment of this long-standing topic. Following an introductory chapter by the editors, it contains ten case studies on potentially free variation in morphology and syntax drawn from Germanic, Romance, Uralic and Mayan.
Allusion
(2023)
Assessment
(2023)
Most broadly, an assessment is a type of social action by which an interactant expresses an evaluative stance towards someone or something (e.g., an object, an event, an action, an experience, a state of affairs, a place, a circumstance, etc.). The target of an assessment is typically called the ‘assessable’.
Collaborative work in NFDI
(2023)
The non-profit association National Research Data Infrastructure (NFDI) promotes science and research through a National Research Data Infrastructure. Its aim is to develop and establish an overarching research data management (RDM) for Germany and to increase the efficiency of the entire German science system. After a two-and-a-half-year build-up phase, the process of adding new consortia, each representing a different data domain, ended in March 2023. NFDI now has 26 disciplinary consortia (and one additional basic service collaboration). Now the full extent of cross-consortial interaction is beginning to show.
Retro-sequence
(2023)
The Data Governance Act was proposed in late 2020 as part of the European Strategy for Data, and adopted on 30 May 2022 (as Regulation 2022/868). It will enter into application on 24 September 2023. The Data Governance Act is a major development in the legal framework affecting CLARIN and the whole language community. With its new rules on the re-use of data held by public sector bodies and on the provision of data sharing services, and especially its encouragement of data altruism, the Data Governance Act creates new opportunities and new challenges for CLARIN ERIC. This paper analyses the provisions of the Data Governance Act, and aims at initiating the debate on how they will impact CLARIN and the whole language community.
Conventional terminology resources reach their limits when it comes to automatic content classification of texts in the domain of expert-layperson communication. This can be attributed to the fact that (non-normalized) language usage does not necessarily reflect the terminological elements stored in such resources. We present several strategies to extend a terminological resource with term-related elements in order to optimize automatic content classification of expert-layperson texts.
We present a collection of (currently) about 5,500 commands directed to voice-controlled virtual assistants (VAs) by sixteen initial users of a VA system in their homes. The collection comprises recordings captured by the VA itself and with a conditional voice recorder (CVR) selectively capturing recordings including the VA-directed commands plus some surrounding context. Next to a description of the collection, we present initial findings on the patterns of use of the VA systems during the first weeks after installation, including usage timing, the development of usage frequency, distributions of sentence structures across commands, and (the development of) command success rates. We discuss the advantages and disadvantages of the applied collection-specific recording approach and describe potential research questions that can be investigated in the future, based on the collection, as well as the merit of combining quantitative corpus linguistic approaches with qualitative in-depth analyses of single cases.
This article investigates mundane photo taking practices with personal mobile devices in the co-presence of others, as well as “divergent” self-initiated smartphone use, thereby exploring the impact of everyday technologies on social interaction. Utilizing multimodal conversation analysis, we examined sequences in which young adults take pictures of food and drinks in restaurants and cafés. Although everyday interactions are abundant in opportunities for accomplishing food photography as a side activity, our data show that taking pictures is also often prioritized over other activities. Through a detailed sequential analysis of video recordings and dynamic screen captures of mobile devices, we illustrate how photographers orient to the momentary opportunities for and relevance of photo taking, that is, how they systematically organize their photographing with respect to the ongoing social encounter and the (projected) changes in the material environment. We investigate how the participants multimodally negotiate the “mainness” and “sideness” (Mondada, 2014) of situated food photography and describe some particular features of participants’ conduct in moments of mundane multiactivity.
Developments within the field of Second Language Acquisition (SLA) have meant that scholars are increasingly engaging with corpora and corpus-based resources, providing a source of “‘authentic’ language” to learners and educators (Mitchell 2020: 254), and contributing to “state-of-the-art research methodologies” (Deshors and Gries 2023: 164). However, there are areas in which progress can still be made, particularly in the area of metadata, such as information about the speaker and contexts of the language use, as well as increased variety in the text types and genres of corpora used to develop SLA materials (Paquot 2022: 36). This post discusses one such possibility for increasing the variety of text types and providing a rich source of authentic language that can be used to create engaging SLA materials, particularly for young people learning German, namely the use of the NottDeuYTSch corpus (to download the corpus in a variety of formats, see Cotgrove 2018).
Modular pivot
(2023)
A modular pivot is a type of turn-constructional pivot. It is built from syntactically entirely optional items (i.e. linguistic adjuncts) that can occur in both turn-initial and turn-final position and can therefore be used to patch a wide range of otherwise discrete turn-constructional units (TCUs) together (Clayman & Raymond 2015). Prime examples of items that lend themselves to being deployed as modular pivots are address terms (Clayman 2012).
Pivot
(2023)
The term pivot denotes an element of talk that can be understood to belong to two larger units of talk simultaneously, thereby joining them together and acting as a transitional link between them (Schegloff 1979: 275-276). Most commonly, the term is used to refer to lexico-syntactic elements that can be interpreted as ending one turn-constructional unit (TCU) while at the same time launching a next.
The Encyclopedia of Terminology for Conversation Analysis and Interactional Linguistics is an online resource for students and scholars of CA/IL, publicly available on the EMCA Wiki page. Encyclopedias and glossaries are widespread across various fields and methods, and serve as immensely valuable resources. Given the extent to which the EMCA/IL community has expanded over the years—both terminologically as well as geographically—we hope that this encyclopedia of terminology will be well received by students and practitioners of CA and IL across the globe.
This paper presents an extended annotation and analysis of interpretative reply relations focusing on a comparison of reply relation types and targets between conflictual pages and neutral pages of German Wikipedia (WP) talk pages. We briefly present the different categories identified for interpretative reply relations to analyze the relationship between WP postings as well as linguistic cues for each category. We investigate referencing strategies of WP authors in discussion page postings, illustrated by means of reply relation types and targets taking into account the degree of disagreement displayed on a WP talk page. We provide richly annotated data that can be used for further analyses such as the identification of interactional relations on higher levels, or for training tasks in machine learning algorithms.
The landscape of digital lexical resources is often characterized by dedicated local portals and proprietary interfaces as primary access points for scholars and the interested public. In addition, legal and technical restrictions are potential issues that can make it difficult to efficiently query and use these valuable resources. As part of the research data consortium Text+, solutions for the storage and provision of digital language resources are being developed and provided in the context of the unified cross-domain German research data infrastructure NFDI. The specific topic of accessing lexical resources in a diverse and heterogeneous landscape with a variety of participating institutions and established technical solutions is met with the development of the federated search and query framework LexFCS. The LexFCS extends the established CLARIN Federated Content Search that already allows accessing spatially distributed text corpora using a common specification of technical interfaces, data formats, and query languages. This paper describes the current state of development of the LexFCS, gives an insight into its technical details, and provides an outlook on its future development.
The proposed contribution will shed light on current and future challenges on legal and ethical questions in research data infrastructures. The authors of the proposal will present the work of NFDI’s section on Ethical, Legal and Social Aspects (hereinafter: ELSA), whose aim is to facilitate cross-disciplinary cooperation between the NFDI consortia in the relevant areas of management and re-use of research data.
Open Science and language data: Expectations vs. reality. The role of research data infrastructures
(2023)
Language data are essential for any scientific endeavor. However, unlike numerical data, language data are often protected by copyright, as they easily meet the threshold of originality. The role of research infrastructures (such as CLARIN, DARIAH, and Text+) is to bridge the gap between uses allowed by statutory exceptions and the requirements of Open Science. This is achieved on the one hand by sharing language data produced by research organisations with the widest possible circle of persons, and on the other by mutualizing efforts towards copyright clearance and appropriate licensing of datasets.
This White Paper sets out commonly agreed definitions on activities of consortia within NFDI. It aims to provide a common basis for reporting and reference regarding selected questions of cross-consortial relevance in DFG’s template for the Interim Reports. The questions were prioritised by an NFDI Task Force on Evaluation and Reporting (formerly Task Force Monitoring) as a result of discussing possible answers to the DFG template. In this process, the need to agree on a generalizable meaning of terms commonly used in the context of NFDI, and in reporting in particular, was identified from cross-consortial perspectives. Questions that showed the greatest need for clarification are discussed in this White Paper. As NFDI evolves, the Task Force will likely propose further joint approaches for reporting in information infrastructures.
While each is of broad relevance, the questions addressed relate to substantially different aspects of the consortia’s work. They are thus also structured slightly differently.
This paper analyses intensification in German digitally-mediated communication (DMC) using a corpus of YouTube comments written by young people (the NottDeuYTSch corpus). Research on intensification in written language has traditionally focused on two grammatical aspects: syntactic intensification, i.e. the use of particles and other lexical items, and morphological intensification, i.e. the use of compounding. Using a wide variety of examples from the corpus, the paper identifies novel ways that have been used for intensification in DMC, and suggests a new taxonomy of classification for future analysis of intensification.
National Socialism, one could argue, was all about belonging: belonging to the ‘Volk’ or the ‘Volksgemeinschaft’, belonging to the ‘Aryan’ or ‘Non-Aryan race’, belonging to the National Socialist ‘movement’, and so on. These categories of belonging worked in both inclusionary and exclusionary ways, and they were constituted, proclaimed and enacted to a large extent through language. What is more, they had to be performed through communicative acts. For the normative side of National Socialist propaganda and legislation, this seems rather obvious and one-directional. On the side of the general population, however, this entailed a mixture of communicative need to position oneself vis-à-vis National Socialism (mostly in affirmative ways), but also the urge to do so willingly. When we look at the language use of ‘ordinary people’ in different communicative situations and texts during National Socialism, we have to focus on these dimensions of discursive collusion, co-constitution and appropriation. People during National Socialism, such is our hypothesis, navigated through discourses of belonging and by that made them real and effective. Besides diaries, war letters and autobiographical writings, one way to grasp this phenomenon is to analyse petitions, i.e., letters of complaint and request sent in large numbers by ‘ordinary people’ to public authorities of the party and the state. As I will show by some examples, letter-writers tried to inscribe themselves within (what they took for) National Socialist discourses of belonging in order to legitimate their claims. By doing so, they co-constituted and co-created the discursive realm of National Socialism.
One of the fundamental questions about human language is whether all languages are equally complex. Here, we approach this question from an information-theoretic perspective. We present a large scale quantitative cross-linguistic analysis of written language by training a language model on more than 6500 different documents as represented in 41 multilingual text collections consisting of ~ 3.5 billion words or ~ 9.0 billion characters and covering 2069 different languages that are spoken as a native language by more than 90% of the world population. We statistically infer the entropy of each language model as an index of what we call average prediction complexity. We compare complexity rankings across corpora and show that a language that tends to be more complex than another language in one corpus also tends to be more complex in another corpus. In addition, we show that speaker population size predicts entropy. We argue that both results constitute evidence against the equi-complexity hypothesis from an information-theoretic perspective.
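To make the notion of entropy as an index of prediction complexity more concrete, the following Python sketch computes the Shannon entropy of a character distribution. This is a deliberately crude stand-in and an assumption for illustration, not the estimation procedure used in the study, which trains a language model on the text collections; the function name `unigram_entropy_bits` is hypothetical.

```python
import math
from collections import Counter

def unigram_entropy_bits(text):
    """Shannon entropy (bits per character) of the character distribution.

    Simplifying assumption: characters are treated as independent draws,
    so no context is used for prediction. A trained language model would
    condition on preceding material and typically yield lower estimates.
    """
    counts = Counter(text)
    n = len(text)
    # H = -sum p(c) * log2 p(c) over all observed characters c
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# Two equally frequent characters carry one bit per character.
print(unigram_entropy_bits("abab"))
```

Under this simplification, a text that is harder to predict character by character receives a higher value, which is the intuition behind comparing entropy estimates across languages.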
In workplace settings, skilled participants cooperate on the basis of shared routines in smooth and often implicit ways. Our study shows how interactional histories provide the basis for routine coordination. We draw on theater rehearsals as a perspicuous setting for tracking interactional histories. In theater rehearsals, the process of building performing routines is in focus. Our study builds on collections of consecutive performances of the same instructional task coming from a corpus of video-recordings of 30 h of theater rehearsals of professional actors in German. Over time, instructions and their implementations are routinely coordinated by virtue of accumulated shared interactional experience: Instructions become shorter, the timing of responses becomes increasingly compacted and long negotiations are reduced to a two-part sequence of instruction and implementation. Overall, a routine of how to perform the scene emerges. Over interactional histories, patterns of projection of next actions emanating from instructions become reliable and can be used by respondents as sources for anticipating and performing relevant next actions. The study contributes to our understanding of how shared knowledge and routines accumulate over shared interactional experiences in publicly performed and reciprocally perceived ways and how this impinges on the efficiency of joint action.
Theater rehearsals are (usually) confronted with the problem of having to transform a written text into an audio-visual, situated and temporal performance. Our contribution focuses on the emergence and stabilization of a gestural form as a solution for embodying a certain aesthetic concept which is derived from the script. This process involves instructions and negotiations, making the process of stabilization publicly and thus intersubjectively accessible. As scenes are repeatedly rehearsed, rehearsals are perspicuous settings for tracking interactional histories. Based on videotaped professional theatre interactions in Germany, we focus on consecutive instances of rehearsing the same scene and trace the interactional history of a particular gesture. This gesture is used by the director to instruct the actors to play a particular aspect of a scene adopting a certain aesthetic concept. Stabilization requires the emergence of shared knowledge. We will show the practices by which shared knowledge is established over time during the rehearsal process and, in turn, how the accumulation of knowledge contributes to a change in the interactional practices themselves. Specifically, we show how a gesture emerges in the process of developing and embodying an aesthetic concept, and how this gesture eventually becomes a sign that refers to and evokes accumulated knowledge. At the same time, we show how this accumulated knowledge changes the instructional activities in the rehearsal process. Our study contributes to the overall understanding of knowledge accumulation in interaction in general and in theater rehearsals in particular. At the same time, it is devoted to the central importance of gestures in theater, which are both a means and a product of theatrical staging.
The workshop presents ATHEN (Annotation and Text Highlighting Environment), an extensible desktop-based annotation environment which supports more than just regular annotation. Besides being a general-purpose annotation environment, ATHEN supports indexing and querying of your data as well as automatic preprocessing of your data with meta information. It is especially suited for those who want to extend existing general-purpose annotation tools by implementing their own custom features, which cannot be fulfilled by other available annotation environments. In the corresponding GitLab repository, we provide online tutorials which demonstrate the use of specific features of ATHEN.
Picnick and Sauerkraut: German–English intra-writer variation in script and language (1867–1900)
(2023)
Intra-writer variation is a wide-spread phenomenon that nevertheless has received only limited research attention so far. Different addressees, bi- and multilingualism, or changing life phases are among the factors that contribute to such variation. In a study of diary entries by one writer covering three decades (1867–1900), this chapter investigates patterns of intra-writer variation between German and English (language and script) in nineteenth-century Canada, with a special focus on single word borrowings, person reference and place names. The long-term perspective provides a unique insight into the dynamics of a bilingual writer’s emerging sociolinguistic competence as reflected by the flexible yet structured use of his resources within the social space of a bilingual community.
Neologisms, i.e., new words or meanings, are finding their way into everyday language use all the time. In the process, already existing elements of a language are recombined or linguistic material from other languages is borrowed. But are borrowed neologisms accepted similarly well by the speech community as neologisms that were formed from “native” material? We investigate this question based on neologisms in German. Building on the corresponding results of a corpus study, we test the hypothesis of whether “native” neologisms are more readily accepted than those borrowed from English. To do so, we use a psycholinguistic experimental paradigm that allows us to estimate the degree of uncertainty of the participants based on the mouse trajectories of their responses. Unexpectedly, our results suggest that the neologisms borrowed from English are accepted more frequently, more quickly, and more easily than the “native” ones. These effects, however, are restricted to people born after 1980, the so-called millennials. We propose potential explanations for this mismatch between corpus results and experimental data and argue, among other things, for a reinterpretation of previous corpus studies.
Following the success of the ninth conference, held in 2022 in the wonderful Santiago de Compostela, Spain, we are pleased to present the proceedings of the 10th edition of the International Conference on CMC and Social Media Corpora for the Humanities (CMC-2023). The focal point of the conference is to investigate the collection, annotation, processing, and analysis of corpora of computer-mediated communication (CMC) and social media.
Our goal is to serve as the meeting place for a wide variety of language-oriented investigations into CMC and social media from the fields of linguistics, philology, communication sciences, media studies, and social sciences, as well as corpus and computational linguistics, language technology, textual technology, and machine learning.
This year’s event is the largest so far with 45 accepted submissions: 32 papers and 13 poster presentations, each of which was reviewed by members of our ever-growing scientific committee. The contributions were presented in five sessions of two or three streams, and a single poster session. The talks in these proceedings cover a wide range of topics, including corpus construction, digital identities, digital knowledge-building, digitally-mediated interaction, features of digitally-mediated communication, and multimodality in digital spaces.
As part of the conference, we were delighted to include two invited talks: an international keynote speech by Unn Røyneland from the University of Oslo, Norway, on the practices and perceptions of researching dialect writing in social media, and a national keynote speech by Tatjana Scheffler from the Ruhr-University of Bochum on analysing individual linguistic variability in social media and constructing corpora from this data. Additionally, participants could take part in a workshop on processing audio data for corpus linguistic analysis. This volume contains abstracts of the invited talks, short papers of oral presentations, and abstracts of posters presented at the conference.
Using multimodal conversation analysis, we investigate how novices learning the “inner body” acting technique in the context of a community theater project share their experiences of the bodily exercises through verbal and embodied conduct. We focus on how verbal description and bodily enactment of the experience mutually elaborate each other, and how the experienced sensorimotor and affective qualities are made to be witnessed and recognized by the others. Participants describe their experiences without naming qualities. Instead, a display of the experienced qualities is made accessible to others through coordinating the unfolding talk and bodily conduct. In particular, we show how grammatical and action projection is fulfilled by interconnected verbal and embodied conduct, with body movement and posture giving off ineffable experiential qualities. The moving body appears both as a source of the experience and as a resource for depicting perceived qualities to others; additional resources (non-specific person reference and gaze aversion) contribute to organizing the subjective and intersubjective layers of the reflection of the experiences. The study contributes to and extends recent research on sensoriality in interaction by focusing on phenomena of proprioception and interoception. The data are two cases drawn from 60 h of video-recordings made in the context of a devised community theater project. The data are in Finnish with English translations.
The CLARIN Concept Registry (CCR) is the common semantic ground for most CMDI-based profiles to describe language-related resources in the CLARIN universe. While the CCR supports semantic interoperability within this universe, it does not extend beyond it. The flexibility of CMDI, however, allows users to use other term or concept registries when defining their metadata components. In this paper, we describe our use of schema.org, a light ontology used by many parties across disciplines.
It was recently suggested in a study published in Nature Human Behaviour that the historical loosening of American culture was associated with a trade-off between higher creativity and lower order. To this end, Jackson et al. generate a linguistic index of cultural tightness based on the Google Books Ngram corpus and use this index to show that American norms loosened between 1800 and 2000. While we remain agnostic toward a potential loosening of American culture and a statistical association with creativity/order, we show here that the methods used by Jackson et al. are neither suitable for testing the validity of the index nor for establishing possible relationships with creativity/order.
In a previous study published in Nature Human Behaviour, Varnum and Grossmann claim that reductions in gender inequality are linked to reductions in pathogen prevalence in the United States between 1951 and 2013. Since the statistical methods used by Varnum and Grossmann are known to induce (seemingly) significant correlations between unrelated time series, so-called spurious or nonsense correlations, we test here whether the statistical association between gender inequality and pathogen prevalence in its current form is also the result of mis-specified models that do not correctly account for the temporal structure of the data. Our analysis clearly suggests that this is the case. We then discuss and apply several standard approaches to modelling time-series processes in the data and show that there is, at least as of now, no support for a statistical association between gender inequality and pathogen prevalence.
This paper reports on an ongoing international project of compiling a freely accessible online Dictionary of German Loans in Polish Dialects. The dictionary will be the first comprehensive lexicographic compendium of its kind, serving as a complement to existing resources on German lexical loans in the literary or standard language. The empirical results obtained in the project will shed new light on the distribution of German loanwords among different dialects, also in comparison to the well-documented situation in written Polish. The dictionary will have a strong focus on the dialectal distribution of Polish dialectal variants for a given German etymon, accessible through interactive cartographic representations and corresponding search options. The editorial process is realized with dedicated collaborative web tools. The new resource will be published as an integrated part of an online information system for German lexical borrowings in other languages, the Lehnwortportal Deutsch, and is therefore highly cross-linked with other loanword dictionaries on Polish as well as Slavic and further European languages.
This poster summarizes the results of the CLARIAH-DE Work Package 5 - Community Engagement: Outreach/Dissemination and Liaison.
Work package 5 engages with the community through dissemination activities, outreach and liaison. The work package set itself the following sub goals:
- Combining the existing dissemination and outreach activities of CLARIN-D and DARIAH-DE in a meaningful way and elaborating on them. In some cases this meant continuity, in other cases a new appearance for resources.
- Providing a web portal as a gateway to the CLARIAH-DE project.
- Creating a common identity and corporate identity and maintaining the established level of trust users already put into CLARIN-D and DARIAH-DE.
- Providing a social media presence as well as a physical presence at workshops, conferences and other meetings in the Digital Humanities.
CLARIAH-DE cross-service search - prospects and benefits of merging subject-specific services
(2021)
CLARIAH-DE combines services and offerings of CLARIN-D and DARIAH-DE. This includes various search applications which are made directly available to researchers. These search applications are presented in this working paper based on their main characteristics and compared with a focus on possible harmonizations. Opportunities and risks of different forms of technical integration are highlighted. Identified challenges can be explained in particular considering the background of different organizational and technical frameworks as well as highly specific and discipline-dependent requirements. The integration work that has already been carried out and the experiences gained with regard to future work and possible integration of further applications are also discussed. The experiences made in CLARIAH-DE can especially be of interest for other projects in the field of digital research infrastructures.
In order to differentiate between figurative and literal usage of verb-noun combinations for the shared task on the disambiguation of German Verbal Idioms issued for KONVENS 2021, we apply and extend an approach originally developed for detecting idioms in a dataset consisting of random ngram samples. The classification is done by implementing a rather shallow, statistics-based pipeline without intensive preprocessing and examinations on the morphosyntactic and semantic level. We describe the overall approach, the differences between the original dataset and the dataset of the KONVENS task, provide experimental classification results, and analyse the individual contributions of our feature sets.
The CLARIN infrastructure as an interoperable language technology platform for SSH and beyond
(2023)
CLARIN is a European Research Infrastructure Consortium developing and providing a federated and interoperable platform to support scientists in the field of the Social Sciences and Humanities in carrying out language-related research. This contribution provides an overview of the entire infrastructure with a particular focus on tool interoperability, ease of access to research data, tools and services, the importance of sharing knowledge within and across (national) communities, and community building. By taking into account FAIR principles from the very beginning, CLARIN succeeded in becoming an example of a research infrastructure that is actively used by its members. The benefits CLARIN members reap from their infrastructure secure a future for their common good that is both sustainable and attractive to partners beyond the original target groups.
In 2010, ISO published a standard for syntactic annotation, ISO 24615:2010 (SynAF). Back then, the document specified a comprehensive reference model for the representation of syntactic annotations, but no accompanying XML serialisation. ISO’s subcommittee on language resource management (ISO TC 37/SC 4) is working on making the SynAF serialisation ISOTiger an additional part of the standard. This contribution addresses the current state of development of ISOTiger, along with a number of open issues on which we are seeking community feedback in order to ensure that ISOTiger becomes a useful extension to the SynAF reference model.
This paper reports on recent developments within the European Reference Corpus EuReCo, an open initiative that aims at providing and using virtual and dynamically definable comparable corpora based on existing national, reference or other large corpora. Given the well-known shortcomings of other types of multilingual corpora such as parallel/translation corpora (shining-through effects, over-normalization, simplification, etc.) or web-based comparable corpora (covering only web material), EuReCo provides a unique linguistic resource offering new perspectives for fine-grained contrastive research on authentic cross-linguistic data, applications in translation studies and foreign language teaching and learning.
This poster summarizes the results of the CLARIAH-DE Work Package 3: Skills Training and Promotion of Junior Researchers.
For a research field that is characterised by rapid technical development, CLARIAH-DE has to include the promotion of data literacy necessary for the efficient use of this digital research infrastructure as part of its objective. To develop, consolidate and refine a common programme in this area, work package 3 set itself the following sub goals:
- Consolidation of the activities from the previous projects into a joint service
- Cataloguing and reflecting on the methods and tools used in the research field, with the aim of identifying remaining gaps
- Skills training, individual support and promotion of junior researchers
We discuss the modal uses of the Hausa exclusive particle sai (≈ only). We argue that the distribution of sai in modal environments provides evidence for the following claims on the composition of modal meaning that have been independently made in the literature: i) Future-oriented modality involves a prospective aspect operator that can be realized covertly in some languages (e.g. English, Kratzer 2012b) and overtly in others (e.g. Gitksan, Matthewson 2012, 2013). ii) Necessity interpretations arise from exhaustifying possibilities, i.e. an exhaustivity operator applying to existential modality (e.g. Kaufmann 2012 for the case of imperatives and Leffel 2012 for a relevant analysis of necessity meaning in Masalit). We show that future-oriented necessity in Hausa decomposes into EXH((PROSP)), with sai contributing exhaustivity.
The 12th Web as Corpus workshop (WAC-XII) looks at the past, present, and future of web corpora given the fact that large web corpora are nowadays provided mostly by a few major initiatives and companies, and the diversity of the early years appears to have faded slightly. Also, we acknowledge the fact that alternative sources of data (such as data from Twitter and similar platforms) have emerged, some of them only available to large companies and their affiliates, such as linguistic data from social media and other forms of the deep web. At the same time, gathering interesting and relevant web data (web crawling) is becoming an ever more intricate task as the nature of the data offered on the web changes (for example the death of forums in favour of more closed platforms).
Comprehending conditional statements is fundamental for hypothetical reasoning about situations. However, the online comprehension of conditional statements containing different conditional connectives is still debated. We report two self-paced reading experiments on German conditionals presenting the conditional connectives wenn (‘if’) and nur wenn (‘only if’) in identical discourse contexts. In Experiment 1, participants read a conditional sentence followed by the confirmed antecedent p and the confirmed or negated consequent q. The final, critical sentence was presented word by word and contained a positive or negative quantifier (ein/kein ‘one/no’). Reading times of the two quantifiers did not differ between the two conditional connectives. In Experiment 2, presenting a negated antecedent, reading times for the critical positive quantifier (ein) did not differ between conditional connectives, while reading times for the negative quantifier (kein) were shorter for nur wenn than for wenn. The results show that comprehenders form distinct predictions about discourse continuations due to differences in the lexical semantics of the tested conditional connectives, shedding light on the role of conditional connectives in the online interpretation of conditionals in general.
In a recent paper published in the Journal of Language Evolution, Kauhanen, Einhaus & Walkden (KEW) challenge the results presented in one of my papers (Koplenig, Royal Society Open Science, 6, 181274 (2019)), in which I tried to show through a series of statistical analyses that large numbers of L2 (second language) speakers do not seem to affect the (grammatical or statistical) complexity of a language. To this end, I focus on the way in which the Ethnologue assesses language status: a language is characterised as vehicular if, in addition to being used by L1 (first language) speakers, it also has a significant number of L2 users. KEW criticise both the use of vehicularity as a (binary) indicator of whether a language has a significant number of L2 users and the idea of imputing a zero proportion of L2 speakers to non-vehicular languages whenever a direct estimate of that proportion is unavailable. While I recognise the importance of post-publication commentary on published research, I show in this rejoinder that both points of criticism are explicitly mentioned and analysed in my paper. In addition, I also comment on other points raised by KEW and demonstrate that neither of the alternative analyses offered by KEW stands up to closer scrutiny.
The article entitled «Traitement de l’information: Spinfo, HKI et humanités numériques - l’expérience de Cologne» presents the history of the development of the digital humanities at the University of Cologne. The institutionalisation of the digital humanities began at a time when, in the German-speaking world, the scope of the discipline was still being defined by the work of a few pioneers. Among them, the role of Elisabeth Burr, active notably in Tübingen, Duisburg, Bremen and Leipzig, deserves to be highlighted. The article traces the development of the digital humanities in Cologne from their beginnings in the 1960s, through their consolidation in the 1990s, to the last two decades, when Cologne became an important centre of the discipline. The process illustrates how a new scientific discipline can become institutionalised within a German university. The article describes the perspective of two founding fields, linguistic information processing (in German: Sprachliche Informationsverarbeitung, Spinfo) and historical-cultural information processing (in German: Historisch Kulturwissenschaftliche Informationsverarbeitung, HKI), and their synthesis, which led in 2017 to the creation of the Institute for Digital Humanities, which today is, from an internal point of view, a component of the Faculty of Philosophy of the University of Cologne and, from an external point of view, an integral part of the international digital humanities community.
We investigate the optional omission of the infinitival marker in a Swedish future tense construction. During the last two decades the frequency of omission has been rapidly increasing, and this process has received considerable attention in the literature. We test whether the knowledge which has been accumulated can yield accurate predictions of language variation and change. We extracted all occurrences of the construction from a very large collection of corpora. The dataset was automatically annotated with language-internal predictors which have previously been shown or hypothesized to affect the variation. We trained several models in order to make two kinds of predictions: whether the marker will be omitted in a specific utterance and how large the proportion of omissions will be for a given time period. For most of the approaches we tried, we were not able to achieve a better-than-baseline performance. The only exception was predicting the proportion of omissions using autoregressive integrated moving average models for one-step-ahead forecast, and in this case time was the only predictor that mattered. Our data suggest that most of the language-internal predictors do have some effect on the variation, but the effect is not strong enough to yield reliable predictions.
The question of whether a letter is a grapheme or not is a perennial issue in writing research. The answer depends on which criteria are used to differentiate between letters and graphemes and, ultimately, how the unit ‘grapheme’ is defined. This problem is particularly relevant to complex graphemes, i.e. sequences of letters that behave like a single grapheme in certain respects. Typical for German is the ‹ch›. This paper argues for a scalar concept of graphemes, which compares the grapheme status of each of the units under investigation. For this purpose, new criteria for the identification of complex graphemes are used, which originate from handwriting analysis. There, it is shown that complex graphemes are connected with each other disproportionately often and also have deviating letter forms disproportionately often.