Refine
Year of publication
Document Type
- Conference Proceeding (387)
- Part of a Book (268)
- Article (204)
- Book (37)
- Working Paper (17)
- Doctoral Thesis (16)
- Other (14)
- Preprint (7)
- Part of Periodical (5)
- Report (2)
Language
- English (961) (remove)
Keywords
- Korpus <Linguistik> (279)
- Deutsch (209)
- Computerlinguistik (98)
- Annotation (70)
- Interaktion (65)
- Konversationsanalyse (60)
- Automatische Sprachanalyse (54)
- Gesprochene Sprache (53)
- Englisch (52)
- Wörterbuch (42)
Publicationstate
- Veröffentlichungsversion (961) (remove)
Reviewstate
Publisher
- IDS-Verlag (80)
- de Gruyter (55)
- Association for Computational Linguistics (44)
- European Language Resources Association (ELRA) (43)
- European Language Resources Association (24)
- Institut für Deutsche Sprache (24)
- Springer (19)
- Linköping University Electronic Press (15)
- The Association for Computational Linguistics (15)
- Leibniz-Institut für Deutsche Sprache (IDS) (14)
This study aims to establish what lexical factors make it more likely for dictionary users to consult specific articles in a dictionary using the English Wiktionary log files, which include records of user visits over the course of 6 years. Recent findings suggest that lexical frequency is a significant factor predicting look-up behavior, with the more frequent words being more likely to be consulted. Three further lexical factors are brought into focus: (1) age of acquisition; (2) lexical prevalence; and (3) degree of polysemy operationalized as the number of dictionary senses. Age of acquisition and lexical prevalence data were obtained from recent published studies and linked to the list of visited Wiktionary lemmas, whereas polysemy status was derived from Wiktionary entries themselves. Regression modeling confirms the significance of corpus frequency in explaining user interest in looking up words in the dictionary. However, the remaining three factors also make a contribution whose nature is discussed and interpreted. Knowing what makes dictionary users look up words is both theoretically interesting and practically useful to lexicographers, telling them which lexical items should be prioritized in lexicographic work.
In social interaction, different kinds of word-meaning can become problematic for participants. This study analyzes two meta-semantic practices, definitions and specifications, which are used in response to clarification requests in German implemented by the format Was heißt X (‘What does X mean?’). In the data studied, definitions are used to convey generalizable lexical meanings of mostly technical terms. These terms are either unknown to requesters, or, in pedagogical contexts, requesters ask in order to check the addressee’s knowledge. Specifications, in contrast, clarify aspects of local speaker meanings of ordinary expressions (e.g., reference, participants in an event, standards applied to scalar expressions). Both definitions and specifications are recipient-designed with respect to the (presumed) knowledge of the addressee and tailored to the topical and practical relevancies of the current interaction. Both practices attest to the flexibility and situatedness of speakers’ semantic understandings and to the systematicity of using meta-semantic practices differentially for different kinds of semantic problems. Data are come from mundane and institutional interaction in German from the public corpus FOLK.
Less than one percent of words would be affected by gender-inclusive language in German press texts
(2024)
Research on gender and language is tightly knitted to social debates on gender equality and non-discriminatory language use. Psycholinguistic scholars have made significant contributions in this field. However, corpus-based studies that investigate these matters within the context of language use are still rare. In our study, we address the question of how much textual material would actually have to be changed if non-gender-inclusive texts were rewritten to be gender-inclusive. This quantitative measure is an important empirical insight, as a recurring argument against the use of gender-inclusive German is that it supposedly makes written texts too long and complicated. It is also argued that gender-inclusive language has negative effects on language learners. However, such effects are only likely if gender-inclusive texts are very different from those that are not gender-inclusive. In our corpus-linguistic study, we manually annotated German press texts to identify the parts that would have to be changed. Our results show that, on average, less than 1% of all tokens would be affected by gender-inclusive language. This small proportion calls into question whether gender-inclusive German presents a substantial barrier to understanding and learning the language, particularly when we take into account the potential complexities of interpreting masculine generics.
In a previous study, Aceves and Evans present a large-scale quantitative information-theoretic analysis of parallel corpus data in ~1,000 languages to show that there are apparently strong associations between the way languages encode information into words and patterns of communication, e.g. the configuration of semantic information. During the peer review process, one reviewer raised the question of the extent to which the presented results depend on different corpus sizes (see the Peer Review File). This is a very important question given that most, if not all, of the quantities associated with word frequency distributions vary systematically with corpus size. While Aceves and Evans claim that corpus size does not affect the results presented, I challenge this view by presenting reanalyses of the data that clearly suggest that it does.
This contribution explores the relationship between the English CEFR (Common European Framework of Reference for Languages) vocabulary levels and user interest in English Wiktionary entries. User interest was operationalized through the number of views of these entries in Wikimedia server logs covering a period of four years (2019–2022). Our findings reveal a significant relationship between CEFR levels and user interest: entries classified at lower CEFR levels tend to attract more views, which suggests a greater user interest in more basic vocabulary. A multiple regression model controlling for other known or potential factors affecting interest: corpus frequency, polysemy, word prevalence, and age of acquisition confirmed that lower CEFR levels attract significantly more views even after taking into account the other predictors. These findings highlight the importance of CEFR levels in predicting which words users are likely to look up, with implications for lexicography and the development of language learning materials.
Rejecting the validity of inferred attributions of incompetence in German talk-in-interaction
(2024)
This paper deals with pragmatic inference from the perspective of Conversation Analysis. In particular, we examine a specific variety of inferences - the attribution of incompetence which Self constructs on the basis of Other's prior action, hearable as positioning Self as incompetent (e.g., instructions, offers of assistance, advice); this attribution of incompetence concerns Self's execution of some practical task. This inference is indexed in Self's response, which highlights Self's expertise, or competence concerning the task at hand. We focus on two recurrent types of such responses in our data: (i) accounting for competence through formulations of prior experience with carrying out a practical action and (ii) explicit claims of competence for accomplishing this action. We analyze the interactional environments in which these responses occur, the ways in which the two practices index Self's understanding of being positioned as incompetent and the interactional work they do. Finally, we discuss how through rejecting and inferred attribution of incompetence, Self implicitly seeks to restore their face and defend their autonomy as an agent, yet, without entering an explicit identity-negotiation. Findings rest on the analysis of 20 cases found in video-recordings of naturally occurring talk-in-interaction in German from the corpus FOLK.
This paper focuses on language change based on shifting social norms, in particular with regard to the debate on language and gender. It is a recurring argument in this debate that language develops "naturally" and that "severe interventions" - such as gender-inclusive language is often claimed to be - in the allegedly "organic" language system are inappropriate and even "dangerous". Such interventions are, however, not unprecedented. Socially motivated processes of language change are neither unusual nor new. We focus in our contribution on one important political-social space in Germany, the German Bundestag. Taking other struggles about language and gender in the plenaries of the Bundestag as a starting point, our article illustrates that language and gender has been a recurring issue in the German Bundestag since the 1980s. We demonstrate how this is reflected in linguistic practices of the Bundestag, by the use of a) designations for gays and lesbians; b) pair forms such as Bürgerinnen und Bürger (female and male citizens); and c) female forms of addresses and personal nouns ('Präsidentin' in addition to 'Präsident'). Lastly, we will discuss implications of these earlier language battles for the currently very heated debate about gender-inclusive language, especially regarding new forms with gender symbols like the asterisk or the colon (Lehrer*innen, Lehrer:innen; male*female teachers) which are intended to encompass all gender identities.
Drawing upon the transformative power of questions, the paper investigates questioning sequences from authentic coaching data to examine the systematic use of a particular succession of formulation and question and its impact on inviting self-reflection processes in the client and eliciting change. The object of investigation in this paper are therefore questioning sequences in which a coach asks a question immediately after a rephrasing or relocating action, prompting the client to respond in an explicit or implicit way. The coach hereby shifts the focus to a hypothetical scenario, prompting the client to change her perspective on the matter and reflect on her own statements, ideas and attitudes from an outside perspective. The paper aims to contribute to closing the research gap of the change potential of reflection-stimulating action techniques used by coaches, by investigating one of many ways of how questions can be powerful tools to invite a change of perspective for the client. The study focuses on one coaching process consisting of three sessions between a female coach and a female client, utilizing a single case study approach. The data collection was part of the interdisciplinary project “Questioning Sequences in Coaching”, comprising 14 authentic coaching processes. The analysis follows Peräkylä’s Transformative Sequences model, examining the first position including the formulation and the subsequent question, the client’s response, and the coach’s reaction to the response. On a practical level, the main purpose of this paper is not to contribute to the many ways practical literature recommends coaches how to do their work and how to ask questions, but rather to show in what ways the elicitation of self-reflection processes in clients has been achieved by other coaches in authentic coaching sessions.
We investigate the optional omission of the infinitival marker in a Swedish future tense construction. During the last two decades the frequency of omission has been rapidly increasing, and this process has received considerable attention in the literature. We test whether the knowledge which has been accumulated can yield accurate predictions of language variation and change. We extracted all occurrences of the construction from a very large collection of corpora. The dataset was automatically annotated with language-internal predictors which have previously been shown or hypothesized to affect the variation. We trained several models in order to make two kinds of predictions: whether the marker will be omitted in a specific utterance and how large the proportion of omissions will be for a given time period. For most of the approaches we tried, we were not able to achieve a better-than-baseline performance. The only exception was predicting the proportion of omissions using autoregressive integrated moving average models for one-step-ahead forecast, and in this case time was the only predictor that mattered. Our data suggest that most of the language-internal predictors do have some effect on the variation, but the effect is not strong enough to yield reliable predictions.
Gestures can be brief and compact in their execution, but also elaborate and extended. One way to utilise this kinetic flexibility is to extend one’s gesture in time by holding it in its stroke position. This study explores the interactional function of gestural holds by investigating pointing gestures that are sustained beyond a sequence-initiating turn and into the responsive space following it. The study draws on video data from naturally occurring conversations in German and focuses on held pointing gestures after instructions and questions. It is shown that in both action environments, participants delay gestural closure to indicate that they still consider the addressee’s response to be insufficient.
Rules of behavior are fundamental to human sociality. Whether on the road, at the dinner table, or during a game, people monitor one another’s behavior for conformity to rules and may take action to rectify violations. In this study, we examine two ways in which rules are enforced during games: instructions and reminders. Building on prior research, we identify instructions as actions produced to rectify violations based on another’s lack of knowledge of the relevant rule; knowledge that the instruction is designed to impart. In contrast to this, the actions we refer to as reminders are designed to enforce rules presupposing the transgressor’s competence and treating the violation as the result of forgetfulness or oversight. We show that instructing and reminding actions differ in turn design, sequential development, the epistemic stances taken by transgressors and enforcers, and in how the action affects the progressivity of the interaction. Data are in German and Italian from the Parallel European Corpus of Informal Interaction (PECII).
In workplace settings, skilled participants cooperate on the basis of shared routines in smooth and often implicit ways. Our study shows how interactional histories provide the basis for routine coordination. We draw on theater rehearsals as a perspicuous setting for tracking interactional histories. In theater rehearsals, the process of building performing routines is in focus. Our study builds on collections of consecutive performances of the same instructional task coming from a corpus of video-recordings of 30 h of theater rehearsals of professional actors in German. Over time, instructions and their implementations are routinely coordinated by virtue of accumulated shared interactional experience: Instructions become shorter, the timing of responses becomes increasingly compacted and long negotiations are reduced to a two-part sequence of instruction and implementation. Overall, a routine of how to perform the scene emerges. Over interactional histories, patterns of projection of next actions emanating from instructions become reliable and can be used by respondents as sources for anticipating and performing relevant next actions. The study contributes to our understanding of how shared knowledge and routines accumulate over shared interactional experiences in publicly performed and reciprocally perceived ways and how this impinges on the efficiency of joint action.
L’article intitulé «Traitement de l’information: Spinfo, HKI et humanités numériques - l’expérience de Cologne» présente l’histoire du développement des humanités numériques au sein de l’Université de Cologne. L'institutionnalisation des humanités numériques a commencé encore à l’époque où dans le monde germanophone le périmètre de la discipline était en train d’être défini par les travaux de quelques pionniers. Parmi eux, il convient de souligner le rôle d’Elisabeth Burr, active notamment à Tubingue, Duisbourg, Brême et Leipzig.L’article retrace le développement des humanités numériques à Cologne à partir de leurs débuts dans les années soixante du 20ème siècle, en passant par leur consolidation dans les années quatre-vingt-dix, jusqu’aux deux dernières décennies, quand Cologne est devenu un centre important de cette discipline. Le processus illustre comment une nouvelle discipline scientifique peut s’institutionnaliser au sein d’une université allemande. L’article décrit la perspective de deux domaines fondateurs: le traitement linguistique de l’information (en allemand: Sprachliche Informationsverarbeitung, Spinfo) et le traitement historico-culturel de l’information (en allemand: Historisch Kulturwissenschaftliche Informationsverarbeitung, HKI) et leur synthèse, qui a abouti en 2017 à la création de l’Institut des Humanités Numériques (Digital Humanities), qui aujourd’hui est - du point de vue interne - une composante de la Faculté de Philosophie de l’Université de Cologne et - du point de vue externe - une partie intégrante de la communauté internationale des humanités numériques.
In theater as a bodily-spatial art form, much emphasis is placed on the way actors perform movements in space as an important multimodal resource for creating meaning. In theater rehearsals, movements are created in series of directors' instructions and actors' implementations. Directors' instructions on how to conduct a movement often draw on embodied demonstrations in contrast to verbal descriptions. For instance, to instruct an actress to act like a school girl a director can use depictive (he demonstrates the expected behavior) instead of descriptive (“can you act like a school girl”) means. Drawing on a corpus of 400 h video recordings of rehearsal interactions in three German professional theater productions, from which we selected 265 cases, we examine ways to instruct movement-based actions in theater rehearsals. Using a multimodally extended ethnomethodological-conversation analytical approach, we focus on the multimodal details that constitute demonstrations as complex action types. For the present article, we have chosen nine instances, through which we aim to illuminate (1) The difference in using embodied demonstrations versus verbal descriptions to instruct; (2) typical ways directors combine verbal descriptions with embodied demonstrations in their instructions. First, we ask what constitutes a demonstration and what it achieves in comparison to verbal descriptions. Using a typical case, we illustrate four characteristics of demonstrations that all of the cases we studied share. Demonstrations (1) are embedded in instructional activities; (2) show and do not tell; (3) are responded to by emulating what was shown; (4) are rhetorically shaped to convey the instruction's focus. However, none of the 265 demonstrations we investigated were produced without verbal descriptions. In a second step we therefore ask in which typical ways verbal descriptions accompany embodied demonstrations when directors instruct actors how to play a scene. We distinguish four basic types. Verbal descriptions can be used (1) to build the demonstration itself; (2) to delineate a demonstration verbally within an instruction; (3) to indicate positive (what should be done) and negative (what should be avoided) versions of demonstrations; (4) as an independent means to describe the instruction's focus in addition to the demonstration. Our study contributes to research on how embodied resources are used to create meaning and how they combine with and depend on verbal resources.
The CLARIN infrastructure as an interoperable language technology platform for SSH and beyond
(2023)
CLARIN is a European Research Infrastructure Consortium developing and providing a federated and interoperable platform to support scientists in the field of the Social Sciences and Humanities in carrying-out language-related research. This contribution provides an overview of the entire infrastructure with a particular focus on tool interoperability, ease of access to research data, tools and services, the importance of sharing knowledge within and across (national) communities, and community building. By taking into account FAIR principles from the very beginning, CLARIN succeeded in becoming a successful example of a research infrastructure that is actively used by its members. The benefits CLARIN members reap from their infrastructure secure a future for their common good that is both sustainable and attractive to partners beyond the original target groups.
In English, past tense stative clauses embedded under a past-marked attitude verb, like Eric thought that Kalina was sick, can receive two interpretations, differing on when the state of the complement is understood to hold, i.e. Kalina’s sickness precedes the time of Eric’s thinking (backward-shifted reading), or Kalina is sick at the time of Eric’s thinking (simultaneous reading). As is well known, the availability of the simultaneous reading—also called Sequence of tense (SOT)—is subject to cross-linguistic variation. Non-SOT languages only allow for the backward-shifted interpretation. This cross-linguistic variation has been analysed in two main ways in the literature: a structural approach, connecting the availability of the simultaneous reading in a language to a syntactic mechanism that allows the embedded past not to be interpreted; and an implicature approach, which links the absence of such a reading to the presence of a “cessation” implicature associated with past tense. We report a series of experiments on Polish, which is commonly classified as a non-SOT language. First, we investigate the interpretation of complement clauses embedded under past-marked attitude verbs in Polish and English. This investigation revealed a difference between these two languages in the availability of simultaneous interpretations for past-under-past complement clauses, albeit not as large as a binary distinction between SOT and non-SOT languages would lead us to expect. We then address the question of whether the lower acceptability we observe for simultaneous readings in Polish might be due to an embedded cessation implicature. On the way to address this question, we show that in simple matrix clauses, Polish gives rise to the same cessation inference as English. Then we investigate Polish past-under-past sentences in positive and negative contexts, comparing their potential cessation implicature to the exclusive implicature of disjunction. In our results, we found that the latter was endorsed more often in positive than in negative contexts, as expected, while the cessation implicature was endorsed overall very little, with no difference across contexts. The disanalogy between the disjunction and the temporal cases, and the insensitivity of the latter to monotonicity, are a challenge for the implicature approach, and cast doubts on associating SOT phenomena with implicatures.
Developments within the field of Second Language Acquisition (SLA) have meant that scholars are increasingly engaging with corpora and corpus-based resources, providing a source of “‘authentic’ language” to learners and educators (Mitchell 2020: 254), and contributing to “state-of-the-art research methodologies” (Deshors and Gries 2023: 164). However, there are areas in which progress can still be made, particularly in the area of metadata, such as information about the speaker and contexts of the language use, as well as increased variety in the text types and genres of corpora used to develop SLA materials (Paquot 2022: 36). This post discusses one such possibility for increasing the variety of text types and providing a rich source of authentic language that can be used to create engaging SLA materials, particularly for young people learning German, namely the use of the NottDeuYTSch corpus (to download the corpus in a variety of formats, see Cotgrove 2018).
We present a collection of (currently) about 5.500 commands directed to voice-controlled virtual assistants (VAs) by sixteen initial users of a VA system in their homes. The collection comprises recordings captured by the VA itself and with a conditional voice recorder (CVR) selectively capturing recordings including the VA-directed commands plus some surrounding context. Next to a description of the collection, we present initial findings on the patterns of use of the VA systems during the first weeks after installation, including usage timing, the development of usage frequency, distributions of sentence structures across commands, and (the development of) command success rates. We discuss the advantages and disadvantages of the applied collection-specific recording approach and describe potential research questions that can be investigated in the future, based on the collection, as well as the merit of combining quantitative corpus linguistic approaches with qualitative in-depth analyses of single cases.
This article investigates mundane photo taking practices with personal mobile devices in the co-presence of others, as well as “divergent” self-initiated smartphone use, thereby exploring the impact of everyday technologies on social interaction. Utilizing multimodal conversation analysis, we examined sequences in which young adults take pictures of food and drinks in restaurants and cafés. Although everyday interactions are abundant in opportunities for accomplishing food photography as a side activity, our data show that taking pictures is also often prioritized over other activities. Through a detailed sequential analysis of video recordings and dynamic screen captures of mobile devices, we illustrate how photographers orient to the momentary opportunities for and relevance of photo taking, that is, how they systematically organize their photographing with respect to the ongoing social encounter and the (projected) changes in the material environment. We investigate how the participants multimodally negotiate the “mainness” and “sideness” (Mondada, 2014) of situated food photography and describe some particular features of participants’ conduct in moments of mundane multiactivity.
“Die Sprach-Checker” (Eng. “Language Checkers”) are young citizen scientists from Mannheim’s highly diverse district Neckarstadt-West. Together with linguists, they investigate a tremendous treasure: their own multilingualism. They are exploring and (re)discovering their own languages and the other languages used in their environment while documenting and reflecting on their everyday experiences in and with different linguistic practices. Our aim is to raise awareness of their strengths and to promote appreciation for their language biographies, thus fostering a sense of identification with one’s own linguistic surroundings. Such a joint research endeavour offers empirical opportunities to address (linguistic) issues of societal relevance by collecting authentic data from the multicultural district and involving its residents and local stakeholders. In this paper, we will provide insights regarding the project’s background, conception, and outcomes. We address everyone who is planning or conducting a citizen science project with young people, especially children and adolescents, or who works at the interface between science and society.
This paper analyses intensification in German digitally-mediated communication (DMC) using a corpus of YouTube comments written by young people (the NottDeuYTSch corpus). Research on intensification in written language has traditionally focused on two grammatical aspects: syntactic intensification, i.e. the use of particles and other lexical items and morphological intensification, i.e. the use of compounding. Using a wide variety og examples from the corpus, the paper identifies novel ways that have been used for intensification in DMC, and suggests a new taxonomy of classification for future analysis of intensification.
This White Paper sets out commonly agreed definitions on activities of consortia within NFDI. It aims to provide a common basis for reporting and reference regarding selected questions of cross-consortial relevance in DFG’s template for the Interim Reports. The questions were prioritised by an NFDI Task Force on Evaluation and Reporting (formerly Task Force Monitoring) as a result of discussing possible answers to the DFG template. In this process the need to agree on a generalizable meaning of terms commonly used in the context of NFDI, and reporting in particular, were identified from cross-consortial perspectives. Questions that showed the highest requirement on clarification are discussed in this White Paper. As NFDI evolves, the Task Force will likely propose further joint approaches for reporting in information infrastructures.
While each of broad relevance, the questions addressed relate to substantially different aspects of consortia’s work. They are thus also structured slightly different.
The Encyclopedia of Terminology for Conversation Analysis and Interactional Linguistics is an online resource for students and scholars of CA/IL, publicly available on the EMCA Wiki page. Encyclopedias and glossaries are widespread across various fields and methods, and serve as immensely valuable resources. Given the extent to which the EMCA/IL community has expanded over the years—both terminologically as well as geographically—we hope that this encyclopedia of terminology will be well received by students and practitioners of CA and IL across the globe.
Computational language models (LMs), most notably exemplified by the widespread success of OpenAI's ChatGPT chatbot, show impressive performance on a wide range of linguistic tasks, thus providing cognitive science and linguistics with a computational working model to empirically study different aspects of human language. Here, we use LMs to test the hypothesis that languages with more speakers tend to be easier to learn. In two experiments, we train several LMs—ranging from very simple n-gram models to state-of-the-art deep neural networks—on written cross-linguistic corpus data covering 1293 different languages and statistically estimate learning difficulty. Using a variety of quantitative methods and machine learning techniques to account for phylogenetic relatedness and geographical proximity of languages, we show that there is robust evidence for a relationship between learning difficulty and speaker population size. However, contrary to expectations derived from previous research, our results suggest that languages with more speakers tend to be harder to learn.
The Data Governance Act was proposed in late 2020 as part of the European Strategy for Data, and adopted on 30 May 2022 (as Regulation 2022/868). It will enter into application on 24 September 2023. The Data governance Act is a major development in the legal framework affecting CLARIN and the whole language community. With its new rules on the re-use of data held by the public sector bodies and on the provision of data sharing services, and especially its encouragement of data altruism, the Data Governance Act creates new opportunities and new challenges for CLARIN ERIC. This paper analyses the provisions of the Data Governance Act, and aims at initiating the debate on how they will impact CLARIN and the whole language community.
Conventional terminology resources reach their limits when it comes to automatic content classification of texts in the domain of expertlayperson communication. This can be attributed to the fact that (non-normalized) language usage does not necessarily reflect the terminological elements stored in such resources. We present several strategies to extend a terminological resource with term-related elements in order to optimize automatic content classification of expert-layperson texts.
Pivot
(2023)
The term pivot denotes an element of talk that can be understood to belong to two larger units of talk simultaneously, thereby joining them together and acting as a transitional link between them (Schegloff 1979: 275-276). Most commonly, the term is used to refer to lexico-syntactic elements that can be interpreted as ending one turn-constructional unit (TCU) while at the same time launching a next.
Collaborative work in NFDI
(2023)
The non-profit association National Research Data Infrastructure (NFDI) promotes science and research through a National Research Data Infrastructure. Its aim is to develop and establish an overarching research data management (RDM) for Germany and to increase the efficiency of the entire German science system. After a two-and-a-half year build up phase, the process of adding new consortia, each representing a different data domain, has ended in March 2023. NFDI now has 26 disciplinary consortia (and one additional basic service collaboration). Now the full extent of cross-consortial interaction is beginning to show.
Retro-sequence
(2023)
Modular pivot
(2023)
A modular pivot is a type of turn-constructional pivot. It is built from syntactically entirely optional items (i.e. linguistic adjuncts) that can occur in both turn-initial and turn-final position and can therefore be used to patch a wide range of otherwise discrete turn-constructional units (TCUs) together (Clayman & Raymond 2015). A prime example of an item that lends itself to be deployed as a modular pivot are address terms (Clayman 2012).
Assessment
(2023)
Most broadly, an assessment is a type of social action by which an interactant expresses an evaluative stance towards someone or something (e.g., an object, an event, an action, an experience, a state of affairs, a place, a circumstance, etc.). The target of an assessment is typically called the ‘assessable’.
Allusion
(2023)
Recent years have seen a growing interest in grammatical variation, a core explanandum of grammatical theory. The present volume explores questions that are fundamental to this line of research: First, the question of whether variation can always and completely be explained by intra- or extra-linguistic predictors, or whether there is a certain amount of unpredictable – or ‘free’ – grammatical variation. Second, the question of what implications the (in-)existence of free variation would hold for our theoretical models and the empirical study of grammar. The volume provides the first dedicated book-length treatment of this long-standing topic. Following an introductory chapter by the editors, it contains ten case studies on potentially free variation in morphology and syntax drawn from Germanic, Romance, Uralic and Mayan.
This paper examines multi-unit turns that allow speakers to retrospectively close the prior sequence while prospectively launching a new sequence, which Schegloff (1986) referred to as interlocking organization. Using English telephone conversations as data, we focus on how multi-unit turns are used for topic shifts, and show that interlocking organization operates in conjunction with other phonetic and lexical features, such as increased pitch and overt markers of disjunction (e.g., “listen”). In addition, speakers utilize an audible inbreath that is placed between the first and the second units as a central interactional resource to project further talk, thereby suppressing speaker transition and possibly highlighting the action delivered in the second unit as being distinctly new. We propose that interlocking multi-unit turns, when used to make topically disjunctive moves, promote progressivity by avoiding a possible lapse in turn transition
We introduce DeReKoGram, a novel frequency dataset containing lemma and part-of-speech (POS) information for 1-, 2-, and 3-grams from the German Reference Corpus. The dataset contains information based on a corpus of 43.2 billion tokens and is divided into 16 parts based on 16 corpus folds. We describe how the dataset was created and structured. By evaluating the distribution over the 16 folds, we show that it is possible to work with a subset of the folds in many use cases (e.g., to save computational resources). In a case study, we investigate the growth of vocabulary (as well as the number of hapax legomena) as an increasing number of folds are included in the analysis. We cross-combine this with the various cleaning stages of the dataset. We also give some guidance in the form of Python, R, and Stata markdown scripts on how to work with the resource.
"Reproducibility crisis" and "empirical turn" are only two keywords when it comes to providing reasons for research data management. Research data is omnipresent and with the more and more automatic data processing procedures, they become even more important. However, just because new methods require data and produce data, this does not mean that data are easily accessible, reusable or even make a difference in the CV of a researcher, even if a large portion of research goes into data creation, acquisition, preparation, and analysis. In this talk I will present where we find data in the research process, where we may find appropriate support for data management and advocate for a procedure for including it in research publications and resumes.
This presentation relies on work within the BMBF-funded project CLARIN-D. It also builds on work within the German National Research Data Infrastructure (NFDI) consortium Text+, DFG project number 460033370.
This contribution summarizes the lessons learned from the organization of a joint conference on text analytics research by the Business, Economic, and Related Data (BERD@NFDI) and Text+ consortia within the National Research Data Infrastructure (NFDI) in Germany. The collaboration aimed to identify common ground and foster interdisciplinary dialogue between scholars in the humanities and in the business domain. The lessons learned include the importance of presenting research questions using textual data to establish common ground, similarities in methodology for processing textual data between the consortia, similarities in research data management, and the need for regular interconsortial discussions on textual analysis methods and data. The collaboration proved valuable for interdisciplinary dialogue within the NFDI, and further collaboration between the consortia is planned.
Prediction is a central mechanism in the human language processing architecture. The psycholinguistic and neurolinguistic literature has seen a lively debate about what form prediction may take and what status it has for language processing in the human mind and brain. While predictions are a ubiquitous finding, the implications of these results for models of language processing differ. For instance, eyetracking data suggest that predictions may rely on sublexical orthographic information in natural reading, while electrophysiological data provide mixed evidence for form-based predictions during reading. Other research has revealed that humans rapidly adapt to text specifics and that their predictive capacity varies, broadly speaking, in accordance with inter- and intra-individual language proficiency, which cuts across the speaker groups (e.g. L1 vs. L2 speakers, skilled vs. untrained readers) traditionally used for experimental contrasts. There is therefore evidence that the kind and strength of linguistic predictions depend on (at least) three sources of variability in language processing: speaker, text genre and experimental method.
The aim of this Research Topic is to develop a better understanding of prediction in light of the three sources of variability in language processing, by providing an overview of state-of-the art research on predictive language processing and by bringing together research from various disciplines.
First, intra-and inter-individual differences and their influence on predictive processes remain underrepresented in experimental research on predictive processing. How do language users differ in their predictive abilities and strategies, and how are these differences shaped by e.g. biological, social and cultural factors?
Second, while language users experience great stylistic diversity in their daily language exposure and use, the majority of language processing research still focuses on a very constrained register of well-controlled sentences composed in the standard language. How are predictions shaped by extra- and meta-linguistic context, such as register/genre or accent/speaker identity, and how may this influence the processing of experimental items in another language or text variety?
Third, the Research Topic invites contributions that make use of a multi-method approach, such as combined behavioral and electrophysiological measures or experimental methods combined with measures extracted from corpus data. What opportunities and challenges do we face when integrating multiple approaches to examine linguistic, experimental and individual differences in human predictive capacity?
We welcome contributions from all areas of empirical psycho- and neurolinguistics, but contributions must explicitly address variability and variation in language and language processing. Relevant topics include individual differences and the impact of genre, modality, register and language variety. Contributions that go beyond single word and single sentence paradigms are especially desirable. Experimental, corpus-based, meta-analytic and review papers, as well as theoretical/opinion pieces are welcome; however, papers of the latter type should support their arguments with substantial empirical evidence from the literature. Particularly desirable are contributions which combine topics and/or methods, such as the impact of an individual's native dialect on processing of constructions that show variability in the standard language (e.g. choice of auxiliary, agreement of mass nouns, etc.) or experimental methods combined with measures extracted from corpus data such as information-theoretic surprisal.
One of the fundamental questions about human language is whether all languages are equally complex. Here, we approach this question from an information-theoretic perspective. We present a large scale quantitative cross-linguistic analysis of written language by training a language model on more than 6500 different documents as represented in 41 multilingual text collections consisting of ~ 3.5 billion words or ~ 9.0 billion characters and covering 2069 different languages that are spoken as a native language by more than 90% of the world population. We statistically infer the entropy of each language model as an index of what we call average prediction complexity. We compare complexity rankings across corpora and show that a language that tends to be more complex than another language in one corpus also tends to be more complex in another corpus. In addition, we show that speaker population size predicts entropy. We argue that both results constitute evidence against the equi-complexity hypothesis from an information-theoretic perspective.
Neologisms, i.e., new words or meanings, are finding their way into everyday language use all the time. In the process, already existing elements of a language are recombined or linguistic material from other languages is borrowed. But are borrowed neologisms accepted similarly well by the speech community as neologisms that were formed from “native” material? We investigate this question based on neologisms in German. Building on the corresponding results of a corpus study, we test the hypothesis of whether “native” neologisms are more readily accepted than those borrowed from English. To do so, we use a psycholinguistic experimental paradigm that allows us to estimate the degree of uncertainty of the participants based on the mouse trajectories of their responses. Unexpectedly, our results suggest that the neologisms borrowed from English are accepted more frequently, more quickly, and more easily than the “native” ones. These effects, however, are restricted to people born after 1980, the so-called millenials. We propose potential explanations for this mismatch between corpus results and experimental data and argue, among other things, for a reinterpretation of previous corpus studies.
In a recent paper published in the Journal of Language Evolution, Kauhanen, Einhaus & Walkden (KEW) challenge the results presented in one of my papers (Koplenig, Royal Society Open Science, 6, 181274 (2019)), in which I tried to show through a series of statistical analyses that large numbers of L2 (second language) speakers do not seem to affect the (grammatical or statistical) complexity of a language. To this end, I focus on the way in which the Ethnologue assesses language status: a language is characterised as vehicular if, in addition to being used by L1 (first language) speakers, it should also have a significant number of L2 users. KEW criticise both the use of vehicularity as a (binary) indicator of whether a language has a significant number of L2 users and the idea of imputing a zero proportion of L2 speakers to non-vehicular languages whenever a direct estimate of that proportion is unavailable. While I recognise the importance of post-publication commentary on published research, I show in this rejoinder that both points of criticism are explicitly mentioned and analysed in my paper. In addition, I also comment on other points raised by KEW and demonstrate that both alternative analyses offered by KEW do not stand up to closer scrutiny.
Open Science and language data: Expectations vs. reality. The role of research data infrastructures
(2023)
Language data are essential for any scientific endeavor. However, unlike numerical data, language data are often protected by copyright, as they easily meet the threshold of originality. The role of research infrastructures (such CLARIN, DARIAH, and Text+) is to bridge the gap between uses allowed by statutory exceptions and the requirements of Open Science. This is achieved on the one hand by sharing language data produced by research organisations with the widest possible circle of persons, and on the other by mutualizing efforts towards copyright clearance and appropriate licensing of datasets.
The proposed contribution will shed light on current and future challenges on legal and ethical questions in research data infrastructures. The authors of the proposal will present the work of NFDI’s section on Ethical, Legal and Social Aspects (hereinafter: ELSA), whose aim is to facilitate cross-disciplinary cooperation between the NFDI consortia in the relevant areas of management and re-use of research data.
This paper presents an extended annotation and analysis of interpretative reply relations focusing on a comparison of reply relation types and targets between conflictual pages and neutral pages of German Wikipedia (WP) talk pages. We briefly present the different categories identified for interpretative reply relations to analyze the relationship between WP postings as well as linguistic cues for each category. We investigate referencing strategies of WP authors in discussion page postings, illustrated by means of reply relation types and targets taking into account the degree of disagreement displayed on a WP talk page. We provide richly annotated data that can be used for further analyses such as the identification of interactional relations on higher levels, or for training tasks in machine learning algorithms.
This manual introduces a conversation analytically informed coding scheme for episodes involving the direct social sanctioning of problem behavior in informal social interaction which was developed in the project Norms, Rules, and Morality across Languages (NoRM-aL) at the Leibniz-Institute for the German Language. It outlines the background for its development, delimits the phenomena to which the coding scheme can be applied and provides instructions for its use.
The scheme asks for basic information about the recording and the participants involved in the episode, before taking stock of different features of the sanctioning episode as a whole. This is followed by sets of specific coding questions about the sanctioning move itself (such as its timing and composition) and the reaction it engenders. The coding enables researchers to get a bird’s eye view on recurrent features of such episodes in larger quantities of data and allows for comparisons across different languages and informal settings.
Using multimodal conversation analysis, we investigate how novices learning the “inner body” acting technique in the context of a community theater project share their experiences of the bodily exercises through verbal and embodied conduct. We focus on how verbal description and bodily enactment of the experience mutually elaborate each other, and how the experienced sensorimotor and affective qualities are made to be witnessed and recognized by the others. Participants describe their experiences without naming qualities. Instead, a display of the experienced qualities is made accessible to others through coordinating the unfolding talk and bodily conduct. In particular, we show how grammatical and action projection is fulfilled by interconnected verbal and embodied conduct, with body movement and posture giving off ineffable experiential qualities. The moving body appears both as a source of the experience and as a resource for depicting perceived qualities to others; additional resources (non-specific person reference and gaze aversion) contribute to organizing the subjective and intersubjective layers of the reflection of the experiences. The study contributes to and extends recent research on sensoriality in interaction by focusing on phenomena of proprioception and interoception. The data are two cases drawn from 60 h of video-recordings made in the context of a devised community theater project. The data are in Finnish with English translations.
Picnick and Sauerkraut: German–English intra-writer variation in script and language (1867–1900)
(2023)
Intra-writer variation is a wide-spread phenomenon that nevertheless has received only limited research attention so far. Different addressees, bi- and multilingualism, or changing life phases are among the factors that contribute to such variation. In a study of diary entries by one writer covering three decades (1867–1900), this chapter investigates patterns of intra-writer variation between German and English (language and script) in nineteenth-century Canada, with a special focus on single word borrowings, person reference and place names. The long-term perspective provides a unique insight into the dynamics of a bilingual writer’s emerging sociolinguistic competence as reflected by the flexible yet structured use of his resources within the social space of a bilingual community.
In workplace settings, skilled participants cooperate on the basis of shared routines in smooth and often implicit ways. Our study shows how interactional histories provide the basis for routine coordination. We draw on theater rehearsals as a perspicuous setting for tracking interactional histories. In theater rehearsals, the process of building performing routines is in focus. Our study builds on collections of consecutive performances of the same instructional task coming from a corpus of video-recordings of 30 h of theater rehearsals of professional actors in German. Over time, instructions and their implementations are routinely coordinated by virtue of accumulated shared interactional experience: Instructions become shorter, the timing of responses becomes increasingly compacted and long negotiations are reduced to a two-part sequence of instruction and implementation. Overall, a routine of how to perform the scene emerges. Over interactional histories, patterns of projection of next actions emanating from instructions become reliable and can be used by respondents as sources for anticipating and performing relevant next actions. The study contributes to our understanding of how shared knowledge and routines accumulate over shared interactional experiences in publicly performed and reciprocally perceived ways and how this impinges on the efficiency of joint action.
Theater rehearsals are (usually) confronted with the problem of having to transform a written text into an audio-visual, situated and temporal performance. Our contribution focuses on the emergence and stabilization of a gestural form as a solution for embodying a certain aesthetic concept which is derived from the script. This process involves instructions and negotiations, making the process of stabilization publicly and thus intersubjectively accessible. As scenes are repeatedly rehearsed, rehearsals are perspicuous settings for tracking interactional histories. Based on videotaped professional theatre interactions in Germany, we focus on consecutive instances of rehearsing the same scene and trace the interactional history of a particular gesture. This gesture is used by the director to instruct the actors to play a particular aspect of a scene adopting a certain aesthetic concept. Stabilization requires the emergence of shared knowledge. We will show the practices by which shared knowledge is established over time during the rehearsal process and, in turn, how the accumulation of knowledge contributes to a change in the interactional practices themselves. Specifically, we show how a gesture emerges in the process of developing and embodying an aesthetic concept, and how this gesture eventually becomes a sign that refers to and evokes accumulated knowledge. At the same time, we show how this accumulated knowledge changes the instructional activities in the rehearsal process. Our study contributes to the overall understanding of knowledge accumulation in interaction in general and in theater rehearsals in particular. At the same time, it is devoted to the central importance of gestures in theater, which are both a means and a product of theatrical staging.
National Socialism, one could argue, was all about belonging: belonging to the ‘Volk’ or the ‘Volksgemeinschaft’, belonging to the ‘Aryan’ or ‘Non-Aryan race’, belonging to the National Socialist ‘movement’, and so on. These categories of belonging worked both inclusionary and exclusionary and they were constituted, proclaimed and enacted to a great part through language. What is more, they had to be performed through communicative acts. For the normative side of National Socialist propaganda and legislation, this seems rather obvious and one-directional. On the side of the general population, however, this entailed a mixture of communicative need to position oneself vis-à-vis National Socialism (mostly in affirmative ways), but also the urge to do so willingly. When we look at the language use of ‘ordinary people’ in different communicative situations and texts during National Socialism, we have to focus on these dimensions of discursive collusion, co-constitution and appropriation. People during National Socialism, such is our hypothesis, navigated through discourses of belonging and by that made them real and effective. Besides diaries, war letters and autobiographical writings, one way to grasp this phenomenon is to analyse petitions, i.e., letters of complaint and request sent in large numbers by ‘ordinary people’ to public authorities of the party and the state. As I will show by some examples, letter-writers tried to inscribe themselves within (what they took for) National Socialist discourses of belonging in order to legitimate their claims. By doing so, they co-constituted and co-created the discursive realm of National Socialism.
This conference booklet provides information about 10th International Contrastive Linguistics Conference (ICLC-10) that took place in Mannheim, Germany, from 18 to 21 July 2023. It contains
– a description of the conference aims,
– details on the conference venue,
– information on committees,
– the conference program,
– the abstracts of the keynotes, oral and poster presentations, and
– an author index.
Despite being an official language of several countries in Central and Western Europe, German is not formally recognised as the official language of the Federal Republic of Germany. However, in certain situations the use of the German language, including the spelling rules, is subject to state regulation (by acts of Federal Parliament orby administrative decisions). This article presents the content of this regulation, its scope, and the historical context in which it was adopted.
Our current era of globalization is characterized above all by increased mobility, namely by the increasing mobility of people and the development of new communication technologies, including the mobility of linguistic signs and resources. This process raises new theoretical and methodological questions in linguistics, which results in the development of a new sociolinguistics of globalization (Blommaert 2010) in recent years. One of the most obvious ways to trace this new and dynamic development is to analyze individual language repertoires, especially those of migrants. In this essay, I examine aspects of the communicative repertoire of a refugee who fled to Germany in 2015 to escape the civil war in Syria. I draw on two interviews I conducted with him (in the following I refer to him by the pseudonym „Baran“). The first interview with Baran was recorded in 2016, a few months after his arrival in Germany. The second interview is from 2023, seven years later. In both recordings, German was the dominant language of interaction. I will analyze and show the characteristics of his German at the beginning of his immigration, how he resorts to practices of language mixing between German, Turkish and English (which has recently also been referred to as translanguaging) and how his German has developed over the course of the past seven years.
In many European languages, propositional arguments (PAs) can be realized as different types of structures. Cross-linguistically, complex structures with PAs show a systematic correlation between the strength of the semantic bond and the syntactic union (cf. Givón 2001; Wurmbrand/Lohninger 2023). Also, different languages show similarities with respect to the (lexical) licensing of different PAs (cf. Noonan 1985; Givón 2001; Cristofaro 2003 on different predicate types). However, on a more fine-grained level, a variation across languages can be observed both with respect to the syntactic-semantic properties of PAs as well as to their licensing and usage. This presentation takes a multi-contrastive view of different types of PAs as syntactic subjects and objects by looking at five European languages: EN, DE, IT, PL and HU. Our goal is to identify the parameters of variation in the clausal domain with PAs and by this to contribute to a better understanding of the individual language systems on the one hand and the nature of the linguistic variation in the clausal domain on the other hand. Phenomena and Methodology: We investigate the following types of PAs: direct object (DO) clauses (1), prepositional object (PO) clauses (2), subject clauses (3), and nominalizations (4, 5). Additionally, we discuss clause union phenomena (6, 7). The analyzed parameters include among others finiteness, linear position of the PA, (non) presence of a correlative element, (non) presence of a complementizer, lexical-semantic class of the embedding verb. The phenomena are analyzed based on corpus data (using mono- and multilingual corpora), experimental data (acceptability judgement surveys) or introspective data.
This paper introduces the Nottinghamer Korpus deutscher YouTube-Sprache (‘The Nottingham German YouTube Language Corpus’ - or NottDeuYTSch corpus). The corpus comprises over 33 million words, taken from roughly 3 million YouTube comments published between 2008 and 2018, written by a young, German-speaking demographic. The NottDeuYTSch corpus provides an authentic and representative linguistic snapshot of young German speakers and offers significant opportunities for in-depth research in several linguistic fields, such as lexis, morphology, syntax, orthography, multilingualism, and conversational and discursive analysis.
The NottDeuYTSch corpus is a freely available collection of YouTube comments written under German-speaking videos by young people between 2008 and 2018. The article uses the NottDeuYTSch corpus to investigate how YouTube comments can be used to produce learning materials and how corpora of Digitally-Mediated Communication can benefit intermediate learners of German. The article details the effects of authentic communication within YouTube comments on teenage learners, examining how they can influence the psycholinguistic factors of motivation, foreign language anxiety, and willingness to communicate. The article also discusses the benefits and limitations of using authentic corpus material for the development of teaching material.
This study investigates other-initiated repair and its embodied dimension in casual English as lingua franca (ELF) conversations, thereby contributing to the further understanding of multimodal repair practices in social interaction. Using multimodal conversation analysis, we focus on two types of restricted other-initiation of repair (OIR): partial repeats preceded or followed by the question word what (i.e., what X?/X what?) and copular interrogative clauses (i.e., what is X). Partial repeats with what produced with rising final intonation are consistently accompanied by a head poke and treated as relating to troubles in hearing, with the repair usually consisting of a repeat. In contrast to these partial repeats, copular interrogative clauses are produced with downward final intonation and accompanied by face-related embodied conduct. The what is X OIRs primarily target code-switched lexical items, the understanding of which is critical for maintaining the repair initiator’s involvement in the ongoing sequence. This study also contributes some general reflections on the possible complexity of OIR and repair practices from a multimodal perspective.
Pseudo-coordinated sitzen and stehen in spoken German: a case of emergent progressive aspect?
(2023)
This paper investigates the aspectual potential of posture verb pseudocoordination in spoken German. In a corpus study of sitzen ‘sit’ and stehen ‘stand’, it is shown that despite a preference for activity verbs, verbs of all aspectual classes occur in the second conjunct. The posture verb imposes its durative meaning component on the second verb, thus making a progressive interpretation of the construction possible. Apart from this emergent aspectual function, German posture verb pseudocoordination has a subjective function (conveying the speaker’s beliefs about the subject referent’s stance), and a discourse pragmatic function (information packaging).
From June 26th to July 2nd 2023 the International Conference on Conversation Analysis (ICCA) took place in Brisbane/Meanjin, Australia – after a long pause due to the Covid-pandemic and for the first time in the southern hemisphere. About 350 participants from about 50 different countries attended the conference. This year’s ICCA came up with 36 panels and about 300 papers that were presented. Four plenary speakers have been invited and 24 pre-conference workshops took place. On Wednesday evening Ilana Mushin, in her role as conference chair, officially opened ICCA. The President of the International Society of Conversation Analysis (ISCA), Tanya Stivers, also welcomed all participants. To get acquainted with the indigenous culture of Queensland, the opening ceremony was enriched with a highly impressive dance performance by First Nations people. After the official inauguration the international community met at the Welcome Reception to look forward together to the days ahead with many opportunities for exchange and networking.
As it will become clear throughout this report, the research topics revolved around not only classic CA concepts, but also importantly concerned embodiment, which continued the line of past conferences (Dix 2019). Another aspect that has been highlighted was conflict and social norms. Due to personal capacities, we can only present a selection of presentations within the scope of this conference report. The selection was influenced by the personal interest of the authors and should not be understood as rating in any sense.
‘Can’ and ‘must’-type modal verbs in the direct sanctioning of misconduct across European languages
(2023)
Deontic meanings of obligation and permissibility have mostly been studied in relation to modal verbs, even though researchers are aware that such meanings can be conveyed in other ways (consider, for example, the contributions to Nuyts/van der Auwera (eds.) 2016). This presentation reports on an ongoing project that examines deontic meaning but takes as its starting point not a type of linguistic structure but a particular kind of social moment that presumably attracts deontic talk: The management of potentially ‚unacceptable‘ or untoward actions (taking the last bread roll at breakfast, making a disallowed move during a board game, etc.). Data come from a multi-language parallel video corpus of everyday social interaction in English, German, Italian, and Polish. Here, we focus on moments in which one person sanctions another’s behavior as unacceptable. Using interactional-linguistic methods (Couper-Kuhlen/Selting 2018), we examine similarities and differences across these four languages in the use of modal verbs as part of such sanctioning attempts. First results suggest that modal verbs are not as common in the sanctioning of misconduct as one might expect. Across the four languages, only between 10%–20% of relevant sequences involve a modal verb. Most of the time, in this context, speakers achieve deontic meaning in other ways (e.g., infinitives such as German nicht so schmatzen, ‚no smacking‘). This raises the question what exactly modal verbs, on those relatively rare occasions when they are used, contribute to the accomplishment of deontic meaning. The reported study pursues this question in two ways: 1) By considering similarities across languages in the ways that modal verbs interact with other (verbal) means in the sanctioning of misconduct.; 2) By considering differences across languages in the use of modal verbs. Here, we find that the relevant modal verbs are used similarly in some activity contexts (enforcing rules during board games), but less so in other activity contexts (mundane situations with no codified rules). In sum, the presented study adds to cross-linguistically grounded knowledge about deontic meaning and its relationships to linguistics structures.
Besides English, Afrikaans is considered “the [Germanic] language which deviates grammatically the farthest from the others” (Harbert 2007: 17). But how exactly do we measure “grammatical deviation”, and how deviant is Afrikaans really if we compare it not just to other standard languages but also to non-standard varieties? The present contribution aims to address those questions combining functional-typological and dialectometric perspectives. We first select data for 28 Germanic varieties showing vastly different speaker numbers, grades of standardisation and amounts of language contact. Based on 48 (micro)typological variables from syntax, morphology and phonology, we perform cluster analysis and multidimensional scaling and present ways of visualizing and interpreting the results. Inter alia, the analyses show a major divide between Continental West Germanic and North Germanic (as might be expected) and they also identify a number of outliers, including English and pidgin and creole languages such as Russenorsk or Rabaul Creole German. Afrikaans appears to cluster with the other West Germanic languages rather than the outliers. Within West Germanic, however, it does indeed emerge as rather deviant and, according to our metric, it is, for example, typologically closer to other high-contact varieties such as Yiddish than it is to Dutch.
This presentation deals with collaborative turn-sequences (Lerner 2004), a syntactically coherent unit of talk that is jointly formulated by at least two speakers, in Czech and German everyday conversations. Based on conversation analysis (e.g., Schegloff 2007) and a multimodal approach to social interaction (e.g., Deppermann/Streeck 2018), we aim at comparing recurrent patterns and action types within co-constructional sequences in both languages. The practice of co-constructing turns-at-talk has been described for typologically different languages, especially for English (e.g., Lerner 1996, 2004), but also for languages such as Japanese (Hayashi 2003) or Finnish (Helasvuo 2004). For German, various forms and functions of co-constructions have already been investigated (e.g., Brenning 2015); for Czech, a detailed, interactionally based description is still pending (but see some initial observations in, e.g., Hoffmannová/Homoláč/Mrázková (eds.) 2019). Although the existence of co-constructions in different languages points to a cross-linguistic conversational practice, few explicitly comparative studies exist (see, e.g., Lerner/Takagi 1999, for English and Japanese). The language pair Czech-German has mainly been studied with respect to language contact and without specifically considering spoken language or complex conversational sequences (e.g., Nekula/Šichová/Valdrová 2013). Therefore, our second aim is to sketch out a first comparison of co-constructional sequences in German and Czech, thereby contributing to the growing field of comparative and cross-linguistic studies within conversation analysis (e.g., Betz et al. (eds.) 2021; Dingemanse/Enfield 2015; Sidnell (ed.) 2009). More specifically, we will present three main sequential patterns of co-constructional sequences, focusing on the type of action a second speaker carries out by completing a first speaker’s possibly incomplete turn-at-talk, and on how the initial speaker then responds to
this suggested completion (Lerner 2004). Excerpts from video recordings of Czech and German ordinary conversations will illustrate these recurrent co-constructional sequence types, i.e., offering help during word searches (see example 1 above), displaying understanding, or claiming independent knowledge. The third objective of this paper is to underline the participants’ orientation to similar interactional problems, solved by specific syntactic and/or lexical formats in Czech and German. Considering the more recent focus on the embodied dimension of co-constructional practices (e.g., Dressel 2020), we will also investigate the multimodal formatting of a started utterance as more or less “permeable” (Lerner 1996) for co-participant completion, the participants’ mutual embodied orientation, and possible embodied responses to others’ turn-completions (such as head nods or eyebrow flashes, cf. De Stefani 2021). More generally, this contribution reflects on the possibilities and challenges of a cross-linguistic comparison of complex multimodal sequences.
Poetic diction routinely involves two complementary classes of features: (i) parallelisms, i.e. repetitive patterns (rhyme, metre, alliteration, etc.) that enhance the predictability of upcoming words, and (ii) poetic deviations that challenge standard expectations/predictions regarding regular word form and order. The present study investigated how these two prediction-modulating fundamentals of poetic diction affect the cognitive processing and aesthetic evaluation of poems, humoristic couplets and proverbs. We developed quantitative measures of these two groups of text features. Across the three text genres, higher deviation scores reduced both comprehensibility and aesthetic liking whereas higher parallelism scores enhanced these. The positive effects of parallelism are significantly stronger than the concurrent negative effects of the features of deviation. These results are in accord with the hypothesis that art reception involves an interplay of prediction errors and prediction error minimization, with the latter paving the way for processing fluency and aesthetic liking.
Ways out of the dictionary: hyperlinks to other sources in German and African online dictionaries
(2023)
This study examines a number of German and African online dictionaries to see how they make use of the possibility of linking to external sources (e.g. other dictionaries, encyclopaedias, or even corpus data). The article investigates which hyperlinks occur at which places in the word articles and how these are presented to the dictionary users. This is done against the background of metalexicographic considerations on the planning of outer features and the mediostructure in online dictionaries as well as different categorizations of hyperlinks in online reference works. The results show that retro-digitized dictionaries make virtually no use of hyperlinks to external sources. Genuine online dictionaries, on the other hand, do, but often in a form that needs improvement, since, for example, explanations of dictionary-external links are not always found in the user guide and their design is different even within a dictionary.
Introducing Interactive Grammar: How to Develop Language Competence with Research-based Learning
(2023)
We present the implementation of an interactive e-learning platform for both classroom study and self-study, that helps developing German language competence – vocabulary, spelling, and grammar – on various levels and for everyday life applications. The LernGrammis portal addresses school and highschool students, (prospective) teachers, and L2 learners of German equally, each with appropriate educational content and interactive components. It thus offers the digital networking infrastructure for education a unique, freely available and scientifically based learning resource. Applying the innovative concept of „Research-based Learning (RBL)“, LernGrammis provides teachers with ideas for lesson planning, and learners with dedicated modules to develop new skills through exploring authentic language resources and by this means answering customised low-threshold research questions. Using proven practical examples, we demonstrate the approach, its strengths and possibilities, as well as initial user feedback evaluation results.
In this presentation I show first results from an ongoing study about syntactic complexity of sanctioning turns in spoken language. This study is part of a larger project on sanctioning of misconduct in social interaction in different European languages (English, German, Italian and Polish). For the study I use video recordings of different everyday settings (family breakfasts, board game interactions and car rides) with three or four participants. These data come from the Parallel European Corpus of Informal Interaction (Kornfeld/Küttner/Zinken 2023; Küttner et al. submitted). I focus on sanctioning turns with more than one turn-constructional unit (see among others for TCUs: Sacks/Schegloff/Jefferson 1974; Clayman 2013). The study asks how often TCUs are linked to each other in the different languages, for what function, and how language diversity enters into this. Note that complex sanctioning turns do not always come as complex sentences.
The International Comparable Corpus (ICC) (Kirk/Čermáková 2017; Čermáková et al. 2021) is an open initiative which aims to improve the empirical basis for contrastive linguistics by compiling comparable corpora for many languages and making them as freely available as possible as well as providing tools with which they can easily be queried and analysed. In this contribution we present the first release of written language parts of the ICC which includes corpora for Chinese, Czech, English, German, Irish (partly), and Norwegian. Each of the released corpora contains 400k words distributed over 14 different text categories according to the ICC specifications. Our poster covers the design basics of the ICC, its TEI encoding, a demonstration of using the ICC via different query tools, and an outlook on future plans.
Similar to the European Reference Corpus EuReCo (Kupietz et al. 2020), ICC follows the approach of reusing existing linguistic resources wherever possible in order to cover as many languages as possible with realistic effort in as short a time as possible. In contrast to EuReCo, however, comparable corpus pairs are not defined dynamically in the usage phase, but the compositions of the corpora are fixed in the ICC design. The approaches are thus complementary in this respect. The design principles and composition of the ICC are based on those of the International Corpus of English (ICE) (Greenbaum (ed.) 1996), with the deviation that the ICC includes the additional text category blog post and excludes spoken legal texts (see Čermáková et al. 2021 for details). ICC’s fixed-design approach has the advantage that all single-language corpora in the ICC have the same composition with respect to the selected text types and that this guarantees that the selected broad spectrum of potential influencing variables for linguistic variation is always represented. The disadvantage, however, is that this can only be achieved for quite small corpora and that the generalisability of comparative findings based on the ICC corpora will often need to be checked on larger monolingual corpora or translation corpora (Čermáková/Ebeling/Oksefjell Ebeling forthcoming). Arguing that such issues with comparability and representativeness are inevitable, in one way or the other, and need to be dealt with, our poster will discuss and exemplify the text selections in more detail.
The issue: We discuss (declarative) prepositional object clauses (PO-clauses) in the West Germanic languages Dutch (NL), German (DE), and English (EN). In Dutch and German, PO-clauses occur with a prepositional proform (=PPF, Dutch: ervan, erover, etc.; German: drauf/darauf, drüber/darüber, etc.). This proform is optional with some verbs (1). In English, by contrast, P embeds a clausal complement in the case of gerunds or indirect questions (2), however, P is obligatorily absent when the embedded CP is a that-clause in its base positionv(3a). However, when the that-clause is passivized or topicalized, the stranded P is obligatory (3b). Given this scenario, we will address the following questions: i) Are there structural differences between PO-clauses with a P/PPF and those in which the P/PPF is optionally or obligatorily omitted? ii) In particular, do PO-clauses without P/PPF structurally coincide with direct object (=DO) clauses? iii) To what extent are case and nominal properties of clauses relevant? We use wh-extraction as a relevant test for such differences.
Previous research: Based on pronominalization and topicalization data in German and Dutch, PO-clauses are different from DO-clauses independent of the presence of the PPF (see, e.g., Breindl 1989; Zifonun/Hoffmann/Strecker 1997; Berman 2003; Broekhuis/Corver 2015 and references therein) (4,5). English pronominalization and topicalization data (3b) appear to point in the same direction (Fischer 1997; Berman 2003; Delicado Cantero 2013). However, the obligatory absence of P before that-clauses in base position indicates a convergence with DO-clauses.
Experimental evidence: To provide further evidence to these questions we tested PO-clauses in all three languages for long wh-extraction, which is usually possible for DO-clauses in English and Dutch, and in German for southern regional varieties. For German and Dutch we conducted rating studies using the thermometer method (Featherston 2008). Each study contained two sets of sentences: the first set tested long wh-extraction with regular DO-clauses (6). The second set tested wh-extraction from PO-clauses with and without PPFs (7), respectively. The results show no significant difference in extraction with PO-clauses whether or not the PPF was present even for those speakers who otherwise accept long-distance extraction in German. This supports a uniform analysis of PO-clauses with and without the PPF in contrast to DO-clauses. For English we tested extraction with verbs that select for PP-objects in two configurations: V+that-clause and V+P-gerund (8) in comparison to sentences without extraction. Participants rated sentences on a scale of 1 (unnatural) to 7 (natural). We included the gerund for English as this is a regular alternative for such objects. The results show that extraction is licit in both configurations. This suggests that English PO-clauses are different from German and Dutch PO-clauses: They rather behave as DO-clauses allowing for extraction. Note though, that the availability of extraction from P+gerund also shows that PPs are not islands for extraction in English. Overall, this shows that there is a split between English vs. German/Dutch PO-clauses when the P/PPF is absent. While these clauses behave like PO-clauses in the latter languages, extraction does not show a difference between DO- and PO-clauses in English. We will discuss the results in relation to the questions i)–iii) above.
This conversation analytic study compares the use of negation particles in spoken German and Persian, namely nein/nee and na. While these particles have a range of functions in both languages (Ghaderi 2022; Imo 2017), their use in response to news remains understudied. We focus on nein/nee and na in two sequential contexts: (i) after prior disconfirmations (Extract (a)) and (ii) in response to either solicited or unsolicited informings (see Extracts (b) and (c), respectively). In both contexts, nein/nee and na mark unexpectedness and open up an opportunity space for more, but they do so in different ways and with different outcomes. Nein/nee- and na-turns after disconfirming, often minimal responses to first-position confirmable turns mark the prior as unexpected (or even contrasting with the nein/nee/na-speaker’s expectations) and thus as expandable/accountable (cf. Ford 2001; Gubina/Betz 2021). Nein/nee/na-turns after informings (e.g., announcements that display a story teller’s negative emotional stance) differ not only in sequential position but also in prosodic realization. They can be either falling or rising, but all are characterized by marked prosody, i.e., lengthening, very low onset, smiling or breathy voice, or high overall pitch. Through position and turn design features, such nein/nee- and na-turns not only mark a prior turn as counter to (normative) expectations, but may also display the speaker’s affective stance and affiliate with the affective stance of the prior interactant. By comparing the use of nein/nee and na in German and Persian in the two functions illustrated in Extracts (a) and (b/c), we will show (i) how nein/nee- and na-turns shape interactional trajectories after responsive actions and (ii) what role the particles play in managing news and stance-taking as well as epistemic and affective positioning. Apart from revealing similarities in the use of German and Persian negation particles, the results of our crosslinguistic comparison will demonstrate that even if different languages have similar practices for specific actions, the use of these practices is language- and culture-specific. This means that even similar practices in different languages have their own “collateral effects” (Sidnell/Enfield 2012), linguistic and prosodic characteristic features, and, at least sometimes, consequences for social actions accomplished in the specific language (e.g., Dingemanse/Blythe/Dirksmeyer 2014; Evans/Levinson 2009; Floyd/Rossi/Enfield (eds.) 2020; Fox et al. 2009). Our study uses the method of Conversation Analysis (Sidnell/Stivers (eds.) 2013) and draws on more than 80 hours of audio and video recordings of spontaneous interactions (co-present, via video link, and on the telephone) in everyday and institutional contexts.
Any bilingual dictionary is contrastive by nature, as it documents linguistic information between language pairs. However, the design and compilation of most bilingual dictionaries is often no more than mere lists of lexical or semantic equivalents. In internet forums, one can observe a huge interest in acquiring relevant knowledge about specific lexical items or pairs that are prone to comparison in a more comprehensive manner as they may pose lexical semantic challenges. In particular, these often concern easily confused pairs (e.g. false friends or paronyms) and new terms increasingly travelling between languages in news and social media (Šetka-Čilić/Ilić Plauc 2021). With regard to English and German, the fundamental comparative principles upon which contrastive guides should be build are either absent, or specialised contrastive dictionaries simply do not exist, e.g. comprehensive descriptive resources for false friends, paronyms, protologisms or neologisms (see Gouws/Prinsloo/de Schryver 2004). As a result, users turn to electronic resources such as Google translate, blogs and language forums for help. For example, it is English words such as muscular which have two German translations options.
These are two confusables muskulär and muskulös both of which exhibit a different semantic profile. German sensitiv/sensibel and their English formal counterparts sensitive/sensible are false friends. However, these terms are highly polysemous in both languages and have semantic features in common. Their full meaning spectrum is hardly captured in bilingual dictionaries to allow for a full comparison. Translating protologisms such as German Doppelwumms as well as more established new words is one of the most challenging problems. Currently, German neologisms such as Klimakleber are translated as climate glue (instead of climate activist glueing him-/herself onto objects) by online tools, simply causing mistakes and contextual distortion. Most challenges users face today are well-known (e.g. Rets 2016). New terms are often unregistered in dictionaries and it is often impossible to make appropriate choices between two or more (commonly misused) words between two languages (e.g. Benzehra 2007). These are all relevant problems to translators and language learners alike (e.g González Ribao 2019).
This paper calls for the implication of insights from contrastive lexicology into modern bilingual lexicography. To turn dictionaries into valuable resources and in order to create productive strategies in a learning environment, the practice of writing dictionaries requires a critical re-assessment. Furthermore, the full potential of electronic contrastive resources needs to be recognised and put into practice. After all, monolingual German lexicography has started to reflect on how users’ needs can be accounted for in specific comparative linguistic situations. Some of these ideas can be comfortably extended to bilingual reference guides. On the one hand, this paper will deliver a critical account of some English-German/German-English dictionaries and touch on the shortcomings of contemporary bilingual lexicography. On the other hand, with the help of fictitious resources I will demonstrate contrastive structures as focal points of consultations which answer some of the more frequent language questions more reliably. Among others, I will explain how we need to build user-friendly dictionaries to allow for translating false friends or easily confusable words from the source language into its target language efficiently. With regard to neologisms, I will show how discursive descriptions and definitions that are more elaborate can support language learners to learn about necessary extra-linguistic knowledge. Overall, this could improve the role of specialised dictionaries in the teaching or translating process (cf. Miliç/Sadri/Glušac 2019).
In many countries of the world, perspectives on gender equality and racism have changed in recent decades. One result has been more attention being devoted to traces of androcentric and racist language in society. This also affects dictionaries. In lexicography there are discussions about whether or to what extent social asymmetries are inscribed in dictionaries and if this is still acceptable. The issue of the nature of description plays an important role in this discussion. If sexist usages are often found in language use, i.e. in the corpus data on which the dictionary is based, does the dictionary also have to show them? How is this, in turn, compatible with the normative power of dictionaries? Do dictionaries contribute to the perpetuation of gender stereotypes by showcasing them under the banner of descriptive principles? And what roles do lexicographers play in this process? The article deals with these questions on the basis of individual lexicographical examples and current discussions in the lexicographic and public community.
In G, E, I, and H there are constructions with accusative NPs being the external argument of an infinitival, (1) to (4). In P these accusative NPs can only co-occur with an adjectival participle, (5), a construction also occurring in E, (6). The talk compares the syntactic and semantic structure of these constructions focussing on the syntactic category of the nonfinite clause, the status of the accusative NP, the status of the infinitive, restructuring effects, and embedding predicates (including aspect).
i. As to G, E, I, and H, the infinitival clause is regarded as a TP, i.e., a small clause. Its accusative NP and infinitival predicate form a unit – [4], [12], [8]. The AcI denotes, according to [4], an eventuality, which prevents it from being negated. Its subject is case marked by the matrix predicate, either by ECM or subject-to-object raising – [9] and [10]. AcI-constructions can show clause union effects, (7). H additionally allows Dative subjects in infinitive clauses, the latter only being licensed by impersonal predicates and co-occurring with an agreeing infinitive, (8a), – [3]. In case there is no agreeing infinitive, the Dative NP is the experiencer of the matrix clause, (8b). As for Italian, it allows Nominative subject NPs in the infinitive clause, (9a, b).
ii. As to P, small clause constructions differ structurally from E, G, I and H ones – [6], [7]. P small clauses are realizable by copula constructions with verbal być ‘be’ pronominal to ‘it’, (10), or “dual” copula elements, (cooccurrence of a pronominal and a verbal element, [1]), varying with respect to selectional restrictions (part of speech or case within complement phrases, extraction possibilities, [1]). The P counterpart to the AcI-constructions is the secondary predication over an accusative object via an adjectival present participle, (5), (11) and (12). The adjectival participle construction is systematically paraphrasable via clauses introduced by jak ‘how’ (11’) and (12’). In Polish, adjectival phrases like recytującego wiersz ‘reciting’, (11), and wracającego z podróży ‘returning’, (12), clearly function as adjuncts of the accusative object go ‘him’. In our talk, we will compare this P view to languages with typical AcI-constructions, where the AcI-clause is standardly analyzed as a complement of a matrix verb.
A central goal of linguistics is to understand the diverse ways in which human language can be organized (Gibson et al. 2019; Lupyan/Dale 2016). In our contribution, we present results of a large scale cross-linguistic analysis of the statistical structure of written language (Koplenig/Wolfer/Meyer 2023) we approach this question from an information-theoretic perspective. To this end, we conduct a large scale quantitative cross-linguistic analysis of written language by training a language model on more than 6,500 different documents as represented in 41 multilingual text collections, so-called corpora, consisting of ~3.5 billion words or ~9.0 billion characters and covering 2,069 different languages that are spoken as a native language by more than 90% of the world population. We statistically infer the entropy of each language model as an index of un. To this end, we have trained a language model on more than 6,500 different documents as represented in 41 parallel/multilingual corpora consisting of ~3.5 billion words or ~9.0 billion characters and covering 2,069 different languages that are spoken as a native language by more than 90% of the world population or ~46% of all languages that have a standardized written representation. Figure 1 shows that our database covers a large variety of different text types, e.g. religious texts, legalese texts, subtitles for various movies and talks, newspaper texts, web crawls, Wikipedia articles, or translated example sentences from a free collaborative online database. Furthermore, we use word frequency information from the Crúbadán project that aims at creating text corpora for a large number of (especially under-resourced) languages (Scannell 2007). We statistically infer the entropy rate of each language model as an information-theoretic index of (un)predictability/complexity (Schürmann/Grassberger 1996; Takahira/Tanaka-Ishii/Dębowski 2016). Equipped with this database and information-theoretic estimation framework, we first evaluate the so-called ‘equi-complexity hypothesis’, the idea that all languages are equally complex (Sampson 2009). We compare complexity rankings across corpora and show that a language that tends to be more complex than another language in one corpus also tends to be more complex in another corpus. This constitutes evidence against the equi-complexity hypothesis from an information-theoretic perspective. We then present, discuss and evaluate evidence for a complexity-efficiency trade-off that unexpectedly emerged when we analysed our database: high-entropy languages tend to need fewer symbols to encode messages and vice versa. Given that, from an information theoretic point of view, the message length quantifies efficiency – the shorter the encoded message the higher the efficiency (Gibson et al. 2019) – this indicates that human languages trade off efficiency against complexity. More explicitly, a higher average amount of choice/uncertainty per produced/received symbol is compensated by a shorter average message length. Finally, we present results that could point toward the idea that the absolute amount of information in parallel texts is invariant across different languages.
Interactants who encounter co-participant conduct which they find to be socio-normatively problematic or troublesome are faced with a range of choices. First and foremost, this includes the issue of whether to directly address it, or to simply ‘let it pass’ (at least for now) (Emerson/Messinger 1977). In the case of the former, the issue then becomes how to address it. Across the various ways in which participants can pragmatically engage with what they perceive to be transgressive or untoward behavior (e.g., Pomerantz 1978; Schegloff 1988b; Dersley/Wootton 2000; Günthner 2000; Bolden/Robinson 2011; Potter/Hepburn 2020; see also Rodriguez 2022), they sometimes meta-pragmatically formulate the co-participant’s doings in terms of specific actions. Such action descriptions are necessarily selective (Sacks 1963; Schegloff 1972, 1988a; Sidnell/Barnes 2013): They foreground certain aspects of the co-participant’s conduct, while backgrounding others, and thus contribute to publically construeing the formulated conduct in particular ways (Jayyusi 1993), viz. as socio-normatively problematic, transgressive or untoward, and interactionally accountable (Robinson 2016; Sidnell 2017).
It is well known that the distribution of lexical and grammatical patterns is size- and register-sensitive (Biber 1986, and later publications). This fact alone presents a challenge to many corpus-oriented linguistic studies focusing on a single language. When it comes to cross-linguistic studies using corpora, the challenge becomes even greater due to the lack of high-quality multilingual corpora (Kupietz et al. 2020; Kupietz/Trawiński 2022), which are comparable with respect to the size and the register. That was the motivation for the creation of the European Reference Corpus EuReCo, an initiative started in 2013 at the Leibniz Institute for the German Language (IDS) together with several European partners (Kupietz et al. 2020). EuReCo is an emerging federated corpus, with large virtual comparable corpora across various languages and with an infrastructure supporting contrastive research. The core of the infrastructure is KorAP (Diewald et al. 2016), a scalable open-source platform supporting the analysis and visualisation of properties of texts annotated by multiple and potentially conflicting information layers, and supporting several corpus query languages. Until recently, EuReCo consisted of three monolingual subparts: the German Reference Corpus DeReKo (Kupietz et al. 2018), the Reference Corpus of Contemporary Romanian Language (Barbu Mititelu/Tufiş/Irimia 2018), and the Hungarian National Corpus (Váradi 2002). The goal of the present submission is twofold. On the one hand, it reports about the new component of EuReCo: a sample of the National Corpus of Polish (Przepiórkowski et al. 2010). On the other hand, it presents the results of a new pilot study using the newly extended EuReCo. This pilot study investigates selected Polish collocations involving light verbs and their prepositional / nominal complements (Fig. 1) and extends the collocation analyses of German, Romanian and Hungarian (Fig. 2) discussed in Kupietz/Trawiński (2022).
It is a ubiquitous phenomenon of everyday interaction that participants confront their co-participants for behaviour that they assess as undesirable or in some other way untoward. In a set of video data of informal interaction from the PECII corpus (Parallel European Corpus of Informal Interaction), cases of such sanctions have been collected in English, German, Italian and Polish data. This study presents work in progress and focuses on interrogatively formatted sanctions, in particular on non-polar interrogatives. It has already been shown that interrogatives can do much more than ask questions (Huddleston 1994). They can also function as directives (Lindström et al. 2017) or, more specifically, as requests (Curl/Drew 2008), as invitations (Margutti/Galatolo 2018) or reproaches (Klattenberg 2021), among others. What makes them interesting for cross-linguistic comparison is that the four languages that are considered provide different morphological and (morpho-)syntactical ressources for the realization of interrogative phrases. For example, German provides the option of building in the modal particle denn that reveals a previous lack of clarity and obliges the co-participant(s) to deliver the missing information (Deppermann 2009). Of course, the other three languages have modal particles, too (e.g. allora in Italian or though in English), but they do not seem to convey the same semantic and interactional qualities as denn. From an interactional point of view, one could think that interrogatives are a typical and effective way of solliciting accounts, since formally they open up a conditionally relevant space for an answer or a
reaction. But as the data shows, this does not guarantee that they are actually responded to. Another relevant aspect in the context of sanctions is that the interrogative format seems to carry a certain ‚openness‘ that might be seen as a mitigating effect and thus provides an interesting point of comparison with other mitigating devices. This study uses the methods of conversation analysis and interactional linguistics. It is based on a collection of 148 interrogative sanctions (out of which 84 are non-polar interrogatives) covering the four languages. I draw on coded data from roughly 1000 cases to get a first overall idea of how the interrogative format might differ from other formats, and how it might interrelate with specific features – for example, if subsequently an account is delivered. Going more into depth, the interrogative sanctions will then be analyzed with respect to their formal design (e.g. polar questions vs. content questions vs. tag questions, Rossano 2010; Hayano 2013) and to their pragmatic implications. I also analyze reactions to such sanctions – both formally (cf. Enfield et al. 2019, 279) and, again, from an interactional perspective (e.g. acceptance/compliance vs. challenging/defiance; Kent 2012; Cekaite 2020). A more detailed zooming in on the sequential unfolding of some particularly interesting
instances of sanctioning interrogatives will make the picture complete.
Contrastive analysis of climate-related neologisms registered in GermanN and French Wikipedia
(2023)
Neologisms represent new social norms, tendencies, controversies and attitudes. They denote new or changed concepts which are constantly being negotiated between different members of the discourse community (Wodak 2022 and Catalano/Waugh (eds.) 2020). Neologisms help to identify new communicative patterns and narratives which illustrate different strings of discourse in everyday life. In recent years, many neologisms relating to the subject of the environment and climate have been emerging around the world mainly due to dominant discussions on climate change and the movement “Fridays for Future”. In German, for example, neologisms such as Klimakleber, klimaresilient and globaler Streik and in French neologisms such as éco-anxiété, justice climatique and écocitoyen could be observed. These neologisms occur in many domains of life, for example in politics, media and also in advertising, which means that “l’importance croissante des enjeux environnementaux dans les discours politiques, médiatiques et publicitaires” (Balnat/Gérard 2022, p. 22) can be identified. However, it is not only the occurrence of environment- or climate-related topics that is increasing, but also the rising polarisation of the public debate. The polarisation within public discourse is based on the fact that there are opposing positions which are represented by new or recently relevant terms such as activistes du climat (or Klimaaktivisten) and climatosceptiques (or Klimaskeptiker) (Balnat/Gérard 2022, p. 22). Due to different identifications with one or the other side, one can also speak of an “affrontement idéologique” (Balnat/Gérard 2022, p. 23). 1 The explosive nature and the high complexity of the debate on climate and the environmental issues mean that many words are naturally unfamiliar to people. This is especially true with regard to neologisms. In addition, it is often not only the new word itself but also the signified concept that is initially unknown. When people then look up words, they often do so on the Internet. Wikipedia as a “free encyclopedia” (Wikipedia 2023) is particularly well suited as an object of study with regard to neologisms, since factual knowledge is given special attention there. Furthermore, this reference guide is perceived as a regular source of agreed and common knowledge on all sorts of subjects. Hence, the descriptions found here represent social agreement on controversial terms and discussions to some degree. In this paper, German and French neologisms from the subject area of climate and environment will be examined primarily in Wikipedia, but also in the neighbouring resource Wiktionary,2 which is “a collaborative project to produce a free-content multilingual dictionary” (Wiktionary 2023). Since Wikipedia and Wiktionary are available in French and in German, 21010. International Contrastive Linguistics Conference (ICLC) both are equally suitable for the contrastive analysis. Thus, Wikipedia articles which are accessible in both languages (e.g. Klimanotstand and État d›urgence climatique) or Wikipedia articles about similar events and phenomena (e.g. Letzte Generation and Dernière Rénovation) will be compared. For example, we will have a closer look at other new terms specifying different thematic aspects of the discourse of climate and environment. We will mainly refer to those lexical items which can be found in the respective articles in both languages. Special emphasis will be on overlaps and differences, thematic foci, speaker’s positions and evaluative terms.
Our everyday lives in any social community are shaped by rules (e.g., Roughley 2019; Schmidt/Rakoczy 2019). Rules (in a broad sense) are interactionally negotiated, monitored, enforced, and serve as an ‘orientation value‘ in social life. If someone‘s behavior is treated as norm-violating or problematic in certain way, it may be therefore confronted. Confronting interlocutors can immediately stop, modify, or retrospectively reprimand the misconduct of others in a moralizing manner. Such confrontations of a problem behavior occur commonly in informal interactions. On the basis of our corpus, specifically in informal interactions at the table, I observed that, for example, in Polish, German and British English, direct confrontations occur on average at least once every three minutes. Participants design these actions in a variety of ways, but like everything in interaction, the design is not arbitrary (Sacks 1984; Enfield/Sidnell 2019). A recurrent feature of such turns is connecting misconduct to some more general concepts. It is evident from the data that e.g. speakers of German and Polish use ‘generally valid statements’ in problematic moments (cf. Küttner/Vatanen/Zinken 2022) to reach the closure of the problem sequence, also specifically dealing there with distribution of deontic and epistemic rights (Rogowska in prep.). I ask, when and for what purpose generality, that is, abstracting from a concrete behaviour, is used as a tool while confronting others. The focus is on sequential and linguistic features of abstracting in confronting moments in language comparison. What are the methods to achieve abstraction: i) defocusing the confronted, specific agent (cf. Zinken et al. 2021; Siewierska 2008), e.g. nur derjenige der dran ist der darf die bedingungen für den handel stellen (only the one whose turn it is may set the conditions for the trade); using ii) extreme case formulations (Pomerantz 1986), e.g. na siostrę zawsze można liczyć (you can always count on a sister); iii) referring to stable character traits, e.g. Matylda bardzo chetne by podala. (.) Ona jest taka skora do pomocy (Matylda would be very happy to pass (it to you). (.) She is so eager to help); or iv) broader categorizing of the given referent, e.g. do not build (.) do do not build do not build swastikas (when a) German guy is filming us? Sometimes, even several locus of abstraction are combined in the same turn. Can we identify language-specific and cross-linguistic patterns? What are the interactional consequences: enforcing a compliant behavior in the future, eliciting an apology or cognitively simplifying complex problems? From a comparative perspective, I ask whether going beyond the here-and-now while confronting others is a practice that unites speakers across languages and is thus a human cognitive strategy to display normativity. This ongoing study is based on new comparable data from four European languages from informal interaction during activities around the table (Kornfeld/Küttner/Zinken 2023; Küttner et al. in prep.). The phenomenon was coded systematically in each of the four languages as part of a larger, quantitatively oriented study with different questions (Küttner et al. submitted). In the talk, I will show exemplarily Polish and German evidence. I use the methods of Conversation Analysis (Sidnell/Stivers (eds.) 2012) and Interactional Linguistics (Imo/Lanwer 2019).
Following the successes of the ninth conference in 2022 held in the wonderful Santiago de Compostela, Spain, we are pleased to present the proceedings of the 10th edition of International Conference on CMC and Social Media Corpora for the Humanities (CMC-2023). The focal point of
the conference is to investigate the collection, annotation, processing, and analysis of corpora of computer-mediated communication (CMC) and social media.
Our goal is to serve as the meeting place for a wide variety of language-oriented investigations into CMC and social media from the fields of linguistics, philology, communication sciences, media
studies, and social sciences, as well as corpus and computational linguistics, language technology, textual technology, and machine learning.
This year’s event is the largest so far with 45 accepted submissions: 32 papers and 13 poster presentations, each of which were reviewed by members of our ever-growing scientific committee. The contributions were presented in five sessions of two or three streams, and a single poster session. The talks in these proceedings cover a wide range of topics, including the corpora construction, digital identities, digital knowledge-building, digitally-mediated interaction, features
of digitally-mediated communication, and multimodality in digital spaces.
As part of the conference, we were delighted to include two invited talks: an international keynote speech by Unn Røyneland from the University of Oslo, Norway, on the practices and perceptions of
researching dialect writing in social media, and a national keynote speech by Tatjana Scheffler from the Ruhr-University of Bochum on analysing individual linguistic variability in social media and
constructing corpora from this data. Additionally, participants could take part in a workshop on processing audio data for corpus linguistic analysis. This volume contains abstracts of the invited talks, short papers of oral presentations, and abstracts of posters presented at the conference.
The landscape of digital lexical resources is often characterized by dedicated local portals and proprietary interfaces as primary access points for scholars and the interested public. In addition, legal and technical restrictions are potential issues that can make it difficult to efficiently query and use these valuable resources. As part of the research data consortium Text+, solutions for the storage and provision of digital language resources are being developed and provided in the context of the unified cross-domain German research data infrastructure NFDI. The specific topic of accessing lexical resources in a diverse and heterogenous landscape with a variety of participating institutions and established technical solutions is met with the development of the federated search and query framework LexFCS. The LexFCS extends the established CLARIN Federated Content Search that already allows accessing spatially distributed text corpora using a common specification of technical interfaces, data formats, and query languages. This paper describes the current state of development of the LexFCS, gives an insight into its technical details, and provides an outlook on its future development.
The shortening of linguistic expressions naturally involves some sort of correspondence between short forms and (some portion of) the respective full forms. Based mostly on data from English and Hebrew this article explores the hypothesis that such correspondence concerns necessary sameness of symbolic form, referring either to graphemic or to a specific level of phonological representation. That level indicates a degree of abstractness defined by language-specific contrastiveness (i.e. “phonemic”). Reference to written form can be shown to be highly systematic in certain contexts, including cases where full forms consist of multiple stems. Specific asymmetries pertaining to the targeting of material by correspondence (e.g. initial vs. non-initial position) appear to be alike for both types of representation, a claim supported by a study based on a nomenclature strictly confined to writing (chemical element symbols).
When comparing different tools in the field of natural language processing (NLP), the quality of their results usually has first priority. This is also true for tokenization. In the context of large and diverse corpora for linguistic research purposes, however, other criteria also play a role – not least sufficient speed to process the data in an acceptable amount of time. In this paper we evaluate several state of the art tokenization tools for German – including our own – with regard to theses criteria. We conclude that while not all tools are applicable in this setting, no compromises regarding quality need to be made.
Words and their usages are in many cases closely related to or embedded in social, cultural, technical and ideological contexts. This does not only apply to individual words and specific senses, but to many vocabulary zones as well. Moreover, the development of words is often related to aspects of socio-cultural evolution in a broad sense. In this paper I will have a look at traditional dictionaries and digital lexical systems focussing on the question how they deal with socio-cultural and discourse-related aspects of word usage. I will also propose a number of suggestions how future digital lexical systems might be enriched in this respect.
The public as linguistic authority: Why users turn to internet forums to differentiate between words
(2022)
This paper addresses the question of why we face unsatisfactory German dictionary entries when looking up and comparing two similar lexical terms that are loan words, new words, (near) synonyms, or confusables. It explains how users are aware of existing reference works but still search or post on language forums, often after consulting a dictionary and experiencing a range of dictionary based problems. Firstly, these dictionary based difficulties will be scrutinised in more detail with respect to content, function, presentation, and the language of definitions. Entries documenting loan words and commonly confused pairs from different lexical reference resources serve as examples to show the short comings. Secondly, I will explain why learning about your target group involves studying discussion forums. Forums are a valuable source for detailed user studies, enabling the examination of different communicative needs, concrete linguistic questions, speakers’ intuitions, and people’s reactions to posts and comments. Thirdly, with the help of two examples I will describe how the study of chats and forums had a major impact on the development of a recently compiled German dictionary of confusables. Finally, that same problem solving approach is applied to the idea of a future dictionary of neologisms and their synonyms.
eThis paper first attempts a state-of-the art overview of what is known about women in the history of lexicography up to the early twentieth century. It then focusses more closely on the German and German-English lexicographical traditions to 1900, examining them from three different perspectives (following Russell’s 2018 study of women in English lexicography): women as users and dedicatees of dictionaries; women as contributors to and compilers of lexicographical works; and (in a very preliminary way) women and female sexuality as represented in German/English bilingual dictionaries of the eighteenth and early nineteenth centuries. Russell (2018) was able to identify some 24 dictionaries invoking women as patrons, dedicatees or potential users before 1700, and some 150 works in English lexicography by women between 1500 and 1900, besides the contribution of hundreds of women as supporters and helpers, not least as unpaid readers and sub-editors for the Oxford English Dictionary. Equivalent research in other languages is lacking, but this paper presents some of the known examples of women as lexicographers. The evidence tends to support Russell’s finding for English, that women were more likely to find a place in lexicography outside the mainstream: sometimes in a more private sphere (like Hester Piozzi); often in bilingual lexicography (such as Margrethe Thiele, working on a Danish-French dictionary), including missionary and or colonizing activity (such as Cinie Louw in Africa, Daisy Bates in Australia); and in dialect description (Coronedi Berti in Italy, Luisa Lacal and María Moliner in Spain). Within the German-speaking context, women who participated in lexicographical work themselves are hard to identify before the late nineteenth century, though those few women who did have access to education were often engaged in language learning, including translation activity, and they were likely users of bilingual and multilingual dictionaries. Christian Ludwig’s (1706) English-German dictionary – the first of its kind – was dedicated to the Electoral Princess Sophia of Hanover. Elizabeth Weir may have been the first named female compiler of a German dictionary, with her bilingual New German Dictionary (1888). Rather better known are the cases of Agathe Lasch and Luise Pusch, who, as pioneering women in the field of German linguistics, ultimately led major lexicographical projects documenting German regional varieties in the first half of the twentieth century (Middle Low German and Hamburgish in the case of Lasch; the Hessisch Nassau dialect dictionary in the case of Berthold). In the light of existing research on gender and sexuality in the history of English lexicography (e. g. Iamartino 2010; Turton 2019), I conclude with a preliminary exploration how woman and sexuality have been represented in dictionaries of German and English, taking the words Hure and woman in bilingual German-English dictionaries of the eighteenth and nineteenth centuries as my case studies.
This paper focuses on the treatment of culture bound lexical items in a novel type of online learner’s dictionary model, the Phrase Based Active Dictionary (PAD). A PAD has a strong phraseological orientation: each meaning of a word is exclusively defined in a typical phraseological context. After introducing the relevant theory of realia in translation studies, we develop a broader notion of culture specific lexical items which is more apt to serve the purposes of learner’s lexicography and thus to satisfy the needs of a larger and often undefined target group. We discuss the treatment of such words and expressions in common English learner’s dictionaries and then present various excerpts from PAD entries in English, German, and Italian which display different strategies for coping with cultural contents in the lexicon. Our aim is to demonstrate that the phraseological approach at the core of the PAD model turns out to be extremely important to convey cultural knowledge in a suitable way for users to fully grasp cultural implications in language.
In foreign language teaching the use of dictionaries, especially bilingual, has always been related to the hypotheses concerning the relationship between the native language (L1) and second language acquisition method. If the bilingual dictionary was an obvious tool in the grammar-translation method, it was banned from the classroom in the direct, audiolingual and audiovisual methods. Also in the communicative method, foreign language learners are discouraged from using a dictionary. Its use should not obstruct the goals of communicatively oriented foreign language learning – a view still held by many foreign language teachers. Nevertheless, the reality has been different: Foreign language learners have always used dictionaries, even if they no longer possess a print dictionary and mainly use online resources and applications. Dictionaries and online resources will continue to play an important role in the future. In the Council of Europe’s language policy, with its emphasis on multilingualism and lifelong learning, the adequate use of reference tools as a strategic skill is highlighted. In several European countries, educational guidelines refer to the use of dictionaries in the context of media literacy, both in mother tongue and foreign language teaching. Not only is their adequate use important, but so too is the comparison, assessment and evaluation of the information presented, in order to develop Language Awareness and Language Learning Awareness. This is good news. However, does this mean that dictionaries are actually used in class? What role do dictionaries play in foreign language teaching in schools and universities? Are foreign language learners in the digital era really competent users? And how competent are their teachers? Are they familiar with the current (online) dictionary landscape? Can they support their students? After a more in-depth study of the status quo of dictionary use by foreign language learners and teachers and the gap between their needs and the reality, this contribution discusses the challenges facing lexicographers and meta-lexicographers and what educational policy measures are necessary to make their efforts worthwhile in turning foreign language learners – and their teachers – into competent users in a multilingual and digital world.
The aim of this paper is to show how lexicographical choices reflect ideological thinking, singled out by Eagleton (2007) into the strategies of rationalizing, legitimating, action orienting, unifying, naturalizing and universalizing. It will be carried out by examining two twenty first century editions of each of the five English monolingual learner’s dictionaries published by Cambridge, Collins, Longman, Macmillan, and Oxford. The synchronic and diachronic analyses of the dictionaries and their different editions at the macro structural level (the wordlists) and at the micro structural level (the definitional styles) will show how the reduction and change of data, derived from heterogeneous social and cultural contexts of language use, to abstract essential forms, involves decisions about the central and peripheral aspects of the lexicon and the meaning of words.
Applying terminological methods to lexicography helps lexicographers deal with the terms occurring in general language dictionaries, especially when it comes to writing the definitions of concepts belonging to special fields. In the context of the lexicographic work of the Dicionário da Língua Portuguesa, an updated digital version of the last Academia das Ciências de Lisboa’ dictionary published in 2001, we have assumed that terminology – in its dual dimension, both linguistic and conceptual – and lexicography are complementary in their methodological approaches. Both disciplines deal with lexical items, which can be lexical units or terms. In this paper, we apply terminological methods to improve the treatment of terms in general language dictionaries and to write definitions as a form of achieving more precision and accuracy, and also to specify the domains to which they belong. Additionally, we highlight the consistent modelling of lexicographic components, namely the hierarchy of domain labels, as they are term identification markers instead of a flat list of domains. The need to create and make available structured, organised and interoperable lexicographic resources has led us to follow a path in which the application of standards and best practices of treating and representing specialised lexicographic content are fundamental requirements.
In a multilingual and multicultural society, dictionaries play an important role to enhance interlingual communication. A diversity of languages and different levels of dictionary culture demand innovative lexicographic approaches to establish a dictionary landscape that responds to the needs of the various speech communities. Focusing on the South African situation this paper discusses some aspects of a few dictionaries that contributed to an improvement of the local dictionary landscape. Using the metaphors of bridges, dykes and sluice gates it is shown how lexicographers need a balanced approach in their lemma selection and treatment. Whilst a too strong prescriptive approach can be to the detriment of the macrostructural selection, a lack of regulatory criteria could easily lead to a data overload. The lexicographer should strive to give a reflection of the actual language use and enable the users to retrieve the information that can satisfy their specific communication and cognitive needs. Such lexicographic products will enrich and improve the dictionary landscape.
Phonesthemes (Firth 1930) are sublexical constructions that have an effect on the lexico-grammatical continuum: they are recurring form-meaning associations that occur more often than by chance but not systematically (Abramova/Fernandez/Sangati 2013). Phonesthemes have been shown (Bergen 2004) to affect psycholinguistic language processing; they organise the mental lexicon. Phonesthemes appear over time to emerge as driven by language use as indexical rather than purely iconic constructions in the lexicon (Smith 2016; Bergen 2004; Flaksman 2020). Phonesthemes are acknowledged in construction morphology (Audring/Booij/Jackendoff 2017) as motivational schemas. Some phonesthemes also tend to have lexicographic acknowledgment, as shown by etymologist Liberman (2010), although this relevance and cohesion appears to be highly variable as we will show in this paper.
This paper describes a method for extracting collocation data from text corpora based on a formal definition of syntactic structures, which takes into account not only the POS-tagging level of annotation but also syntactic parsing (syntactic treebank model) and introduces the possibility of controlling the canonical form of extracted collocations based on statistical data on forms with different properties in the corpus. Specifically, we describe the results of extraction from the syntactically tagged Gigafida 2.1 corpus. Using the new method, 4,002,918 collocation candidates in 81 syntactic structures were extracted. We evaluate the extracted data sample in more detail, mainly in relation to properties that affect the extraction of canonical forms: definiteness in adjectival collocations, grammatical number in noun collocations, comparison in adjectival and adverbial collocations, and letter case (uppercase and lowercase) in canonical forms. The conclusion highlights the potential of the methodology used for the grammatical description of collocation and phrasal syntax and the possibilities for improving the model in the process of compilation of a digital dictionary database for Slovene.
Recent years have seen a growing interest in linguistic phenomena that challenge the received division of labour between lexicon and grammar, and hence often fall through the cracks of traditional dictionaries and grammars. Such phenomena call for novel, pattern based types of linguistic reference works (see various papers in Herbst 2019). The present paper introduces one such resource: MAP (“Musterbank argumentmarkierender Präpositionen”), a web based corpus linguistic patternbank of prepositional argument structure constructions in German. The paper gives an overview of the design and functionality of the MAP prototype currently developed at the Leibniz Institute for the German Language in Mannheim. We give a brief account of the data and our analytic workflow, illustrate the descriptions that make up the resource and sketch available options for querying it for specific lexical, semantic and structural properties of the data.
In this paper, we present LexMeta, a metadata model for the description of human-readable and computational lexical resources in catalogues. Our initial motivation is the extension of the LexBib knowledge graph with the addition of metadata for dictionaries, making it a catalogue of and about lexicographical works. The scope of the proposed model, however, is broader, aiming at the exchange of metadata with catalogues of Language Resources and Technologies and addressing a wider community of researchers besides lexicographers. For the definition of the LexMeta core classes and properties, we deploy widely used RDF vocabularies, mainly Meta-Share, a metadata model for Language Resources and Technologies, and FRBR, a model for bibliographic records.
This paper presents a multilingual dictionary project of discourse markers. During its first stage, consisting of collecting the list of headwords, we used a parallel corpus to automatically extract units from texts written in Spanish, Catalan, English, French and German. We also applied a method to create a taxonomy structure for automatically organising the markers in clusters. As a result, we obtain an extensive, corpus-driven list of headwords. We present a prototype of the microstructure of the dictionary in the form of a standard XML database and describe the procedure to automatically fill in most of its fields (e.g., the type of DM, the equivalents in other languages, etc.), before human intervention.