Refine
Year of publication
Document Type
- Part of a Book (31)
- Article (22)
- Conference Proceeding (8)
- Book (1)
Language
- English (62) (remove)
Has Fulltext
- yes (62) (remove)
Keywords
- Computerlinguistik (14)
- Deutsch (12)
- Natürliche Sprache (8)
- Korpus <Linguistik> (7)
- Annotation (5)
- Automatische Sprachanalyse (5)
- Konversationsanalyse (5)
- Maschinelles Lernen (5)
- Gesprochene Sprache (4)
- Information Extraction (4)
Publicationstate
- Postprint (30)
- Veröffentlichungsversion (18)
- Zweitveröffentlichung (13)
- Preprint (2)
Reviewstate
- Peer-Review (24)
- (Verlags)-Lektorat (20)
- Peer-review (1)
- Verlags-Lektorat (1)
Publisher
- Springer (62) (remove)
This contribution explores the relationship between the English CEFR (Common European Framework of Reference for Languages) vocabulary levels and user interest in English Wiktionary entries. User interest was operationalized through the number of views of these entries in Wikimedia server logs covering a period of four years (2019–2022). Our findings reveal a significant relationship between CEFR levels and user interest: entries classified at lower CEFR levels tend to attract more views, which suggests a greater user interest in more basic vocabulary. A multiple regression model controlling for other known or potential factors affecting interest: corpus frequency, polysemy, word prevalence, and age of acquisition confirmed that lower CEFR levels attract significantly more views even after taking into account the other predictors. These findings highlight the importance of CEFR levels in predicting which words users are likely to look up, with implications for lexicography and the development of language learning materials.
The CLARIN infrastructure as an interoperable language technology platform for SSH and beyond
(2023)
CLARIN is a European Research Infrastructure Consortium developing and providing a federated and interoperable platform to support scientists in the field of the Social Sciences and Humanities in carrying-out language-related research. This contribution provides an overview of the entire infrastructure with a particular focus on tool interoperability, ease of access to research data, tools and services, the importance of sharing knowledge within and across (national) communities, and community building. By taking into account FAIR principles from the very beginning, CLARIN succeeded in becoming a successful example of a research infrastructure that is actively used by its members. The benefits CLARIN members reap from their infrastructure secure a future for their common good that is both sustainable and attractive to partners beyond the original target groups.
Neologisms, i.e., new words or meanings, are finding their way into everyday language use all the time. In the process, already existing elements of a language are recombined or linguistic material from other languages is borrowed. But are borrowed neologisms accepted similarly well by the speech community as neologisms that were formed from “native” material? We investigate this question based on neologisms in German. Building on the corresponding results of a corpus study, we test the hypothesis of whether “native” neologisms are more readily accepted than those borrowed from English. To do so, we use a psycholinguistic experimental paradigm that allows us to estimate the degree of uncertainty of the participants based on the mouse trajectories of their responses. Unexpectedly, our results suggest that the neologisms borrowed from English are accepted more frequently, more quickly, and more easily than the “native” ones. These effects, however, are restricted to people born after 1980, the so-called millenials. We propose potential explanations for this mismatch between corpus results and experimental data and argue, among other things, for a reinterpretation of previous corpus studies.
Theater rehearsals are (usually) confronted with the problem of having to transform a written text into an audio-visual, situated and temporal performance. Our contribution focuses on the emergence and stabilization of a gestural form as a solution for embodying a certain aesthetic concept which is derived from the script. This process involves instructions and negotiations, making the process of stabilization publicly and thus intersubjectively accessible. As scenes are repeatedly rehearsed, rehearsals are perspicuous settings for tracking interactional histories. Based on videotaped professional theatre interactions in Germany, we focus on consecutive instances of rehearsing the same scene and trace the interactional history of a particular gesture. This gesture is used by the director to instruct the actors to play a particular aspect of a scene adopting a certain aesthetic concept. Stabilization requires the emergence of shared knowledge. We will show the practices by which shared knowledge is established over time during the rehearsal process and, in turn, how the accumulation of knowledge contributes to a change in the interactional practices themselves. Specifically, we show how a gesture emerges in the process of developing and embodying an aesthetic concept, and how this gesture eventually becomes a sign that refers to and evokes accumulated knowledge. At the same time, we show how this accumulated knowledge changes the instructional activities in the rehearsal process. Our study contributes to the overall understanding of knowledge accumulation in interaction in general and in theater rehearsals in particular. At the same time, it is devoted to the central importance of gestures in theater, which are both a means and a product of theatrical staging.
This article presents a discussion on the main linguistic phenomena which cause difficulties in the analysis of user-generated texts found on the web and in social media, and proposes a set of annotation guidelines for their treatment within the Universal Dependencies (UD) framework of syntactic analysis. Given on the one hand the increasing number of treebanks featuring user-generated content, and its somewhat inconsistent treatment in these resources on the other, the aim of this article is twofold: (1) to provide a condensed, though comprehensive, overview of such treebanks—based on available literature—along with their main features and a comparative analysis of their annotation criteria, and (2) to propose a set of tentative UD-based annotation guidelines, to promote consistent treatment of the particular phenomena found in these types of texts. The overarching goal of this article is to provide a common framework for researchers interested in developing similar resources in UD, thus promoting cross-linguistic consistency, which is a principle that has always been central to the spirit of UD.
Mock fiction is a genre of humorous, fictional narratives. It is pervasive in adolescents’ peer-group interaction. Building on a corpus of informal peer-group interaction among 14 to 17 year-old German adolescents, it is shown how mock fiction is used to sanction identity-claims of peer-group co-members that are taken to be inadequate by the teller of a mock fiction. Mock fiction exposes and ridicules those claims by fictional exaggeration. Mock fiction is an indirect, yet sometimes even highly abusive means for criticizing and negotiating identities and statuses of peer-group members. The analysis shows how mock fiction is collaboratively produced, how it is used to convey criticism and to negotiate social norms indirectly, and how, in addition, it allows for performative self-positioning of the tellers as skilled, entertaining tellers and socio-psychological diagnosticians.
Digital research infrastructures can be divided into four categories: large equipment, IT infrastructure, social infrastructure, and information infrastructure. Modern research institutions often employ both IT infrastructure and information infrastructure, such as databases or large-scale research data. In addition, information infrastructure depends to some extent on IT infrastructure. In this paper, we discuss the IT, information, and legal infrastructure issues that research institutions face.
The lexicography of German
(2020)
This chapter discusses the main dictionaries of the German language as it is spoken and written in Germany, and also German as it is spoken and written in Austria, Switzerland, the eastern fringes of Belgium, and South Tyrol. It also briefly describes Pennsylvania German. Corpora and other language resources used in German dictionary-making are also presented. Finally, there is a discussion of some current issues in German lexicography, as well as future prospects.