"What makes this so complicated?" On the value of disorienting dilemmas in language instruction
(2017)
A "polyglottal" speech synthesis - modifications for a replica of Kempelen's speaking machine
(2019)
This paper argues that there is a correlation between functional and purely grammatical patterning in language, yet the nature of this correlation has to be explored. This claim is based on the results of a corpus-driven study of the Slavic aspect, drawing on the so-called Distributional Hypothesis. According to the East-West Theory of the Slavic aspect, there is a broad east-west isogloss dividing the Slavic languages into an eastern group and a western group. There are also two transitional zones in the north and south, which share some properties with each group (Dickey 2000; Barentsen 1998, 2008). The East-West Theory uses concepts of cognitive grammar such as totality and temporal definiteness, and is based on various parameters of aspectual usage in discourse, including contexts such as habituals, general factuals, historical (narrative) present, performatives, sequenced events in the past etc. The purpose of the above-mentioned study is to challenge the semantic approach to the Slavic aspect by comparing the perfective and imperfective verbal aspect on the basis of purely grammatical co-occurrence patterns (see also Janda & Lyashevskaya 2011). The study focused on three Slavic languages: Russian, which, following the East-West Theory, belongs to the eastern group, Czech, which belongs to the western group, and Polish, which is considered transitional in its aspectual patterning.
We present a new resource for German causal language, with annotations in context for verbs, nouns and adpositions. Our dataset includes 4,390 annotated instances for more than 150 different triggers. The annotation scheme distinguishes three different types of causal events (CONSEQUENCE, MOTIVATION, PURPOSE). We also provide annotations for semantic roles, i.e. of the cause and effect for the causal event as well as the actor and affected party, if present. In the paper, we present inter-annotator agreement scores for our dataset and discuss problems for annotating causal language. Finally, we present experiments where we frame causal annotation as a sequence labelling problem and report baseline results for the prediction of causal arguments and for predicting different types of causation.
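As a rough illustration of what framing causal annotation as sequence labelling can look like, the sketch below converts span annotations into per-token BIO tags. It is not taken from the paper: the BIO encoding, the function name spans_to_bio, the TRIGGER label and the example sentence are all assumptions made for illustration; only the cause and effect roles are mentioned in the abstract.

```python
from typing import List, Tuple

# (start token index, end token index exclusive, label); hypothetical format
Span = Tuple[int, int, str]


def spans_to_bio(tokens: List[str], spans: List[Span]) -> List[str]:
    """Turn non-overlapping labelled spans into one BIO tag per token."""
    tags = ["O"] * len(tokens)
    for start, end, label in spans:
        if not (0 <= start < end <= len(tokens)):
            raise ValueError(f"span out of range: {(start, end, label)}")
        if any(tag != "O" for tag in tags[start:end]):
            raise ValueError(f"overlapping span: {(start, end, label)}")
        tags[start] = f"B-{label}"          # first token of the span
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"          # remaining tokens of the span
    return tags


# Invented example sentence ("The storm caused severe damage").
tokens = ["Der", "Sturm", "verursachte", "schwere", "Schäden"]
spans = [(0, 2, "CAUSE"), (2, 3, "TRIGGER"), (3, 5, "EFFECT")]

for token, tag in zip(tokens, spans_to_bio(tokens, spans)):
    print(f"{token}\t{tag}")
```

Once every token carries a tag, any standard sequence tagger can be trained on the token and tag pairs; which model the authors actually used is not stated in the abstract.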
In a number of languages, agreement in specificational copular sentences can or must be with the second of the two nominals, even when it is the first that occupies the canonical subject position. Béjar & Kahnemuyipour (2017) show that Persian and Eastern Armenian are two such languages. They then argue that ‘NP2 agreement’ occurs because the nominal in subject position (NP1) is not accessible to an external probe. It follows that actual agreement with NP1 should never be possible: the alternative to NP2 agreement should be ‘default’ agreement. We show that this prediction is false. In addition to showing that English has NP1, not default, agreement, we present new data from Icelandic, a language with rich agreement morphology, including cases that involve ‘plurale tantum’ nominals as NP1. These allow us to control for any confound from the fact that typically in a specificational sentence with two nominals differing in number, it is NP2 that is plural. We show that even in this case, the alternative to agreement with NP2 is agreement with NP1, not a default. Hence, we conclude that whatever the correct analysis of specificational sentences turns out to be, it must not predict obligatory failure of NP1 agreement.
We question the growing consensus in the literature that European Americans behave as a homogenous pan-ethnic coalition of voters. Seemingly below the radar of scholarship on voting groups in American politics, we identify a group of white voters that behaves differently from others: German Americans, the largest ethnic group, regionally concentrated in the ‘Swinging Midwest’. Using county level voting returns, ancestry group information from the American Community Survey (ACS), current survey data and historical census data going back as early as 1910, we provide evidence for a partisan and a non-partisan pathway that motivated German Americans to vote for Trump in 2016: a historically grown association with the Republican Party and an acquired taste for isolationist attitudes that mobilizes non-partisan German Americans to support isolationist candidates. Our findings indicate that European American experiences of migration and integration still echo into the political arena of today.
In this paper, we present an overview of freely available web applications providing online access to spoken language corpora. We explore and discuss various solutions with which the corpus providers and corpus platform developers address the needs of researchers who are working with spoken language. The paper aims to contribute to the long-overdue exchange and discussion of methods and best practices in the design of online access to spoken language corpora.
Action ascription can be understood from two broad perspectives. On one view, it refers to the ways in which actions constitute categories by which members make sense of their world, and forms a key foundation for holding others accountable for their conduct. On another view, it refers to the ways in which we accountably respond to the actions of others, thereby accomplishing sequential versions of meaningful social experience. In short, action ascription can be understood as a matter of categorisation of prior actions or responding in ways that are sequentially fitted to prior actions, or both. In this chapter, we review different theoretical approaches to action ascription that have developed in the field, as well as the key constituents and resources of action ascription that have been identified in conversation analytic research, before going on to discuss how action ascription can itself be considered a form of social action.
In the first volume of Corpus Linguistics and Linguistic Theory, Gries (2005. Null-hypothesis significance testing of word frequencies: A follow-up on Kilgarriff. Corpus Linguistics and Linguistic Theory 1(2). doi:10.1515/cllt.2005.1.2.277. http://www.degruyter.com/view//cllt.2005.1.issue-2/cllt.2005.1.2.277/cllt.2005.1.2.277.xml: 285) asked whether corpus linguists should abandon null-hypothesis significance testing. In this paper, I want to revive this discussion by defending the argument that the assumptions that allow inferences about a given population – in this case about the studied languages – based on results observed in a sample – in this case a collection of naturally occurring language data – are not fulfilled. As a consequence, corpus linguists should indeed abandon null-hypothesis significance testing.
Terminological resources play a central role in the organization and retrieval of scientific texts. Both simple keyword lists and advanced modelings of relationships between terminological concepts can make a most valuable contribution to the analysis, classification, and finding of appropriate digital documents, either on the web or within local repositories. This seems especially true for long-established scientific fields with elusive theoretical and historical branches, where the use of terminology within documents from different origins is often far from being consistent. In this paper, we report on the progress of a linguistically motivated project on the onomasiological re-modeling of the terminological resources for the grammatical information system grammis. We present the design principles and the results of their application. In particular, we focus on new features for the authoring backend and discuss how these innovations help to evaluate existing, loosely structured terminological content, as well as to efficiently deal with automatic term extraction. Furthermore, we introduce a transformation to a future SKOS representation. We conclude with a positioning of our resources with regard to the Knowledge Organization discourse and discuss how a highly complex information environment like grammis benefits from the re-designed terminological KOS.
Repeating the movements associated with activities such as drawing or sports typically leads to improvements in kinematic behavior: these movements become faster, smoother, and exhibit less variation. Likewise, practice has also been shown to lead to faster and smoother movement trajectories in speech articulation. However, little is known about its effect on articulatory variability. To address this, we investigate the extent to which repetition and predictability influence the articulation of the frequent German word “sie” [zi] (they). We find that articulatory variability is proportional to speaking rate and the duration of [zi], and that overall variability decreases as [zi] is repeated during the experiment. Lower variability is also observed as the conditional probability of [zi] increases, and the greatest reduction in variability occurs during the execution of the vocalic target of [i]. These results indicate that practice can produce observable differences in the articulation of even the most common gestures used in speech.
This paper describes a rule-based approach to detecting direct speech without the help of any quotation markers. Fictional and non-fictional texts were used as datasets. Our evaluation shows that the results appear stable throughout different datasets in the fictional domain and are comparable to the results achieved in related work.
This study builds on a large body of work on the use of linguistic forms for requests in social interaction. Using Conversation Analysis / Interactional Linguistics, this study explores the use of two recurrent linguistic formats for requesting in spoken German – simple interrogatives ('do you do ...?') and kannst du VP? ('can you do ...?') interrogatives. Based on a corpus of video-recorded, naturally occurring mundane interactions, this study demonstrates one of the interactional factors that is relevant for the choice between alternative interrogative request formats in spoken German – the recipient's embodied availability before and during the request initiation. It is shown that simple interrogatives are used to request an action from a recipient who is either available or involved in their own project, which, however, does not have to be suspended or interrupted in order to comply with the request. In contrast, kannst du VP? interrogatives occur in environments in which the recipient is already engaged in a project that must be suspended in order to grant the request.
Beyond Citations: Corpus-based Methods for Detecting the Impact of Research Outcomes on Society
(2020)
This paper proposes, implements and evaluates a novel, corpus-based approach for identifying categories indicative of the impact of research via a deductive (top-down, from theory to data) and an inductive (bottom-up, from data to theory) approach. The resulting categorization schemes differ in substance. Research outcomes are typically assessed by using bibliometric methods, such as citation counts and patterns, or alternative metrics, such as references to research in the media. Shortcomings with these methods are their inability to identify impact of research beyond academia (bibliometrics) and considering text-based impact indicators beyond those that capture attention (altmetrics). We address these limitations by leveraging a mixed-methods approach for eliciting impact categories from experts, project personnel (deductive) and texts (inductive). Using these categories, we label a corpus of project reports per category schema, and apply supervised machine learning to infer these categories from project reports. The classification results show that we can predict deductively and inductively derived impact categories with 76.39% and 78.81% accuracy (F1-score), respectively. Our approach can complement solutions from bibliometrics and scientometrics for assessing the impact of research and studying the scope and types of advancements transferred from academia to society.
Canadian heritage German across three generations: A diary-based study of language shift in action
(2019)
It is well known that migration has an effect on language use and language choice. If the language of origin is maintained after migration, it tends to change in the new contact setting. Often, migrants shift to the new majority language within a few generations. The current paper examines a diary corpus containing data from three generations of one German-Canadian family, ranging from 1867 to 1909, and covering the second to fourth generation after immigration. The paper analyzes changes that can be observed between the generations, with respect to the language system as well as to the individuals’ decision on language choice. The data not only offer insight into the dynamics of acquiring a written register of a heritage language, and the eventual shift to the majority language. They also allow us to identify different linguistic profiles of heritage speakers within one community. It is discussed how these profiles can be linked to the individuals’ family backgrounds and how the combination of these backgrounds may have contributed to giving up the heritage language in favor of the majority language.
This paper studies how the turn-design of a highly recurrent type of action changes over time. Based on a corpus of video-recordings of German driving lessons, we consider one type of instructions and analyze how the same instructional action is produced by the same speaker (the instructor) for the same addressee (the student) in consecutive trials of a learning task. We found that instructions become increasingly shorter, indexical and syntactically less complex; interactional sequences become more condensed and activities designed to secure mutual understanding become rarer. This study shows how larger temporal frameworks of interpersonal interactional histories which range beyond the interactional sequence impinge on the recipient-design of turns and the deployment of multimodal resources in situ.
We present web services implementing a workflow for transcripts of spoken language following TEI guidelines, in particular ISO 24624:2016 "Language resource management - Transcription of spoken language". The web services are available at our website and will be available via the CLARIN infrastructure, including the Virtual Language Observatory and WebLicht.
This paper deals with different types of verbal complementation of the German verb verdienen. It focuses on constructions that have been undergoing a grammaticalization process and thus express deontic modality, as in Sie verdient geliebt zu werden (ʽShe deserves to be lovedʼ) and Sie verdient zu leben (ʽShe deserves to liveʼ) (Diewald, Dekalo & Czicza 2021). These constructions are connected to parallel complementation types with passive and active infinitives containing a correlate es, as in Sie verdient es, geliebt zu werden and Sie verdient es, zu leben, as well as finite clauses with the subordinator dass with and without correlative es, as in Sie verdient, dass sie geliebt wird and Sie verdient es, dass sie geliebt wird. This paper attempts to show a close comparative investigation of these six types of constructions based on their relevant semantic and syntactic properties in terms of clause linkage (Lehmann 1988). We analyze the relevant data retrieved from the DWDS corpus of the 20th century and present an expanded grammaticalization path for verdienen-constructions. The finite complementation with dass is regarded as an example of a separate structural option called “elaboration”. Concerning the use of correlative es, it is shown that it does not have any substantial effect on the grammaticalization of modal verdienen-constructions.
The ubiquity of smartphones has been recognised within conversation analysis as having an impact on conversational structures and on the participants’ interactional involvement. However, most of the previous studies have relied exclusively on video recordings of overall encounters and have not systematically considered what is taking place on the device. Due to the personal nature of smartphones and their small displays, onscreen activities are of limited visibility and are thus potentially opaque for both the co-present participants (“participant opacity”) and the researchers (“analytical opacity”). While opacity can be an inherent feature of smartphones in general, analytical opacity might not be desirable for research purposes. This chapter discusses how a recording set-up consisting of static cameras, wearable cameras and dynamic screen captures allowed us to address the analytical opacity of mobile devices. Excerpts from multi-source video data of everyday encounters will illustrate how the combination of multiple perspectives can increase the visibility of interactional phenomena, reveal new analytical objects and improve analytical granularity. More specifically, these examples will emphasise the analytical advantages and challenges of a combined recording set-up with regard to smartphone use as multiactivity, the role of the affordances of the mobile device, and the prototypicality and “naturalness” of the recorded practices.
Are borrowed neologisms accepted more slowly into the German language than German words resulting from the application of word formation rules? This study addresses this question by focusing on two possible indicators for the acceptance of neologisms: a) frequency development of 239 German neologisms from the 1990s (loanwords as well as new words resulting from the application of word formation rules) in the German reference corpus DeReKo and b) frequency development in the use of pragmatic markers (‘flags’, namely quotation marks and phrases such as sogenannt ‘so-called’) with these words. In the second part of the article, a psycholinguistic approach to evaluating the (psychological) status of different neologisms and non-words in an experimentally controlled study and plans to carry out interviews in a field test to collect speakers’ opinions on the acceptance of the analysed neologisms are outlined. Finally, implications for the lexicographic treatment of both types of neologisms are discussed.
Our paper examines how bodily behavior contributes to the local meaning of OKAY. We explore the interplay between OKAY as response to informings and narratives and accompanying multimodal resources in German multi-party interaction. Based on informal and institutional conversations, we describe three different uses of OKAY with falling intonation and the recurrent multimodal patterns that are associated with them and that can be characterized as ‘multimodal gestalts’. We show that: 1. OKAY as a claim to sufficient understanding is typically accompanied by upward nodding; 2. OKAY after change-of-state tokens exhibits a recurrent pattern of up- and downward nodding with distinctive timing; and 3. OKAY closing larger activities is associated with gaze-aversion from the prior speaker.
The present paper outlines the projected second part of the Corpus Query Lingua Franca (CQLF) family of standards: CQLF Ontology, which is currently in the process of standardization at the International Organization for Standardization (ISO), in its Technical Committee 37, Subcommittee 4 (TC37SC4) and its national mirrors. The first part of the family, ISO 24623-1 (henceforth CQLF Metamodel), was successfully adopted as an international standard at the beginning of 2018. The present paper reflects the state of the CQLF Ontology at the moment of submission for the Committee Draft ballot. We provide a brief overview of the CQLF Metamodel, present the assumptions and aims of the CQLF Ontology, its basic structure, and its potential extended applications. The full ontology is expected to emerge from a community process, starting from an initial version created by the authors of the present paper.
Corpus REDEWIEDERGABE
(2020)
This article presents the corpus REDEWIEDERGABE, a German-language historical corpus with detailed annotations for speech, thought and writing representation (ST&WR). With approximately 490,000 tokens, it is the largest resource of its kind. It can be used to answer literary and linguistic research questions and serve as training material for machine learning. This paper describes the composition of the corpus and the annotation structure, discusses some methodological decisions and gives basic statistics about the forms of ST&WR found in this corpus.
In this chapter, we overview the specificity of comparisons made within the perspective of Conversation Analysis (CA), and we position them in relation to other fields. We introduce the analytical mentality, methodology, and procedures of CA, and we show how we used it for the analysis of OKAY in this volume.
This article examines a recurrent format that speakers use for defining ordinary expressions or technical terms. Drawing on data from four different languages - Flemish, French, German, and Italian - it focuses on definitions in which a definiendum is first followed by a negative definitional component (‘definiendum is not X’), and then by a positive definitional component (‘definiendum is Y’). The analysis shows that by employing this format, speakers display sensitivity towards a potential meaning of the definiendum that recipients could have taken to be valid. By negating this meaning, speakers discard this possible, yet unintended understanding. The format serves three distinct interactional purposes: (a) it is used for argumentation, e.g. in discussions and political debates, (b) it works as a resource for imparting knowledge, e.g. in expert talk and instructions, and (c) it is employed, in ordinary conversation, for securing the addressee's correct understanding of a possibly problematic expression. The findings contribute to our understanding of how epistemic claims and displays relate to the turn-constructional and sequential organization of talk. They also show that the much quoted ‘problem of meaning’ is, first and foremost, a participant's problem.
Digital humanities research under United States and European copyright laws. Evolving frameworks
(2021)
This chapter summarizes the current state of copyright laws in the United States and European Union that most affect Digital Humanities research, namely the fair use doctrine in the US and research exceptions in Europe, including the Directive on Copyright in the Digital Single Market, which was finally adopted in 2019. This summary begins with a description of recent copyright advances most relevant to DH research, and finishes with an analysis of a significant remaining legal hurdle which DH researchers face: how do fair use and research exceptions deal with the critical issue of circumventing technological protection measures (TPM, a.k.a. DRM). Our discussion of the lawful means of obtaining TPM-protected material may contribute to both current DH research and planning decisions and inform future stakeholders and lawmakers of the need to allow TPM circumvention for academic research.
A constructicon, i.e., a structured inventory of constructions, essentially aims at documenting functions of lexical and grammatical constructions. Among other parameters, so-called constructional collo-profiles, as introduced by Herbst (2018, 2020), are conclusive for determining constructional meanings. They provide information on how relevant individual words are for construction slots, they hint at usage preferences of constructions and serve as a helpful indicator for semantic peculiarities of constructions. However, even though collo-profiles constitute an indispensable component of constructicon entries, they pose major challenges for constructicographers: for a constructicographic enterprise it is not feasible to conduct collostructional analyses for hundreds or even thousands of constructions. In this article, we introduce a procedure based on the large language model BERT that makes it possible to predict collo-profiles without having to extensively annotate instances of constructions in a given corpus. Specifically, by discussing the constructions X macht Y ADJP (‘X makes Y ADJ’, e.g. he drives him crazy) and N1 PREP N1 (e.g., bumper to bumper, constructions over constructions), we show how the developed automated system generates collo-profiles based on a limited number of annotated instances. Finally, we place collo-profiles alongside other dimensions of constructional meanings included in the German Constructicon.
Entity framing is the selection of aspects of an entity to promote a particular viewpoint towards that entity. We investigate entity framing of political figures through the use of names and titles in German online discourse, enhancing current research in entity framing through titling and naming that concentrates on English only. We collect tweets that mention prominent German politicians and annotate them for stance. We find that the formality of naming in these tweets correlates positively with their stance. This confirms sociolinguistic observations that naming and titling can have a status-indicating function and suggests that this function is dominant in German tweets mentioning political figures. We also find that this status-indicating function is much weaker in tweets from users that are politically left-leaning than in tweets by right leaning users. This is in line with observations from moral psychology that left-leaning and right-leaning users assign different importance to maintaining social hierarchies.
The idea of this article is to take the immaterial and somehow ethereal nature of aesthetic concepts seriously by asking how aesthetic concepts are negotiated and thus formed in communication. My examples come from theatrical production where aesthetic decisions naturally play a major role. In the given case, an aesthetic concept is introduced with which only the director, but none of the actors is familiar in the beginning of the rehearsals. The concept, Wabi Sabi, comes from Japanese culture. As the whole rehearsal process was video recorded, it is possible to track the process of how the concept is negotiated and acquired over time. So, instead of defining criteria what Wabi Sabi as an aesthetic concept “consists of,” this article seeks to show how the concept is introduced, explained and “used” within a practical context, in this case a theater rehearsal. In contrast to conventional models of aesthetic experience, I am interested in the ways in which an aesthetic concept is configured in and through socially organized interaction, and — vice versa — how that interaction contributes to the situational accomplishment of the same concept. In short: I am interested in the “doing” of aesthetic concepts, especially in “doing Wabi Sabi.”
Older adults are often exposed to elderspeak, a specialized speech register linked with negative outcomes. However, previous research has mainly been conducted in nursing homes without considering multiple contextual conditions. Based on a novel contextually-driven framework, we examined elderspeak in an acute general versus geriatric German hospital setting. Individual-level information such as cognitive impairment (CI) and audio-recorded data from care interactions between 105 older patients (M = 83.2 years; 49% with severe CI) and 34 registered nurses (M = 38.9 years) were assessed. Psycholinguistic analyses were based on manual coding (k = .85 to k = .97) and computer-assisted procedures. First, diminutives (61%), collective pronouns (70%), and tag questions (97%) were detected. Second, patients’ functional impairment emerged as an important factor for elderspeak. Our study suggests that functional impairment may be a more salient trigger of stereotype activation than CI and that elderspeak deserves more attention in acute hospital settings.
This paper investigates emergent pseudo-coordination in spoken German. In a corpus-based study, seven verbs in the first conjunct are analyzed regarding the degree of semantic bleaching and the development of subjective or aspectual meaning components. Moreover, it is shown that each verb shows distinct tendencies for co-occurrences, especially with deictic adverbs in the first conjunct and with specific verbs and verb classes in the second conjunct. It is argued that pseudo-coordination is originally motivated by the need for ‘chunking’ in unplanned speech and that it is still prominently used in this function in German, in contrast to languages in which pseudo-coordination is grammaticalized further.
The sentiment polarity of an expression (whether it is perceived as positive, negative or neutral) can be influenced by a number of phenomena, foremost among them negation. Apart from closed-class negation words like no, not or without, negation can also be caused by so-called polarity shifters. These are content words, such as verbs, nouns or adjectives, that shift polarities in their opposite direction, e. g. abandoned in “abandoned hope” or alleviate in “alleviate pain”. Many polarity shifters can affect both positive and negative polar expressions, shifting them towards the opposing polarity. However, other shifters are restricted to a single shifting direction. Recoup shifts negative to positive in “recoup your losses”, but does not affect the positive polarity of fortune in “recoup a fortune”. Existing polarity shifter lexica only specify whether a word can, in general, cause shifting, but they do not specify when this is limited to one shifting direction. To address this issue we introduce a supervised classifier that determines the shifting direction of shifters. This classifier uses both resource-driven features, such as WordNet relations, and data-driven features like in-context polarity conflicts. Using this classifier we enhance the largest available polarity shifter lexicon.
To date, little is known about prosodic accommodation and its conversational functions in instances of overlapping talk in conversation. A major conversational action that happens in overlap is turn competition. It is not known whether participants accommodate prosodic parameters locally in the overlapped turn (initialisation) or access a repertoire of prosodic patterns that refer to general prosodic parameter norms (normalisation) when competing for the turn in overlap. This paper investigates the initialisation and normalisation of fundamental frequency (f0) and assesses its role as a resource for turn competition in overlap. We drew instances of overlapping talk from a corpus of conversational multi-party interactions in British English. We annotated the overlaps on a competitiveness scale and categorised them by overlap onset position and conversational function. We automatically extracted f0 parameters from the speech signal and processed them into f0 accommodation features that represent the normalising or the initialising use of f0. Using decision tree classification we found that f0 accommodation is only relevant as a turn competitive resource in overlaps that start clearly before a speaker transition. In this turn context, we found that normalising and initialising f0 features can both be relevant turn competitive resources. Their deployment depends on the conversational function of overlap.
We present a fine-grained NER annotation scheme with 30 labels and apply it to German data. Building on the OntoNotes 5.0 NER inventory, our scheme is adapted for a corpus of transcripts of biographic interviews by adding categories for AGE and LAN(guage) and also adding label classes for various numeric and temporal expressions. Applying the scheme to the spoken data as well as a collection of teaser tweets from newspaper sites, we can confirm its generality for both domains, also achieving good inter-annotator agreement. We also show empirically how our inventory relates to the well-established 4-category NER inventory by re-annotating a subset of the GermEval 2014 NER coarse-grained dataset with our fine label inventory. Finally, we use a BERT-based system to establish some baselines for NER tagging on our two new datasets. Global results in in-domain testing are quite high on the two datasets, near what was achieved for the coarse inventory on the CoNLL-2003 data. Cross-domain testing produces much lower results due to the severe domain differences.
Based on the empirical data of 97 fourth-graders from three districts of Braunschweig in Germany, this paper investigates the possibility of changing semantic frames in multilingual communities. The focus of study is the verb field of self-motion. In a free-sorting task involving 52 verbs, Turkish-speaking students, in particular, placed the verbs schleichen (‘to sneak’) and kommen (‘to come’) in the same group. When explaining the perceived similarity they also used the word schleichen (‘to sneak’), in a specific grammatical construction that is not found in Standard German. This paper suggests that semantic frames may change along with grammatical constructions when typologically distinct languages come into close contact.
Starting from early approaches within Generative Grammar in the late 1960s, the article describes and discusses the development of different theoretical frameworks of lexical decomposition of verbs. It presents the major subsequent conceptions of lexical decompositions, namely, Dowty’s approach to lexical decomposition within Montague Semantics, Jackendoff’s Conceptual Semantics, the LCS decompositions emerging from the MIT Lexicon Project, Pustejovsky’s Event Structure Theory, Wierzbicka’s Natural Semantic Metalanguage, Wunderlich’s Lexical Decompositional Grammar, Hale and Keyser’s Lexical Relational Structures, and Distributed Morphology. For each of these approaches, (i) it sketches their origins and motivation, (ii) it describes the general structure of decompositions and their location within the theory, (iii) it explores their explanative value for major phenomena of verb semantics and syntax, and (iv) it briefly evaluates the impact of the theory. Referring to discussions in article 7 [Semantics: Foundations, History and Methods] (Engelberg) Lexical decomposition, a number of theoretical topics are taken up throughout the paper concerning the interpretation of decompositions, the basic inventory of decompositional predicates, the location of decompositions on the different levels of linguistic representation (syntactic, semantic, conceptual), and the role they play for the interfaces between these levels.
This paper discusses German neologisms in the so-called “new-media” and presents a German corpus-based online dictionary of neologisms. Several neological morphemes and lexemes, as well as their meaning will be presented, showing that these new modes of communication are an important source of enrichment of German lexicon.
Present-day German uses two formally different patterns of compounding in N+N compounds. The first combines bare stems (e.g. Tisch+decke ‘tablecloth’) while the second contains an intervening linking element (LE) as in Geburt-s-ort ‘birth-LE-place’. The linked compounding type developed in Early New High German (1350–1650) from phrasal constructions by reanalyzing genitive attributes as first constituents of compounds. The present paper uses corpus data to explore three key stages in this development: In the initial stage, it shows how prenominal non-specific genitive constructions lent themselves to reanalysis due to their functional overlap and formal similarity. Additionally, compounds seem to have replaced not only prenominal genitives, but also structurally different postnominal genitives. In the second stage, the new compounding pattern increases in productivity between 1500 and 1710, especially compared to the older pattern without linking elements. The last stage pertains to changes in spelling practice. It shows that linked compounds were written separately in the beginning. Their gradual graphematic integration into directly connected words was reversed by a century of hyphenation (1650–1750). This is strikingly different from present-day spelling practice and shows that the linked pattern was still perceived as marked.
Between January 2020 and summer 2021, many new words and phrases contributed to the expansion of the German vocabulary in order to enable communication under the new conditions during the corona pandemic. This rapid expansion of vocabulary has most notably affected lexicography as a discipline of applied linguistics. General language dictionaries or terminological dictionaries have quickly reflected on how the German lexicon evolved during the corona pandemic: new entries were added, others were revised. This paper, however, focuses on the ways in which a German (specialized) neologism dictionary project, the "Neologismenwörterbuch" published online by the Leibniz Institute for the German Language, Mannheim (see https://www.owid.de/docs/neo/start.jsp), has chosen to capture and document lexicographic information in a timely manner. Neologisms are (following the definition applied here) lexical units or senses/meanings which emerge in a language community over a specific period of time of language development, which diffuse, are generally accepted as language norms, and which the majority of speakers perceive as new for some time. Thus, the "Neologismenwörterbuch" used to record neologisms only retrospectively, that is after their lexicalization. As a consequence, users of the dictionary were often not able to obtain details on words that were particularly conspicuous at a particular time in a specific discourse, thus raising questions concerning their meaning, correct spelling, etc. This, however, did not imply that the lexicographers of the project had not already collected these words with some preliminary information in a list of candidates for inclusion in an internal database. Therefore, the project started to publish online an index of monitored words including lexical units that had emerged since 2011, for which only time will tell whether they will diffuse and manifest as language norms.
This list format has been used since April 2020 to also issue a compilation of corona-related neologisms as part of the "Neologismenwörterbuch". In October 2021, this inventory included more than 1,800 corona-related neologisms, and still, more than 700 candidates in an internal database awaited lexicographic description and inclusion into the online index (see https://www.owid.de/docs/neo/listen/corona.jsp). In this paper many examples are presented to illustrate how new words, new senses and new uses in the context of the Covid-19 pandemic are reflected in the dictionary.
During the second half of the 19th century, extended regions of the South Pacific came to be part of the German colonial empire. The colonial administration included repeated and diverse efforts to implement German as the official language in several settings (administration, government, education) in the colonial areas. Due to unfamiliar sociological and linguistic conditions, to competition with English as a(nother) prestigious colonizer language, and to the short time-span of the German colonial rule, these efforts rendered only little language-related effect. Nevertheless, some linguistic traces remained, and these seem to reflect in what areas language implementation was organized most thoroughly. The study combines two directions of investigation: First, taking a historical approach, legal and otherwise official documents and information are considered in order to understand how the implementation process was planned and (intended to be) carried out. Second, from a linguistic perspective, documented lexical borrowings and other traces of linguistic contact are identified that can corroborate the historical findings by reflecting a greater effect of contact in such areas where the implementation of German was carried out most strictly. The goal of this paper is, firstly, to trace the political and missionary activities in language planning with regard to German in the colonial Pacific, rather similar to a modern language policy scenario when a new code of prestige or national unity is implemented. Secondly, these activities are evaluated in the face of the outcome that can be observed, in the historical practice as well as in long-term effects of language contact up until today.
In the ‘Minimalist Program’ (Chomsky 1995), A-movements (i.e. N-raising, V-raising, etc.) and A’-movements (‘A-bar movements’, i.e. extraposition, VP-adjunction, scrambling, etc.) are treated as very unequal operations. The present paper examines the distinct properties of A-movements and A’-movements on the basis of three groups of arguments, namely topicalization (Section 2), movement of weak pronouns (Section 3), and verb-second under the symmetry assumption (Section 4). The conclusion is that the analysis advocated here offers a way of distinguishing A-movements from A’-movements without banishing the latter from the domain of grammar.
In this chapter, we will investigate smartphone-based showing sequences in everyday social encounters, that is, moments in which a personal mobile device is used for presenting (audio-)visual content to co-present participants. Despite a growing interest in object-centred sequences and mundane technology use, detailed accounts of the sequential, multimodal, and material dimensions of showing sequences are lacking. Based on video data of social interactions in different languages and on the framework of multimodal interaction analysis, this chapter will explore the link between mobile device use and social practices. We will analyse how smartphone showers and their recipients coordinate the manipulation of a technological object with multiple courses of action, and reflect upon the fundamental complexity of this by-now routine joint activity.
This is an introduction to a special issue of Dictionaries: Journal of the Dictionary Society of North America. It offers a characterization of neology and describes the Globalex-sponsored workshop at which the papers in the issue originated. It provides an overview of the papers, which treat lexicographical neology and neological lexicography in Danish, Dutch, Estonian, Frisian, Greek, Korean, Spanish, and Swahili and address relevant aspects of lexicography in those languages, presenting state-of-the-art research into neology and ideas about modern lexicographic treatment of neologisms in various dictionary types.
Novel formats of construction-based description hold great potential for phenomena that fall through the cracks in traditional kinds of linguistic reference works. On the example of German verb argument structure constructions with a prepositional object, we demonstrate that a construction-based description of such phenomena is superior to existing lexicographic and grammaticographic treatments, but that it also poses a number of new problems. The most fundamental of these relates to the fact that construction-based analyses can be proposed on different levels of abstraction. We illustrate pertinent problems relating to the precise identification of constructional form and meaning and suggest a multi-layered descriptive format for web-based electronic reference constructica that can accommodate these challenges. Semantically, the proposed solution integrates both lumping and splitting perspectives on constructional grain size and permits users to flexibly zoom in and out on individual elements in the resource. Formally, it can capture variation in the number and marking of realised arguments as found in e.g. passives and transitivity alternations. Aspects of the theoretical controversy between Construction Grammar and Valency Theory are addressed where relevant, but our focus is on questions of description and the practical implementation of construction-based analyses in a suitable type of linguistic reference work.
How Do Speakers Define the Meaning of Expressions? The Case of German x heißt y (“x means y”)
(2020)
To secure mutual understanding in interaction, speakers sometimes explain or negotiate expressions. Adopting a conversation analytic and interaction linguistic approach, I examine how participants explain which kinds of expressions in different sequential environments, using the format x heißt y (“x means y”). When speakers use it to clarify technical terms or foreign words that are unfamiliar to co-participants, they often provide a situationally anchored definition that however is rather context-free and therefore transferable to future situations. When they explain common (but indexical, ambiguous, polysemous, or problematic) expressions instead, speakers always design their explanation strongly connected to the local context, building on situational circumstances. I argue that x heißt y definitions in interaction do not meet the requirements of scientific or philosophical definitions but that this is irrelevant for the situational exigencies speakers face.
The present paper examines a variety of ways in which the Corpus of Contemporary Romanian Language (CoRoLa) can be used. A multitude of examples intends to highlight a wide range of interrogation possibilities that CoRoLa opens for different types of users. The querying of CoRoLa displayed here is supported by the KorAP frontend, through the querying language Poliqarp. Interrogations address annotation layers, such as the lexical, morphological and, in the near future, the syntactical layer, as well as the metadata. Other issues discussed are how to build a virtual corpus, how to deal with errors, how to find expressions and how to identify expressions.
Meta-communicative practices are generally reflexive in a fairly obvious sense: Inasmuch as speakers use them to talk about or comment on earlier/subsequent talk, they use language self-reflexively. In this paper, we explore a practice that is reflexive not only in this meta-communicative sense but also in a sequential-interactional one: Prefacing a conversational turn with I was gonna say. We show that the I was gonna say-preface furnishes the following general semantic-pragmatic affordances: (1) It retroactively relates the speaker’s subsequent talk to preceding talk from a co-participant, (2) it embodies a claim to prior, now-preempted, communicative intent with regard to what their co-participant has (just) said/done, (3) it therefore displays its speaker’s orientation to the relevance or the appropriate placement of the action(s) done in their own subsequent talk at an earlier moment in the interaction, and (4) it reflexively re-invokes, or retrieves, this earlier moment as the relevant sequential context for their action(s). We then go on to illustrate how speakers draw on these sequentially reflexive affordances for managing recurrent interactional contingencies in specific sequential environments. The paper ends with a discussion of the role that reflexivity plays in and for the deployment of this practice.
This paper presents experiments on sentence boundary detection in transcripts of spoken dialogues. Segmenting spoken language into sentence-like units is a challenging task, due to disfluencies, ungrammatical or fragmented structures and the lack of punctuation. In addition, one of the main bottlenecks for many NLP applications for spoken language is the small size of the training data, as the transcription and annotation of spoken language is by far more time-consuming and labour-intensive than processing written language. We therefore investigate the benefits of data expansion and transfer learning and test different ML architectures for this task. Our results show that data expansion is not straightforward and even data from the same domain does not always improve results. They also highlight the importance of modelling, i.e. of finding the best architecture and data representation for the task at hand. For the detection of boundaries in spoken language transcripts, we achieve a substantial improvement when framing the boundary detection problem as a sentence pair classification task, as compared to a sequence tagging approach.
Response particles manage intersubjectivity. This conversation analytic study describes German eben (“exactly”). With eben, speaker A locally agrees with the immediately prior turn of B (the “confirmable”) and establishes a second indexical link: A relates B’s confirmable to a position A herself had already displayed (the “anchor”). Through claiming temporal priority, eben speakers treat a just-formulated position as self-evident and mark independence. Further evidence for the three-part structure “anchor-confirmable-eben” that eben sets in motion retrospectively comes from instances where eben speakers supply a missing/opaque anchor via a postpositioned display of independent access. Data are in German with English translation.
Instruction practices in German driving lessons: Differential uses of declaratives and imperatives
(2018)
Building on a corpus of 70 hours of German driving lessons, this paper studies the use of declaratives vs. imperatives for instruction. It shows how these linguistic resources are adapted to different praxeological, temporal and participant-related environments. Declaratives are used for first instructions, task-setting and post-trial discussions. They exhibit complex syntax and do not call for immediate compliance. Their high degree of explicitness conveys how the action is to be carried out. Imperative instructions overwhelmingly correct ongoing actions of students or respond to their failure to produce expected actions. They exhibit minimal argument structure. They are reminders which presuppose that the student monitors the scene and can perform the action unproblematically. They index that requests have to be complied with immediately or even urgently.
This paper discusses the technological and methodological challenges in creating and sharing HAMATAC, the Hamburg Map Task Corpus. The first version of the corpus, consisting of 24 recordings with orthographic transcriptions and metadata, is publicly available. A second version featuring different types of linguistic annotation is in progress. I will describe how the various software tools and data formats of the EXMARaLDA system were used for transcription and multi-level annotation, to compile recordings and transcriptions into a corpus and manage metadata, to publish the corpus, and how they can be used for carrying out corpus queries (KWIC) and analyses. Some recurrent issues in corpus building and sharing and the interaction of technological and methodological aspects will be illustrated using HAMATAC.
Interoperability in an Infrastructure Enabling Multidisciplinary Research: The case of CLARIN
(2020)
CLARIN is a European Research Infrastructure providing access to language resources and technologies for researchers in the humanities and social sciences. It supports the use and study of language data in general and aims to increase the potential for comparative research of cultural and societal phenomena across the boundaries of languages and disciplines, all in line with the European agenda for Open Science. Data infrastructures such as CLARIN have recently embarked on the emerging frameworks for the federation of infrastructural services, such as the European Open Science Cloud and the integration of services resulting from multidisciplinary collaboration in federated services for the wider domain of the social sciences and humanities (SSH). In this paper we describe the interoperability requirements that arise through the existing ambitions and the emerging frameworks. The interoperability theme will be addressed at several levels, including organisation and ecosystem, design of workflow services, data curation, performance measurement and collaboration. For each level, some concrete outcomes are described.
This paper asks whether and in which ways managing coordination tasks in traffic involves the accomplishment of intersubjectivity. Taking instances of coordinating passing an obstacle with oncoming traffic as the empirical case, four different practices were found.
1. Intersubjectivity can be presupposed by expecting others to stick to the traffic code and other mutually shared expectations.
2. Intersubjective solutions emerge step by step by mutual responsive-anticipatory adaptation of driving decisions.
3. Intersubjectivity can be accomplished by explicit interactive negotiation of passages.
4. Coordination problems can be solved without relying on intersubjectivity by unilateral, responsive-anticipatory adaptation to others’ behaviors.
In this article, we provide an insight into the development and application of a corpus-lexicographic tool for finding neologisms that are not yet listed in German dictionaries. As a starting point, we used the words listed in a glossary of German neologisms surrounding the COVID-19 pandemic. These words are lemma candidates for a new dictionary on COVID-19 discourse in German. They also provided the database used to develop and test the NeoRate tool. We report on the lexicographic work in our dictionary project, the design and functionalities of NeoRate, and describe the first test results with the tool, in particular with regard to previously unregistered words. Finally, we discuss further development of the tool and its possible applications.
This presentation introduces a new collaborative project: the International Comparable Corpus (ICC) (https://korpus.cz/icc), to be compiled from European national, standard(ised) languages, using the protocols for text categories and their quantities of texts in the International Corpus of English (ICE).
Introduction
(2019)
Introduction
(2023)
This replication study aims to investigate a potential bias toward addition in the German language, building upon previous findings of Winter and colleagues who identified a similar bias in English. Our results confirm a bias in word frequencies and binomial expressions, aligning with these previous findings. However, the analysis of distributional semantics based on word vectors did not yield consistent results for German. Furthermore, our study emphasizes the crucial role of selecting appropriate translational equivalents, highlighting the significance of considering language-specific factors when testing for such biases for languages other than English.
Language attitudes matter; they influence people’s behaviour and decisions. Therefore, it is crucial to learn more about patterns in the way that languages are evaluated. One means of doing so is using a quantitative approach with data representative of a whole population, so that results mirror dispositions at a societal level. This kind of approach is adopted here, with a focus on the situation in Germany. The article consists of two parts. First, I will present some results of a new representative survey on language attitudes in Germany (the Germany Survey 2017). Second, I will show how language attitudes penetrate even seemingly objective data collection processes by examining the German Microcensus. In 2017, for the first time in eighty years, the German Microcensus included a question on language use ‘at home’. Unfortunately, however, the question was clearly tainted by language attitudes instead of being objective. As a result, the Microcensus significantly misrepresents the linguistic reality of different migrant languages spoken in Germany.
Lean syntax: how argument structure is adapted to its interactive, material, and temporal ecology
(2020)
It has often been argued that argument structure in spoken discourse is less complex than in written discourse. This paper argues that lean argument structure, in particular, argument omission, gives evidence of how the production and understanding of linguistic structures is adapted to the interactive, material, and temporal ecology of talk-in-interaction. It is shown how lean argument structure builds on participants' ongoing bodily conduct, joint perceptual salience, joint attention, and their orientation to expectable next actions within a joint project. The phenomena discussed in this paper are verb-derived discourse markers and tags, analepsis in responsive actions, and ellipsis in first actions, such as requests and instructions. The study draws from transcripts and audio- and video-recordings of naturally occurring interaction in German from the Research and Teaching Corpus of Spoken German (FOLK).
This paper discusses contemporary societal roles of German in the Baltic states (Latvia, Estonia, Lithuania). Speaker and learner statistics and a summary of sociolinguistic research (Linguistic Landscapes, language learning motivation, language policies, international roles of languages) suggest that German has by far fewer speakers and functions than the national languages, English, and Russian, and it is not a dominant language in the contemporary Baltics anymore. However, German is ahead of ‘any other language’ in terms of users and societal roles as a frequent language in education, of economic relations, as a historical lingua franca, and a language of traditional and new minorities. Highly diverse groups of users and language policy actors form a ‘coalition of interested parties’ which creates niches which guarantee German a frequent use. In the light of the abundance of its functions, the paper suggests the concept ‘additional language of society’ for a variety such as German in the Baltics – since there seems to be no adequate alternative labelling which would do justice to all societal roles. The paper argues that this concept may also be used for languages in similar societal situations and, not least, be useful in language marketing and the promotion of multilingualism.
Theories of lexical decomposition assume that lexical meanings are complex. This complexity is expressed in structured meaning representations that usually consist of predicates, arguments, operators, and other elements of propositional and predicate logic. Lexical decomposition has been used to explain phenomena such as argument linking, selectional restrictions, lexical-semantic relations, scope ambiguities, and the inference behavior of lexical items. The article sketches the early theoretical development from noun-oriented semantic feature theories to verb-oriented complex decompositions. It also deals with a number of theoretical issues, including the controversy between decompositional and atomistic approaches to meaning, the search for semantic primitives, the function of decompositions as definitions, problems concerning the interpretability of decompositions, and the debate about the cognitive status of decompositions.
Little strokes fell great oaks. Creating CoRoLa, the reference corpus of contemporary Romanian
(2019)
The paper presents the quite long-standing tradition of Romanian corpus acquisition and processing, which reaches its peak with the reference corpus of contemporary Romanian language (CoRoLa). The paper describes decisions behind the kinds of texts collected, as well as processing and annotation steps, highlighting the structure and importance of metadata to the corpus. The reader is also introduced to the three ways in which (s)he can plunge into the rich linguistic data of the corpus, waiting to be discovered. Besides querying the corpus, word embeddings extracted from it are useful to various natural language processing applications and for linguists, when user-friendly interfaces offer them the possibility to exploit the data.
Meaning in interaction
(2024)
This editorial to the Special Issue on “Meaning in Interaction” introduces the approach of Interactional Semantics, which has been developed over the last years within the framework of Interactional Linguistics. It discusses how “meaning” is understood and approached in this framework and lays out that Interactional Semantics is interested in how participants clarify and negotiate the meanings of the expressions that they are using in social interaction. Commonalities and differences of this approach with other approaches to meaning are flagged, and the intellectual origins and precursors of Interactional Semantics are introduced. The contributions to the Special Issue are located in the larger field of research.
The paper at hand discusses productivity in German compound formation – as a case of morphological variation – from a lexeme-based synchronic perspective. In particular, we focus on groups of compounds with semantically closely related head words, e.g., compounds denoting colors.
Our approach is characterized by a qualitative as well as a quantitative perspective on productivity. Taking the properties of the head lexeme as a starting point and applying corpus-based statistical methods, we try to gain new insights into compound formation, especially into potential factors which govern their productivity. In a first step, we determine the productivity of compounds on the basis of current productivity measures and data from a large corpus of German. In a second step, we try to systematically explain observable differences in productivity.
The approach presented here is one of the first attempts to apply the concept of productivity, which has been predominantly used in the domain of derivation, to compounding. Since compounding is a dominant factor for the expansion of the German lexicon, we assume that our investigation also sheds an important light on the dynamics of the lexicon.
Morphophonological asymmetries in affixation concern systematic correlations between morphological properties of affixes (e.g. combination with bound versus free stems, position relative to stem (suffixes versus prefixes)) and their phonological properties (e.g. stress behaviour). The arguably most insightful approach to capturing relevant asymmetries invokes a notion of affix coherence, first introduced by Dixon in connection with his work on Yidiɲ, a nearly extinct language spoken in Northern Australia. This notion is based on a categorical division of affixes into ones that integrate into the phonological word of the stem and ones that do not. The integration of affixes is envisioned as being fully determined by phonological and morphological structure in a given language and verifiable by diagnostics relevant to phonological word domains (primarily the syllable and the foot structure). The assumption of two types of prosodic domains characterized by integrated versus non-integrated affixes is manifest in consistent asymmetries that pertain to morphophonological, phonological, and phonetic rules. This consistency constitutes compelling evidence for the structure-based analysis of the impact of various affixes on derived words, as opposed to alternative approaches to capturing these effects by associating affixes with diacritics (morpheme versus word boundary, class 1 versus class 2, stratum 1 versus stratum 2). The present entry aims to demonstrate, mostly on the basis of data from Germanic languages, the breadth of the empirical evidence in support of a fundamental role of affix coherence. Moreover, it aims to draw attention to the various implications of affix coherence for modeling relevant generalizations, in particular the necessary reference to a level of phonological representation characterized by a specific degree of abstractness (‘phonemic’).
A large database is a desirable basis for multimodal analysis. The development of more elaborate methods, data banks, and tools for a stronger empirical grounding of multimodal analysis is a prevailing topic within multimodality. A prerequisite for this is corpora of multimodal data. Our contribution aims at developing a proposal for gathering and building multimodal corpora of audio-visual social media data, predominantly YouTube data. Our contribution has two parts: First, we outline a participation framework which is able to represent the complexity of YouTube communication. To this end we ‘dissect’ the different communicative and multimodal layers YouTube consists of. Besides the video performance, YouTube also integrates comments, social media operators, commercials, and announcements for further YouTube videos. The data consist of various media and modes and are interactively engaged in various discourses. Hence, it is rather difficult to decide what can be considered a basic communicative unit (or a ‘turn’) and how it can be mapped. Another decision to be made is which elements are of higher priority than others and thus have to be integrated in an adequate transcription format. We illustrate our conceptual considerations using the example of so-called Let’s Plays, which are supposed to present and comment on computer gaming processes. The second part is devoted to corpus building. Most previous studies either worked with ad hoc data samples or outlined data mining and data sampling strategies. Our main aim is to delineate systematically, based on the conceptual outline in the first part, the necessary elements which should be part of a YouTube corpus. To this end we describe in a first step which components (e.g., the video itself, the comments, the metadata, etc.) should be captured. In a second step we outline why and which relations (e.g., screen appearances, hypertextual structures, etc.) are worth including in the corpus.
In sum, our contribution aims at outlining a proposal for gathering and systematizing multimodal data, specifically audio-visual social media data, in a corpus derived from a conceptual modeling of important communicative processes of the research object itself.
New exceptions for Text and Data Mining and their possible impact on the CLARIN infrastructure
(2018)
The proposed paper discusses new exceptions for Text and Data Mining that have recently been adopted in some EU Member States, and will probably soon be adopted at the EU level as well. These exceptions are of great significance for language scientists, as they exempt those who compile corpora from the obligation to obtain authorisation from rightholders. However, corpora compiled on the basis of such exceptions cannot be freely shared, which in the long run may have serious consequences for Open Science and the functioning of research infrastructures such as CLARIN ERIC.
This article details the process of creating the Nottinghamer Korpus deutscher YouTube-Sprache ('The Nottingham German YouTube Language Corpus' - or NottDeuYTSch corpus) and outlines potential research opportunities. The corpus was compiled to analyse the online language produced by young German speakers and offers significant opportunity for in-depth research across several linguistic fields including lexis, morphology, syntax, orthography, and conversational and discursive analysis. The NottDeuYTSch corpus contains over 33 million words taken from approximately 3 million YouTube comments on videos published between 2008 and 2018 that were targeted at a young, German-speaking demographic, and it represents an authentic language snapshot of young German speakers. The corpus was proportionally sampled based on video category and year from a database of 112 popular German-speaking YouTube channels in the DACH region for optimal representativeness and balance, and it contains a considerable amount of associated metadata for each comment that enables further longitudinal cross-sectional analyses. The NottDeuYTSch corpus is available for analysis as part of the German Reference Corpus (DeReKo).
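The proportional sampling by video category and year mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not describe the actual procedure, so everything here is an assumption made for illustration: the function name `proportional_sample`, the tuple layout of the toy records, and the rounding rule are hypothetical and not taken from the NottDeuYTSch project.

```python
import random
from collections import defaultdict


def proportional_sample(items, key, n, seed=0):
    """Draw roughly n items so that each stratum keeps its share of the whole.

    items: list of records (hypothetical layout, see usage below)
    key:   function mapping a record to its stratum, e.g. (category, year)
    n:     target sample size; per-stratum rounding means the result
           may deviate slightly from n in general
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for item in items:
        strata[key(item)].append(item)

    total = len(items)
    sample = []
    for members in strata.values():
        # Allocate draws in proportion to the stratum's share of the data.
        k = round(n * len(members) / total)
        sample.extend(rng.sample(members, min(k, len(members))))
    return sample


# Toy data: (category, year, comment_id); values are invented.
videos = [("Gaming", 2015, i) for i in range(60)] + \
         [("Music", 2016, i) for i in range(40)]
picked = proportional_sample(videos, key=lambda v: (v[0], v[1]), n=10)
```

With the toy data above, the 60/40 split between the two strata is preserved in the sample of ten records.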
OKAY originates from English, but it is increasingly used across languages. This chapter presents data from 13 languages, illustrating the spectrum of possible uses of OKAY in responding and claiming understanding in contexts of informings. Drawing on a wide range of interaction types from both informal and institutional contexts, including those crucially involving embodied practices, we show how OKAY can be used to (i) claim sufficient understanding, (ii) mark understanding of the prior informing as preliminary or not complete, and (iii) index discrepancy of expectation.
This investigation targets a syntactic phenomenon of German which is commonly referred to as the absentive construction. The absentive is considered a universal grammatical category denoting absence. Its syntax is characterised by the occurrence of an auxiliary or copula verb accompanied by a non‐finite VP containing a main verb. The expression of absence, predicated over the clausal subject, is assumed to be based on a constructional meaning. Reviewing a wide range of syntactic and interpretive properties of this structure in German, we will demonstrate that certain empirical claims about the construction are not well founded and that its seemingly idiosyncratic properties are indeed available for compositional analyses. We will propose a structural analysis of its core syntactic and interpretive properties: The predication expresses the localisation of the subject at the location of the event, denoted by the infinitival verb. The interpretation of absence, then, can be explained by an implicature.
This paper has two distinct but interdependent goals. The empirical and analytical primary goal is to present a detailed overview of the patterns of (syntactico-semantic) argument structure and (morpho-syntactic) argument realization found with clause-embedding predicates in German. In particular, it will elucidate the observable relationships and dependencies between them, with a special focus on prepositional object clauses. The methodological secondary goal is to demonstrate the recently published ZAS Database of Clause-Embedding Predicates and illustrate its usefulness in approaching a concrete research agenda. The goals are aligned with each other because the data on patterns of argument structure and realization were collected using the database, and indeed the relevant questions could not have been investigated in such a thorough and efficient way without it. We will begin in Part 1 with an introduction to the database, its structure, and why and how it was created, before moving in Part 2 to the presentation of the data and analysis of argument structure and argument realization.
Polish żeby under negation
(2021)
The paper addresses two patterns in the distribution of complement clauses headed by the complementizer żeby in Polish related to the presence of sentential negation. It is argued that żeby-clauses with an obligatory negation in the matrix clause, licensed by epistemic verbs, can be treated in terms of negative polarity, with żeby defined as an n-word. Structures with żeby-clauses and an obligatory negation in the embedded clause, licensed by verbs of fear, are argued to be an instance of negative complementation, with żeby specified as a negative complementizer. A uniform lexicalist analysis within the framework of HPSG is provided, employing tools developed to account for Negative Concord in Polish.
Defining groups and affiliating the self and the other with specific social categories is an important part of constructing a colonial conceptualization of societies. Many written documents from the colonial period attest to this practice. The current paper focuses on missionaries’ ways of positioning themselves and others within the colonial context. The German speaking Rheinische Missionsgesellschaft (RMG, Rhenish Mission Society) established mission stations in the Astrolabe Bay area of New Guinea, an area that was under German domination between 1884 and 1914. The paper analyzes how RMG missionaries, by means of language, construct, define, and position different population groups, and it investigates what patterns emerge from these language practices.
Mock fiction is a genre of humorous, fictional narratives. It is pervasive in adolescents’ peer-group interaction. Building on a corpus of informal peer-group interaction among 14 to 17 year-old German adolescents, it is shown how mock fiction is used to sanction identity-claims of peer-group co-members that are taken to be inadequate by the teller of a mock fiction. Mock fiction exposes and ridicules those claims by fictional exaggeration. Mock fiction is an indirect, yet sometimes even highly abusive means for criticizing and negotiating identities and statuses of peer-group members. The analysis shows how mock fiction is collaboratively produced, how it is used to convey criticism and to negotiate social norms indirectly, and how, in addition, it allows for performative self-positioning of the tellers as skilled, entertaining tellers and socio-psychological diagnosticians.
This paper studies practices of indexing discrepant assumptions accomplished by turn-constructional units with ich dachte ('I thought') in German talk-in-interaction. Building on the analysis of 141 instances from the corpus FOLK, we identify three sequential environments in which ich dachte is used to index that an assumption which a speaker (has) held contrasts with some other, contextually salient assumption. We show that practices which have been studied for English I thought are also routinely used in German: ich dachte is a means to manage epistemic incongruencies and to contrast an incorrect with a correct assumption in narratives. In addition, ich dachte is also used to account for the speaker's own prior actions which may have looked problematic because they built on misunderstandings which the speaker only discovered later. Moreover, ich dachte-practices may also be used to create comic effects by reporting an earlier, absurd assumption. The practices are discussed with regard to their role in regaining common ground, in managing relationships, in maintaining the identity of a rational actor, and in terms of their exploitation for other conversational interests. Special attention is paid to how co-occurring linguistic features, and sequential and pragmatic factors, account for local interpretations of ich dachte.
This paper deals with a specific type of lexeme, namely binary preposition-noun combinations containing temporal references like am Ende [at (the) end] or für Sekunden [for seconds]. The main characteristic of these combinations is the recurrent internal zero gap. Despite the fact that the omission of the determiner can often be explained by grammatical rules, the zero gaps indicate a higher degree of lexicalization. Therefore, we interpret these expressions as minimal phraseological units with holistic meanings and functions. The corpus-driven exploration of typical context patterns (e.g. using collocation profiles and the lexpan slot filler analysis) shows that a) even such minimal expressions are based on semi-abstract schemes and b) temporal expressions can also fulfill modal or discursive functions, usually with fuzzy borders and overlapping structures. In the case of modalization or pragmatization one can regard such PNs as distinct lexicon entries.
Privacy by Design (also referred to as Data Protection by Design) is an approach in which solutions and mechanisms addressing privacy and data protection are embedded throughout the entire project lifecycle, from the early design stage, rather than just added as an additional layer to the final product. Formulated in the 1990s by the Privacy Commissioner of Ontario, the principle of Privacy by Design has been discussed by institutions and policymakers on both sides of the Atlantic, and was already mentioned in the 1995 EU Data Protection Directive (95/46/EC). More recently, Privacy by Design was introduced as one of the requirements of the General Data Protection Regulation (GDPR), obliging data controllers to define and adopt, already at the conception phase, appropriate measures and safeguards to implement data protection principles and protect the rights of the data subject. Failing to meet this obligation may result in a hefty fine, as was the case in the Uniontrad decision by the French Data Protection Authority (CNIL). The ambition of the proposed paper is to analyse the practical meaning of Privacy by Design in the context of Language Resources, and to propose measures and safeguards that can be implemented by the community to ensure respect for this principle.
As immigration and mobility increase, so do interactions between people from different linguistic backgrounds. Yet while linguistic diversity offers many benefits, it also comes with a number of challenges. In seven empirical articles and one commentary, this Special Issue addresses some of the most significant language challenges facing researchers in the 21st century: the power language has to form and perpetuate stereotypes, the contribution language makes to intersectional identities, and the role of language in shaping intergroup relations. By presenting work that aims to shed light on some of these issues, the goal of this Special Issue is to (a) highlight language as integral to social processes and (b) inspire researchers to address the challenges we face. To keep pace with the world’s constantly evolving linguistic landscape, it is essential that we make progress toward harnessing language’s power in ways that benefit 21st century globalized societies.
Information theory can be used to assess how efficiently a message is transmitted on the basis of different symbolic systems. In this paper, I estimate the information-theoretic efficiency of written language for parallel text data in more than 1000 different languages, both on the level of characters and on the level of words as information encoding units. The main results show that (i) the median efficiency is ∼29% on the character level and ∼45% on the word level, (ii) the efficiency values on the two levels are strongly correlated and (iii) efficiency tends to be higher for languages with more speakers.
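The abstract does not spell out how efficiency is estimated. As a rough illustration only, one textbook notion of coding efficiency is the ratio of the observed Shannon entropy of the symbol distribution to the maximum entropy achievable with the same inventory of symbols. The sketch below uses that simple unigram ratio; it is an assumption for illustration, not the estimator used in the paper, and it would not reproduce the reported figures. The function names are hypothetical.

```python
import math
from collections import Counter


def entropy_bits(symbols):
    """Shannon entropy (bits per symbol) of the empirical unigram distribution."""
    counts = Counter(symbols)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def efficiency(symbols):
    """Observed entropy divided by the maximum entropy log2(number of types).

    Assumption: a unigram model over the given units (characters or words).
    Returns 0.0 when fewer than two distinct symbols occur.
    """
    symbols = list(symbols)
    n_types = len(set(symbols))
    if n_types < 2:
        return 0.0
    return entropy_bits(symbols) / math.log2(n_types)


text = "the cat sat on the mat"
char_level = efficiency(text.replace(" ", ""))  # characters as units
word_level = efficiency(text.split())           # words as units
```

Passing a string treats characters as the encoding units, while passing a list of tokens treats words as the units, mirroring the two levels distinguished in the abstract.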
Null subjects (NSs) have been a central research topic in generative syntax ever since the 1980s. This chapter considers the situation of German NSs both from a dialectological and from a diachronic perspective and attempts to reconstruct a direct line concerning the licensing conditions of pro-drop from Old High German (OHG) through Middle High German (MHG) and Early New High German (ENHG) to current dialects of New High German (NHG). Particularly, we will argue that German changed from a consistent, yet asymmetric pro-drop language to a partial, but symmetric one. In order to demonstrate that this development took place and the steps involved, we survey the existing empirical evidence and introduce new data.
This paper investigates the long-term diachronic development of the perfect and preterite tenses in German and provides a novel analysis by supplementing Reichenbach’s (1947) classical theory of tense by the notion of underspecification. Based on a newly compiled parallel corpus spanning the entire documented history of German, we show that the development in question is cyclic: It starts out with only one tense form (preterite) compatible with both current relevance and narrative past readings in (early) Old High German and, via three intermediate stages, arrives at only one tense form again (perfect) compatible with the same readings in modern Upper German dialects. We propose that in order to capture all attested stages we must allow tenses to be unspecified for R (reference time), with R merely being inferred pragmatically. We then propose that the transitions between the different stages can be explained by the interplay between semantics and pragmatics.
The article addresses Solution-Oriented Questions (SOQs) as an interactional practice for relationship management in psychodiagnostic interviews. Therapeutic alliance results from the concordance of alignment, as willingness to cooperate regarding common goals, and of affiliation, as relationship based upon trust. SOQs particularly allow for both: They are situated at the end of a troublesome topic area, which is linked to low agency on the patient’s side, and they reveal understanding of and interest in the patient. Following the paradigm of Conversation Analysis and German Gesprächsanalyse this paper analyzes the design and functions of SOQs as a means for securing and enhancing the relationship in the process of therapy. Our data comprise 15 videotaped first interviews following the manual of the Operationalized Psychodynamic Diagnostics. The analyses refer to all SOQs found but will be illustrated by means of a single conversation.
The first International Summer Institute for Interactional Linguistics (henceforth ISIIL) took place from July 18 to 23 at the Leibniz-Institute for the German Language (IDS) in Mannheim, Germany. The local organizers, Arnulf Deppermann and Alexandra Gubina, collaborated with five other facilitators in preparing this Summer Institute: Emma Betz (University of Waterloo), Elwys De Stefani (University of Heidelberg & KU Leuven), Barbara A. Fox (University of Colorado), Chase Raymond (University of Colorado) and Jörg Zinken (Leibniz-Institute for the German Language, Mannheim). The goal of ISIIL was to bring together both early-career researchers and established scholars from the fields of Conversation Analysis (CA) and Interactional Linguistics (IL) in order to foster the development of new skills for doing research using IL. The participants and organizers had diverse backgrounds, both in terms of their research interests (e.g., classroom interaction, second language acquisition, cross-linguistic comparison, particles, grammar-in-interaction) and institutional affiliations, with many participants from institutions from around Europe (i.e., Belgium, Denmark, England, France, Germany, Norway, Sweden, Switzerland) as well as overseas (Canada, U.S.A., South Africa). Because of the compact nature of the Institute, the advanced topics covered, as well as the original research projects the participants would engage in, participation was limited to 24 participants, selected on the basis of their prior training and experience in CA/IL.
As part of a larger research paradigm on understanding client change in the helping professions from an interprofessional perspective, this paper applies a conversation analytic approach to investigate therapists’ requesting examples (REs) and their interactional and sequential contribution to clients’ change during the diagnostic evaluation process. The analyzed data comprises 15 videotaped intake interviews that followed the system of Operationalized Psychodynamic Diagnosis. Therapists’ requesting examples in psychodiagnostic interviews explicitly or implicitly criticize the patient’s prior turn as insufficient. They also open a retro-sequence and in the following turns provide for a description that helps clarify meaning and evince psychic or relational aspects of the topic at hand. While the therapist’s prior request initiates the patient’s insufficient presentation, the patient’s example presentation is regularly followed by the therapist’s summarizing comments or by further requests. Requesting examples thus are a particular case of requests that follow expandable responses regarding the sequential organization; yet, given that they make examples conditionally relevant, they are more specific. With the help of this sequential organization, participants co-construct common knowledge which allows the therapist to pursue the overall aim of therapy, which is to increase the patients’ awareness of their distorted perceptions, and thus to pave the way for change.
Making corpora accessible and usable for linguistic research is a huge challenge in view of (too) big data, legal issues and a rapidly evolving methodology. This does not only affect the design of user-friendly graphical interfaces to corpus analysis tools, but also the availability of programming interfaces supporting access to the functionality of these tools from various analysis and development environments. RKorAPClient is a new research tool in the form of an R package that interacts with the Web API of the corpus analysis platform KorAP, which provides access to large annotated corpora, including the German reference corpus DeReKo with 45 billion tokens. In addition to optionally authenticated KorAP API access, RKorAPClient provides further processing and visualization features to simplify common corpus analysis tasks. This paper introduces the basic functionality of RKorAPClient and exemplifies various analysis tasks based on DeReKo, that are bundled within the R package and can serve as a basic framework for advanced analysis and visualization approaches.
Words originating from shortening, including acronyms and clippings, constitute a treasure trove of insight into phonological grammar. In particular, they serve as an ideal testing ground for Optimality Theory (OT) and its view of grammar as an interaction of markedness constraints, which express (dis-) preferences regarding phonological structure in output forms, and faithfulness constraints, which require output forms to correspond to input structure (Prince and Smolensky 1993). This is because shortenings are characterised by a sharply diminished role of faithfulness, allowing for markedness constraints to make their force felt (“The Emergence of the Unmarked”). This article aims to demonstrate the heuristic value of shortening data for testing the OT model and for shedding light on various controversies in German phonology. A particular concern is to draw attention to the need for properly sorting the shortening data, to identify influences on phonological structure due to internal domain boundaries or to special correspondence effects potentially obscuring the view on the maximally unmarked patterns.
Mobile live video streaming with smartphones is an everyday media practice in which the participants are in a specific multimodal constellation and streamers and viewers have access to various semiotic resources for interactionally establishing alignment. Based on the multimodal sequence analysis of a concise episode of a journalist's livestream coverage of a political event on the streaming platform Periscope, I will address the question of how participation and involvement in live video streams are achieved and organised by the participants. I will show that hosts in the media practice of live video streaming act in an interaction-dominant manner and involve the viewers in the situation through asymmetrical participation coordination via footing shifts.
Recent typological studies have shown that socio-linguistic factors have a substantial effect on at least certain structures of language. However, we are still far from understanding how such factors should be operationalized and how they interact with other factors in shaping grammar. To address both questions, this study examines the influence of socio-linguistic factors on the number of dedicated conditional constructions in a sample of 374 languages. We test the number of speakers, the degree of multilingualism, the availability of a literary tradition, the use of writing, and the use of the language in the education system. At the same time, we control for genealogical, contact, and bibliographical biases. Our results suggest that the number of speakers is the most informative predictor. However, we find that the association between the number of speakers and the number of dedicated conditional constructions is much weaker than assumed once genealogical and contact biases are controlled for.