Ancient Chinese poetry is constituted by structured language that deviates from ordinary language usage; its poetic genres impose unique combinatory constraints on linguistic elements. How does the constrained poetic structure facilitate speech segmentation when common linguistic and statistical cues are unreliable to listeners in poems? We generated artificial Jueju, which arguably has the most constrained structure in ancient Chinese poetry, and presented each poem twice as an isochronous sequence of syllables to native Mandarin speakers while conducting magnetoencephalography (MEG) recording. We found that listeners deployed their prior knowledge of Jueju to build the line structure and to establish the conceptual flow of Jueju. Unprecedentedly, we found a phase precession phenomenon indicating predictive processes of speech segmentation—the neural phase advanced faster after listeners acquired knowledge of incoming speech. The statistical co-occurrence of monosyllabic words in Jueju negatively correlated with speech segmentation, which provides an alternative perspective on how statistical cues facilitate speech segmentation. Our findings suggest that constrained poetic structures serve as a temporal map for listeners to group speech contents and to predict incoming speech signals. Listeners can parse speech streams by using not only grammatical and statistical cues but also their prior knowledge of the form of language.
The project Referenzkorpus Altdeutsch (‘Old German Reference Corpus’) aims to establish a deeply-annotated text corpus of all extant Old German texts. As the automated part-of-speech and morphological pre-annotation is amended by hand, a quality control system for the results seems a desirable objective. To this end, standardized inflectional forms, generated using the morphological information, are compared with the attested word forms. Their creation is described by way of example for the Old High German part of the corpus. As is shown, in a few cases, some features of the attested word forms are also required in order to determine as exactly as possible the shape of the inflected lemma form to be created.
Meaning in interaction
(2024)
This editorial to the Special Issue on “Meaning in Interaction” introduces the approach of Interactional Semantics, which has been developed over the last years within the framework of Interactional Linguistics. It discusses how “meaning” is understood and approached in this framework and lays out that Interactional Semantics is interested in how participants clarify and negotiate the meanings of the expressions that they are using in social interaction. Commonalities and differences of this approach with other approaches to meaning are flagged, and the intellectual origins and precursors of Interactional Semantics are introduced. The contributions to the Special Issue are located in the larger field of research.
The availability of electronic corpora of historical stages of languages has been welcomed as possibly attenuating the inherent problem of diachronic linguistics, i.e. that we only have access to what has chanced to come down to us - the problem which was memorably named by Labov (1992) as one of “Bad Data”. However, such corpora can only give us access to an increased amount of historical material and this can essentially still only be a partial and possibly distorted picture of the actual language at a particular period of history. Corpora can be improved by taking a more representative sample of extant texts if these are available (as they are in significant number for periods after the invention of printing). But, as examples from the recently compiled GerManC corpus of seventeenth and eighteenth century German show, the evidence from such corpora can still fail to yield definitive answers to our questions about earlier stages of a language. The data still require expert interpretation, and it is important to be realistic about what can legitimately be expected from an electronic historical corpus.
Multi-faceted alignment. Toward automatic detection of textual similarity in Gospel-derived texts
(2015)
Ancient Germanic Bible-derived texts stand in as test material for producing computational means for automatically determining where textual contamination and linguistic interference have influenced the translation process. This paper reports on the results of research efforts that produced a text corpus; a method for decomposing the texts involved into smaller, more directly comparable thematically-related chunks; a database of relationships between these chunks; and a user-interface allowing for searches based on various referential criteria. Finally, the state of the product at the end of the project is discussed, namely as it was handed over to another researcher who has extended it to automatically find semantic and syntactic similarities within comparable chunks.
In this paper we present some preliminary considerations concerning the possibility of automatically parsing an annotated corpus for N-N compounds. This should in principle be possible at least for relational and stereotype compounds, if the lemmatization of the corpus connects the lemmata with lexical entries as described in Höhle (1982). These lexical entries then supply the necessary information about the argument structure of a relational noun or about the stereotypical purpose associated with the noun’s referent which can be used to establish a relation between the first and the head constituent of the compound.
The relative order of dative and accusative objects in older German is less free than it is today. The reason for this could be that speakers of the direct predecessor of Old High German organized the referents according to the Thematic Hierarchy. If one applies a Case Hierarchy Nom>Acc>Dat to this, the order Nom - Dat - Acc falls out. It becomes apparent that the status of the Thematic Hierarchy is not a factor governing underlying word order, but a factor inducing scrambling. Arguments from binding theory, whose validity is discussed, indicate that the underlying order is ‘accusative before dative’.
Recent typological studies have shown that socio-linguistic factors have a substantial effect on at least certain structures of language. However, we are still far from understanding how such factors should be operationalized and how they interact with other factors in shaping grammar. To address both questions, this study examines the influence of socio-linguistic factors on the number of dedicated conditional constructions in a sample of 374 languages. We test the number of speakers, the degree of multilingualism, the availability of a literature tradition, the use of writing, and the use of the language in the education system. At the same time, we control for genealogical, contact, and bibliographical biases. Our results suggest that the number of speakers is the most informative predictor. However, we find that the association between the number of speakers and the number of dedicated conditional constructions is much weaker than assumed, once genealogical and contact biases are controlled for.
A constructicon, i.e., a structured inventory of constructions, essentially aims at documenting functions of lexical and grammatical constructions. Among other parameters, so-called constructional collo-profiles, as introduced by Herbst (2018, 2020), are conclusive for determining constructional meanings. They provide information on how relevant individual words are for construction slots, they hint at usage preferences of constructions and serve as a helpful indicator for semantic peculiarities of constructions. However, even though collo-profiles constitute an indispensable component of constructicon entries, they pose major challenges for constructicographers: For a constructicographic enterprise it is not feasible to conduct collostructional analyses for hundreds or even thousands of constructions. In this article, we introduce a procedure based on the large language model BERT that makes it possible to predict collo-profiles without having to extensively annotate instances of constructions in a given corpus. Specifically, by discussing the constructions X macht Y ADJP (‘X makes Y ADJ’, e.g. he drives him crazy) and N1 PREP N1 (e.g., bumper to bumper, constructions over constructions), we show how the developed automated system generates collo-profiles based on a limited number of annotated instances. Finally, we place collo-profiles alongside other dimensions of constructional meanings included in the German Constructicon.
Latvia
(2019)
This chapter deals with current issues in bilingual education in the framework of language and educational policies in Latvia, and also outlines similarities or common tendencies in the two other Baltic states, Estonia and Lithuania. As commonly understood in the 21st century, the term ‘bilingual education’ includes ‘multilingual education, as the umbrella term to cover a wide spectrum of practice and policy’ (García, 2009: 9).
The idea of this article is to take the immaterial and somehow ethereal nature of aesthetic concepts seriously by asking how aesthetic concepts are negotiated and thus formed in communication. My examples come from theatrical production where aesthetic decisions naturally play a major role. In the given case, an aesthetic concept is introduced with which only the director, but none of the actors is familiar in the beginning of the rehearsals. The concept, Wabi Sabi, comes from Japanese culture. As the whole rehearsal process was video recorded, it is possible to track the process of how the concept is negotiated and acquired over time. So, instead of defining criteria what Wabi Sabi as an aesthetic concept “consists of,” this article seeks to show how the concept is introduced, explained and “used” within a practical context, in this case a theater rehearsal. In contrast to conventional models of aesthetic experience, I am interested in the ways in which an aesthetic concept is configured in and through socially organized interaction, and — vice versa — how that interaction contributes to the situational accomplishment of the same concept. In short: I am interested in the “doing” of aesthetic concepts, especially in “doing Wabi Sabi.”
In this article, we provide an insight into the development and application of a corpus-lexicographic tool for finding neologisms that are not yet listed in German dictionaries. As a starting point, we used the words listed in a glossary of German neologisms surrounding the COVID-19 pandemic. These words are lemma candidates for a new dictionary on COVID-19 discourse in German. They also provided the database used to develop and test the NeoRate tool. We report on the lexicographic work in our dictionary project, the design and functionalities of NeoRate, and describe the first test results with the tool, in particular with regard to previously unregistered words. Finally, we discuss further development of the tool and its possible applications.
The ubiquity of smartphones has been recognised within conversation analysis as having an impact on conversational structures and on the participants’ interactional involvement. However, most of the previous studies have relied exclusively on video recordings of overall encounters and have not systematically considered what is taking place on the device. Due to the personal nature of smartphones and their small displays, onscreen activities are of limited visibility and are thus potentially opaque for both the co-present participants (“participant opacity”) and the researchers (“analytical opacity”). While opacity can be an inherent feature of smartphones in general, analytical opacity might not be desirable for research purposes. This chapter discusses how a recording set-up consisting of static cameras, wearable cameras and dynamic screen captures allowed us to address the analytical opacity of mobile devices. Excerpts from multi-source video data of everyday encounters will illustrate how the combination of multiple perspectives can increase the visibility of interactional phenomena, reveal new analytical objects and improve analytical granularity. More specifically, these examples will emphasise the analytical advantages and challenges of a combined recording set-up with regard to smartphone use as multiactivity, the role of the affordances of the mobile device, and the prototypicality and “naturalness” of the recorded practices.
Our paper discusses family language policies among multilingual families in Latvia with Russian as home language. The presentation is based on three case studies, i.e. interviews conducted with Russophones who have chosen to send their children to Latvian-medium pre-schools and schools. The main aim is to understand practices and attitudes among such families “from below,” i.e. which family-internal and family-external factors influenced the choice of Latvian-medium education and what impact this choice has on linguistic practices.
The paper shows that there have been critical events which both encouraged and discouraged the choice of Latvian-medium education. The wish to integrate into mainstream society has been met by obstacles both from ethnic Russians and Latvians. Yet, the three families consider their choices to be the right ones for the future development of their children in a multiethnic Latvia in which Latvian serves as the unifying language of society.
Aims and objectives:
Language debates in Latvia often focus on the role of Latvian as official and main societal language. Yet, Latvian society is highly multilingual, and families with home languages other than Latvian have to choose between different educational trajectories for their children. In this context, this paper discusses the results of two studies which addressed the question of why families with Russian as a home language choose (pre)schools with languages other than Russian as medium of instruction (MOI). The first study analyses family narratives which provide insight into attitudes and practices which lead to the decision to send children to Latvian-MOI institutions. The second study investigates language attitudes and practices by families in the international community of Riga German School.
Methodology:
The paper discusses data gathered during two studies: for the first, semi-structured interviews were conducted with Russian-speaking families who chose Latvian-medium schools for their children. For the second study, a survey was carried out in the community of an international school in Riga, accompanied by ethnographic observations and interviews with teachers and the school leadership.
Data and analysis:
Interviews and ethnographic observations were subjected to a discourse analysis with a focus on critical events and structures of life trajectory narratives. Survey data were processed following simple statistical analysis and qualitative content analysis.
Findings/conclusions:
Our data reveal that families strongly embrace multilingualism and see the development of individual plurilingualism as important for integration into Latvian society as well as for educational and professional opportunities in the multilingual societies of Latvia and Europe. At the same time, multilingualism and multiculturalism, including Russian, are seen as a value in itself. In addition, our studies reflect the bidirectionality of family language policies in interplay with practices in educational institutions: family decisions influence children’s language acquisition at school, but the school also has an impact on the families’ language practices at home. In sum, we argue that educational policies should therefore do justice to the wishes of families in Latvia to incorporate different language aspects into individual educational trajectories.
Originality:
Language policy is a frequent topic of investigation in the Baltic states. However, there has been a lack of research on family language policy and school choices. In this vein, our paper adds to the understanding of educational choices and language policy processes among Russian-speaking families and the international community in Latvia.
Introduction
(2023)
We argue that properties with a nominal origin get transferred regularly in certain German particle verb constructions to properties that are propositional insofar as they characterize the temporal structure of eventualities, understood to be described by propositional (= truth-assessable) representations of state changes. Accordingly, the oft-noted perfectivizing function of certain verbal particles like ein- in einfahren ('pull in', cf. Kühnhold 1972) is the effect of redressing a conflict at the syntax-semantics interface: On the one hand, constructions like in [die Grube]acc einfahren ('pull into the mine') exhibit transitive syntax (Gehrke 2008), requiring that the syntactic arguments be mapped onto well-distinguished or DIFFERENT referents in the semantics (Kemmer 1993). On the other hand, in/ein codes a spatio-temporal inclusion relation between its relata, contradicting the requirement imposed by the transitive syntax. Following Brandt (2019), we submit that the interface executes a manoeuvre that delays the interpretation of part of the contradiction-inducing DIFFERENCE feature. It is not locally interpreted (semantically represented) in toto but in part passed on to the next syntactic-semantic computational cycle. Here, the passed-on meaning is interpreted in the locally customary terms, in the case at hand, as a temporal index where the post-state of the depicted eventuality does not hold.
The internationally renowned conference of the European Association for Lexicography (EURALEX) has taken place every two years for the past 39 years. Last year’s conference, held July 12th–16th, 2022, marked EURALEX’s 20th edition, and more than 200 international participants gathered at Mannheim Palace to discuss current developments, learn about new projects, and present their own work — either in lexicography or in one of the many applied or neighboring disciplines such as corpus and computational linguistics.
The present paper examines the rise and fall of Modern High German loanwords in English from 1600 until 2000, principally making use of the record of borrowing documented by the Oxford English Dictionary (OED) in its Third Edition (online version, in revision 2000-). Groups of loanwords are analysed by century, with reference to the changing social and cultural landscape characterising relationships between the relevant nations over this period. This is not a simple picture: each language grows over the period in different ways, and the speakers of English look to German at different times for different types of borrowing, as the political and intellectual balance alters.
Morphophonological asymmetries in affixation concern systematic correlations between morphological properties of affixes (e.g. combination with bound versus free stems, position relative to stem (suffixes versus prefixes)) and their phonological properties (e.g. stress behaviour). The arguably most insightful approach to capturing relevant asymmetries invokes a notion of affix coherence, first introduced by Dixon in connection with his work on Yidiɲ, a nearly extinct language spoken in Northern Australia. This notion is based on a categorical division of affixes into ones that integrate into the phonological word of the stem and ones that do not. The integration of affixes is envisioned as being fully determined by phonological and morphological structure in a given language and verifiable by diagnostics relevant to phonological word domains (primarily the syllable and the foot structure). The assumption of two types of prosodic domains characterized by integrated versus non-integrated affixes is manifest in consistent asymmetries that pertain to morphophonological, phonological, and phonetic rules. This consistency constitutes compelling evidence for the structure-based analysis of the impact of various affixes on derived words, as opposed to alternative approaches to capturing these effects by associating affixes with diacritics (morpheme versus word boundary, class 1 versus class 2, stratum 1 versus stratum 2). The present entry aims to demonstrate, mostly on the basis of data from Germanic languages, the breadth of the empirical evidence in support of a fundamental role of affix coherence. Moreover, it aims to draw attention to the various implications of affix coherence for modeling relevant generalizations, in particular the necessary reference to a level of phonological representation characterized by a specific degree of abstractness (‘phonemic’).
This paper discusses contemporary societal roles of German in the Baltic states (Latvia, Estonia, Lithuania). Speaker and learner statistics and a summary of sociolinguistic research (Linguistic Landscapes, language learning motivation, language policies, international roles of languages) suggest that German has by far fewer speakers and functions than the national languages, English, and Russian, and it is not a dominant language in the contemporary Baltics anymore. However, German is ahead of ‘any other language’ in terms of users and societal roles as a frequent language in education, of economic relations, as a historical lingua franca, and a language of traditional and new minorities. Highly diverse groups of users and language policy actors form a ‘coalition of interested parties’ which creates niches which guarantee German a frequent use. In the light of the abundance of its functions, the paper suggests the concept ‘additional language of society’ for a variety such as German in the Baltics – since there seems to be no adequate alternative labelling which would do justice to all societal roles. The paper argues that this concept may also be used for languages in similar societal situations and, not least, be useful in language marketing and the promotion of multilingualism.
This study explores the interdependence of qualitative and quantitative analysis in articulating empirically plausible and theoretically coherent generalizations about grammatical structure. I will show that the use of large electronic corpora is indispensable to the grammarian's work, serving as a rich source of semantic and contextual information, which turns out to be crucial in categorizing and explaining grammatical forms. These general concerns are illustrated by the patterns of use of Czech relative clauses (RC) with the non-declinable relativizer co, by taking a set of existing claims about these RCs and testing their accuracy on corpus material. The relevant analytic categories revolve around the referential type of the relativized noun, the interaction between relativization and deixis, and the semantic relationship between the relativized noun and the proposition expressed by the RC. The analysis demonstrates that some of the existing claims are fully invalid in the face of regularly attested semantic distinctions, while others are more or less on the right track but often not comprehensive or precise enough to capture the full richness of the facts.
Conversation is usually considered to be grammatically simple, while academic writing is often claimed to be structurally complex, associated primarily with a greater use of dependent clauses. Our goal in the present paper is to challenge these stereotypes, based on the results of large-scale corpus investigations. We argue that both conversation and professional academic writing are grammatically complex but that their complexities are dramatically different. Surprisingly, the traditional view that complexity is realized through extensive clausal embedding leads to the conclusion that conversation is more complex than academic writing. In contrast, written academic discourse is actually much more ‘compressed’ than elaborated, and the complexities of academic writing are realized mostly as phrasal embedding rather than embedded clauses.
In this chapter, we will investigate smartphone-based showing sequences in everyday social encounters, that is, moments in which a personal mobile device is used for presenting (audio-)visual content to co-present participants. Despite a growing interest in object-centred sequences and mundane technology use, detailed accounts of the sequential, multimodal, and material dimensions of showing sequences are lacking. Based on video data of social interactions in different languages and on the framework of multimodal interaction analysis, this chapter will explore the link between mobile device use and social practices. We will analyse how smartphone showers and their recipients coordinate the manipulation of a technological object with multiple courses of action, and reflect upon the fundamental complexity of this by-now routine joint activity.
This paper first argues that the distinction between Propositions and States-of-Affairs is significant for understanding a number of linguistic contrasts, including contrasts between nominalizations, complement clauses, readings of modal infinitives, raising constructions, illocutions and moods, relative clauses, and nouns. Subsequently, the paper outlines a cognitive linguistic model of the distinction, according to which Propositions and States-of-Affairs differ in terms of construal. Both prompt Langackerian “processes”, but only Propositions prompt a construal of these processes as referential. The paper argues that this model has a number of advantages over a traditional, denotational understanding of the distinction.
The present article proposes a syntactic and semantic analysis of assertive clauses that comprises their truth-conditional aspects and their speech act potential in communication. What is commonly called “illocutionary force” is differentiated into three structurally and functionally distinct layers: a judgement phrase, representing subjective epistemic and evidential attitudes; a commitment phrase, representing the social commitment related to assertions; and an act phrase, representing the relation to the common ground of the conversation. The article provides several pieces of evidence for this structure: from the interpretation and syntactic position of various classes of epistemic, evidential, affirmative and speech act-related operators, from clausal complements embedded by different types of predicates, from embedded root clauses, and from anaphora referring to different clausal projections. The syntactic assumptions are phrased within X-bar theory, and the semantic interpretation makes use of dynamic update of common ground, differentiating between informative and performative updates. The object language is German, with particular reference to verb final and verb second structure.
This article describes an English Zulu learners’ dictionary that is part of a larger set of information tools, namely an online Zulu course, an e-dictionary of possessives (which was implemented earlier) accompanied by training software offering translation tasks on several levels, and an ontology of morphemic items categorizing and describing all parts of speech of Zulu. The underlying lexicographic database contains the usual type of lexicographic data, such as translation equivalents and their respective morphosyntactic data, but its entries have been extended with data related to the lessons of the online course in order to enable the learner to link both tools autonomously. The ‘outer matter’ is integrated into the website in the form of several texts on additional web pages (how-to-use, typical outputs, grammar tables, information on morphosyntactic rules, etc.). The dictionary comprises a modular system, where each module fulfils one of the necessary functions.
This paper presents the IVK-Ler corpus, a longitudinal, annotated learner corpus of weekly writings produced by a group of 18 adolescents in a preparatory class. The corpus consists of 117 student texts collected between 2020 and 2021 and has a structure layered by student and text number. It includes metadata that enables researchers to analyze and track individual student progress in terms of syntactic competence and literacy. The annotation schema, manual and automatic annotation processes, and corpus representation are described in detail. The corpus currently includes target hypotheses and gold standard part-of-speech tags. Future work could include additional annotation layers for topological fields and dependency relations, as well as semantic and discourse annotations to make the corpus usable for tasks beyond syntactic evaluations.
In the context of a Nordic Conference on Bilingualism, it can be a rewarding task to look at issues such as language planning, policy and legislation from a perspective of the southern neighbours of the Nordic world. This paper therefore intends to point attention towards a case of societal multilingualism at the periphery of the Nordic world by dealing with recent developments in language policy and legislation with regard to the North Frisian speech community in the German Land of Schleswig-Holstein. As I will show, it is striking to what degree there are considerable differences in the discourse on minority protection and language legislation between the Nordic countries and a cultural area which may arguably be considered to be part of the Nordic fringe - and which itself occasionally takes Scandinavia as a reference point, e.g. in the recent adoption of a pan-Frisian flag modelled on the Nordic cross (Falkena 2006).
The main focus of the paper will be on the Frisian Act which was passed in the Parliament of Schleswig-Holstein in late 2004. It provides a certain legal basis for some political activities with regard to Frisian, but falls short of creating a true spirit of minority language protection and/or revitalisation. In contrast to the traditions of the German and Danish minorities along the German-Danish border and to minority protection in Northern Scandinavia (in particular to Sámi language rights), the approach chosen in the Frisian Act is extremely weak and has no connotation of long-term oriented language-planning, let alone a rights-based perspective.
The paper will then look at policy developments in the time since the Act was passed, e.g. in the Schleswig-Holstein election campaign in 2005, and at the latest perceptions of the Frisian language situation in the discourse on North Frisian policy in Schleswig-Holstein majority society. In the final part of the paper, I will discuss reasons for the differences in minority language policy discourse between Germany and the Nordic countries, and try to provide an outlook on how Frisian could benefit from its geographic proximity to the Nordic world.
This replication study aims to investigate a potential bias toward addition in the German language, building upon previous findings of Winter and colleagues who identified a similar bias in English. Our results confirm a bias in word frequencies and binomial expressions, aligning with these previous findings. However, the analysis of distributional semantics based on word vectors did not yield consistent results for German. Furthermore, our study emphasizes the crucial role of selecting appropriate translational equivalents, highlighting the significance of considering language-specific factors when testing for such biases for languages other than English.
This chapter explores the Linguistic Landscape of six medium-size towns in the Baltic States with regard to languages of tourism and to the role of English and Russian as linguae francae. A quantitative analysis of signs and of tourism web sites shows that, next to the state languages, English is the most dominant language. Yet, interviews reveal that underneath the surface, Russian still stands strong. Therefore, possible claims that English might take over the role of the main lingua franca in the Baltic States cannot be maintained. English has a strong position for attracting international tourists, but only alongside Russian which remains important both as a language of international communication and for local needs.
Polish żeby under negation
(2021)
The paper addresses two patterns in the distribution of complement clauses headed by the complementizer żeby in Polish related to the presence of sentential negation. It is argued that żeby-clauses with an obligatory negation in the matrix clause, licensed by epistemic verbs, can be treated in terms of negative polarity, with żeby defined as an n-word. Structures with żeby-clauses and an obligatory negation in the embedded clause, licensed by verbs of fear, are argued to be an instance of negative complementation, with żeby specified as a negative complementizer. A uniform lexicalist analysis within the framework of HPSG is provided, employing tools developed to account for Negative Concord in Polish.
This paper deals with a specific type of lexeme, namely binary preposition-noun combinations containing temporal references like am Ende [at (the) end] or für Sekunden [for seconds]. The main characteristic of these combinations is the recurrent internal zero gap. Despite the fact that the omission of the determiner can often be explained by grammatical rules, the zero gaps indicate a higher degree of lexicalization. Therefore, we interpret these expressions as minimal phraseological units with holistic meanings and functions. The corpus-driven exploration of typical context patterns (e.g. using collocation profiles and the lexpan slot filler analysis) shows that a) even such minimal expressions are based on semi-abstract schemes and b) temporal expressions can also fulfill modal or discursive functions, usually with fuzzy borders and overlapping structures. In the case of modalization or pragmatization one can regard such PNs as distinct lexicon entries.
Words originating from shortening, including acronyms and clippings, constitute a treasure trove of insight into phonological grammar. In particular, they serve as an ideal testing ground for Optimality Theory (OT) and its view of grammar as an interaction of markedness constraints, which express (dis-) preferences regarding phonological structure in output forms, and faithfulness constraints, which require output forms to correspond to input structure (Prince and Smolensky 1993). This is because shortenings are characterised by a sharply diminished role of faithfulness, allowing for markedness constraints to make their force felt (“The Emergence of the Unmarked”). This article aims to demonstrate the heuristic value of shortening data for testing the OT model and for shedding light on various controversies in German phonology. A particular concern is to draw attention to the need for properly sorting the shortening data, to identify influences on phonological structure due to internal domain boundaries or to special correspondence effects potentially obscuring the view on the maximally unmarked patterns.
This article details the process of creating the Nottinghamer Korpus deutscher YouTube-Sprache ('The Nottingham German YouTube Language Corpus' - or NottDeuYTSch corpus) and outlines potential research opportunities. The corpus was compiled to analyse the online language produced by young German-speakers and offers significant opportunity for in-depth research across several linguistic fields including lexis, morphology, syntax, orthography, and conversational and discursive analysis. The NottDeuYTSch corpus contains over 33 million words taken from approximately 3 million YouTube comments from videos published between 2008 and 2018 targeted at a young, German-speaking demographic and represents an authentic language snapshot of young German speakers. The corpus was proportionally sampled based on video category and year from a database of 112 popular German-speaking YouTube channels in the DACH region for optimal representativeness and balance and contains a considerable amount of associated metadata for each comment that enables further longitudinal cross-sectional analyses. The NottDeuYTSch corpus is available for analysis as part of the German Reference Corpus (DeReKo).
Basic grammatical categories may carry social meanings irrespective of their semantic content. In a set of four studies, we demonstrate that verbs—a basic linguistic category present and distinguishable in most languages—are related to the perception of agency, a fundamental dimension of social perception. In an archival analysis of actual language use in Polish and German, we found that targets stereotypically associated with high agency (men and young people) are presented in the immediate neighborhood of a verb more often than non-agentic social targets (women and older people). Moreover, in three experiments using a pseudo-word paradigm, verbs (but not adjectives and nouns) were consistently associated with agency (but not with communion). These results provide consistent evidence that verbs, as grammatical vehicles of action, are linguistic markers of agency. In demonstrating meta-semantic effects of language, these studies corroborate the view of language as a social tool and an integral part of social perception.
Nonnative accents are prevalent in our globalized world and constitute highly salient cues in social perception. Whereas previous literature has commonly assumed that they cue specific social group stereotypes, we propose that nonnative accents generally trigger spontaneous negatively biased associations (due to a general nonnative accent category and perceptual influences). Accordingly, Study 1 demonstrates negative biases with conceptual IATs, targeting the general concepts of accent versus native speech, on the dimensions affect, trust, and competence, but not on sociability. Study 2 attests to negative, largely enhanced biases on all dimensions with auditory IATs comprising matched native–nonnative speaker pairs for four accent types. Biases emerged irrespective of the accent types that differed in attractiveness, recognizability of origin, and origin-linked national associations. Study 3 replicates general IAT biases with an affect IAT and a conventional evaluative IAT. These findings corroborate our hypotheses and assist in understanding general negativity toward nonnative accents.
Communication of stereotypes in the classroom: biased language use of German and Turkish adolescents
(2014)
Little is known about the linguistic transmission and maintenance of mutual stereotypes in interethnic contexts. This field study, therefore, investigated the linguistic expectancy bias (LEB) and the linguistic intergroup bias (LIB) among German and Turkish adolescents (13 to 20 years) in the school context. The LEB refers to the general phenomenon of describing stereotypes more abstractly. The LIB is the tendency to use language abstraction for in-group protective reasons. Results revealed an unmoderated LEB, whereas the LIB only occurred when foreigners were in the numerical majority, the classroom composition was perceived as a learning disadvantage, or the interethnic conflict frequency was high. These findings provide first evidence for the use of both LEB and LIB in an interethnic classroom setting.
As immigration and mobility increase, so do interactions between people from different linguistic backgrounds. Yet while linguistic diversity offers many benefits, it also comes with a number of challenges. In seven empirical articles and one commentary, this Special Issue addresses some of the most significant language challenges facing researchers in the 21st century: the power language has to form and perpetuate stereotypes, the contribution language makes to intersectional identities, and the role of language in shaping intergroup relations. By presenting work that aims to shed light on some of these issues, the goal of this Special Issue is to (a) highlight language as integral to social processes and (b) inspire researchers to address the challenges we face. To keep pace with the world’s constantly evolving linguistic landscape, it is essential that we make progress toward harnessing language’s power in ways that benefit 21st century globalized societies.
This chapter will present results of a linguistic landscape (LL) project in the regional centre of Rēzekne in the region of Latgale in Eastern Latvia. Latvia was de facto a part of the Soviet Union until 1991, and this has given it a highly multilingual society. In the essentially post-colonial situation since 1991, strict language policies have been in place, which aim to reverse the language shift from Russian, the dominant language of Soviet times, back to Latvian. Thus, the main interests of the research were how the complex pattern of multilingualism in Latvia is reflected in the LL; how people relate to current language legislation; and what motivations, attitudes and emotions inform their behaviour.
Sexual harassment severely impacts the educational system in the West African country Benin and the progress of women in this society that is characterized by great gender inequality. Knowledge of the belief systems rooted in the sociocultural context is crucial to the understanding of sexual harassment. However, no study has yet investigated how sexual harassment is related to fundamental beliefs in Benin or West African countries. We conducted a field study on 265 female and male students from several high schools in Benin to investigate the link between sexual harassment and measures of ambivalent sexism, gender identity, and rape myth acceptance. Almost half of the sample reported having experienced sexual harassment personally or among peers. Levels of sexism and rape myth acceptance were very high compared to other studies. These attitudes appeared to converge in a sexist belief system that was linked to personal experiences, the perceived probability of experiencing and fear of sexual harassment. Results suggest that sexual harassment is a societal problem and that interventions need to address fundamental attitudes held in societies low in gender equality.
Nonnative-accented speakers face prevalent discrimination. The assumption that people freely express negative sentiments toward nonnative speakers has also guided common research methods. However, recent studies did not consistently find downgrading, so that prejudice against nonnative accents might even be questioned at first sight. The present theoretical article will bridge these contradictory findings in three ways: (a) We illustrate that nonnative speakers with foreign accents frequently may not be downgraded in commonly used first-impression and employment scenario paradigms. It appears that relatively controlled responding may be influenced by norms and motivations to respond without prejudice, whereas negative biases emerge in spontaneous responding. (b) We present an integrative view based on knowledge on modern forms of prejudice to develop modern notions of accent-ism, which allow for predictions when accent biases are (not) likely to surface. (c) We conclude with implications for interventions and a tailored research agenda.
The present research unites two emergent trends in the area of language attitudes: (a) research on perceptions of nonnative speakers by nonnative listeners and (b) the search for general, basic mechanisms underlying the evaluation of nonnative accented speakers. In three experiments featuring an employment situation, German participants listened to a presentation given in English by a German speaker with a strong versus native-like accent (in Studies 1–3) versus a native speaker of English (in Study 1). They evaluated candidates with a strong accent worse than candidates with a native(-like) pronunciation—even to the degree that the quality of arguments was of no relevance (Study 1). Study 2 introduces an effective intervention to reduce these discriminatory tendencies. Across studies, affect and competence emerged as major mediators of hirability evaluations. Study 3 further revealed sequential indirect influences, which advance our understanding of previous inconsistent findings regarding disfluency and warmth perceptions.
The establishment of Scottish Parliament: What difference does it make for the Gaelic language?
(2004)
After the Labour government takeover in Westminster in 1997, followed by the referendum on establishing a Scottish Parliament, hopes for more support for the Gaelic language in Scotland were nourished. In the election campaign to the Scottish Parliament in 1999, all parties which were elected to Parliament had mentioned Gaelic, and all parties except the Conservatives had promised an increase in support for Gaelic (cf. Scottish parties’ election manifestoes, obtainable from the parties or via their web sites). Now that the new Scottish Executive, formed by Labour and the Liberal Democrats, has been in power for some time, it is interesting to see if these hopes have been fulfilled.
The two core questions of this paper will thus be:
1. What is the status of Scottish Gaelic after the devolution process?
2. What difference does the existence of the Scottish Parliament make for the status of Gaelic?
It is important to note that this paper refers to language status and Gaelic’s position from a mere language policy perspective. The results are mostly based on an analysis of Parliament documents, the method of investigation being strictly philological. Empirical research has not yet been undertaken. The reference time of my paper will be the first year of Scottish Parliament and the new executive. Even though this is an arbitrary time break, the first year is a symbolic point of time. As the first legislation period as a possibly more natural reference point is not over yet, this choice seems legitimate.
The first International Summer Institute for Interactional Linguistics (henceforth ISIIL) took place from July 18 to 23 at the Leibniz-Institute for the German Language (IDS) in Mannheim, Germany. The local organizers, Arnulf Deppermann and Alexandra Gubina, collaborated with five other facilitators in preparing this Summer Institute: Emma Betz (University of Waterloo), Elwys De Stefani (University of Heidelberg & KU Leuven), Barbara A. Fox (University of Colorado), Chase Raymond (University of Colorado) and Jörg Zinken (Leibniz-Institute for the German Language, Mannheim). The goal of ISIIL was to bring together both early-career researchers and established scholars from the fields of Conversation Analysis (CA) and Interactional Linguistics (IL) in order to foster the development of new skills for doing research using IL. The participants and organizers had diverse backgrounds, both in terms of their research interests (e.g., classroom interaction, second language acquisition, cross-linguistic comparison, particles, grammar-in-interaction) and institutional affiliations, with many participants from institutions from around Europe (i.e., Belgium, Denmark, England, France, Germany, Norway, Sweden, Switzerland) as well as overseas (Canada, U.S.A., South Africa). Because of the compact nature of the Institute, the advanced topics covered, as well as the original research projects the participants would engage in, participation was limited to 24 participants, selected on the basis of their prior training and experience in CA/IL.
This paper has two distinct but interdependent goals. The empirical and analytical primary goal is to present a detailed overview of the patterns of (syntactico-semantic) argument structure and (morpho-syntactic) argument realization found with clause-embedding predicates in German. In particular, it will elucidate the observable relationships and dependencies between them, with a special focus on prepositional object clauses. The methodological secondary goal is to demonstrate the recently published ZAS Database of Clause-Embedding Predicates and illustrate its usefulness in approaching a concrete research agenda. The goals are aligned with each other because the data on patterns of argument structure and realization were collected using the database, and indeed the relevant questions could not have been investigated in such a thorough and efficient way without it. We will begin in Part 1 with an introduction to the database, its structure, and why and how it was created, before moving in Part 2 to the presentation of the data and analysis of argument structure and argument realization.
This chapter analyses the impact of political decentralization in a state on the position of ethnic and linguistic minorities, in particular with regard to the role of parliamentary assemblies in the political system. It relates a number of typical functions of parliaments to the specific needs of minorities and their languages. The most important of these functions are the representation of the minority and responsiveness to the minority’s needs. The chapter then discusses six examples from the European Union (and Norway) which prototypically represent different types of parliamentary decentralization: the ethnically defined Sameting in Norway and its importance for the Sámi population, the Scottish Parliament and its role for speakers of Scottish Gaelic, the German regional parliaments of the Länder of Schleswig-Holstein and Saxony and their impact on the Frisian and Sorbian minorities respectively, the autonomy of predominantly German-speaking South Tyrol within the Italian state, and finally the situation of the speakers of Latgalian in Latvia, where a decentralized parliament is missing. The chapter also makes suggestions on comparisons of these situations with minorities in Russia. It finally argues that political decentralization may indeed empower minorities to gain a greater voice in their states, even if much ultimately depends on individual factors in each situation and the attitudes by the majority population and the political center.
Repeating the movements associated with activities such as drawing or sports typically leads to improvements in kinematic behavior: these movements become faster, smoother, and exhibit less variation. Likewise, practice has also been shown to lead to faster and smoother movement trajectories in speech articulation. However, little is known about its effect on articulatory variability. To address this, we investigate the extent to which repetition and predictability influence the articulation of the frequent German word “sie” [zi] (they). We find that articulatory variability is proportional to speaking rate and the duration of [zi], and that overall variability decreases as [zi] is repeated during the experiment. Lower variability is also observed as the conditional probability of [zi] increases, and the greatest reduction in variability occurs during the execution of the vocalic target of [i]. These results indicate that practice can produce observable differences in the articulation of even the most common gestures used in speech.
This article makes an empirical and a methodological contribution to the comparative study of action. The empirical contribution is a comparative study of three distinct types of action regularly accomplished with the turn format du meinst x (“you mean/think x”) in German: candidate understandings, formulations of the other’s mind, and requests for a judgment. These empirical materials are the basis for a methodological exploration of different levels of researcher abstraction in the comparative study of action. Two levels are examined: the (coarser) level of conditionally relevant responses (what a response speaker must do to align with the action of the prior turn) and the (finer) level of “full alignment” (what a response speaker can do to align with the action of a prior turn). Both levels of abstraction provide empirically viable and analytically interesting descriptive concepts for the comparative study of action. Data are in German.
This paper seeks to apply the principles of the famous 3-Circle-Model devised for the description of the ecolinguistic position of English world-wide to the position of German around the world.
On the one hand, the 3-Circle-Model for English with its "Inner", "Outer" and "Extended/Expanding" Circles was invented by Kachru in the 1980s and has since then been adopted, refined and criticised by numerous authors. The situation of German world-wide, on the other hand, has only been scarcely discussed in the past 20 years. While the global extension of German is obviously by far weaker than that of English, there are also a number of noteworthy similarities in terms of historical spread and the current position of these two languages.
This paper therefore discusses the analogies of global English and German by establishing three circles for German: the Inner Circle for the core German-speaking area, i.e. Germany, Austria and Switzerland; the Outer Circle including a number of German minority areas (mostly in Europe), and finally the Extended Circle which may be denoted as "Crumbling" rather than "Expanding". The latter comprises traditional German diaspora communities in different parts of the world which either result from migration or reflect the previous functions of German as a language of culture and as a lingua franca in regions like Eastern Europe. The paper argues that there are some striking structural similarities, but also shows the limits of this comparison.
This chapter investigates policies which shape the role of the German language in contemporary Estonia. Whereas German played for many centuries an important role as the language of the economic and cultural elite in Estonia, it severely declined in importance throughout the twentieth century. Against this historical background, the paper provides an overview of the current functions of German and attitudes towards it, and it discusses how these functions and attitudes are influenced by policies of various actors from inside and outside Estonia. The paper argues that German continues to play a significant role: while German is no longer a lingua franca, it still enjoys a number of functions and prestige in clearly defined niches involving communication within German-speaking circles or between Estonians and Germans. The interplay of the language policies of the Estonian and the German-speaking states as well as of semi-state and private institutions succeeds in maintaining German as an additional language in contemporary Estonia.
This chapter introduces readers to the context and concept of this volume. It starts by providing an historical overview of languages and multilingualism in Lithuania, Estonia and Latvia, highlighting the 100th anniversary of statehood which the three Baltic states are celebrating in 2018. Then, the chapter briefly presents important strands of research on multilingualism in the region throughout the past decades; in particular, questions about language policies and the status of the national languages (Estonian, Latvian and Lithuanian) and Russian. It also touches on debates about languages in education and the roles of other languages such as the regional languages of Latgalian and Võro and the changing roles of international languages such as English and German. The chapter concludes by providing short summaries of the contributions to this book.
Studies on the Linguistic Landscapes (LLs) investigate frequencies, functions, and power relations between languages and their speakers in public space. Research on the LL thereby aims to understand how the production and perception of signs reflect and simultaneously shape realities. In this sense, the LL is one of the most dynamic places where processes of minoritization take place: the (in)visibility of minority languages and the functional and symbolic relationships to majority languages are in direct relationship with negotiations of minorities’ place in society. This chapter looks at minority languages in the LL from two major perspectives. Firstly, it discusses language policies, focussing on which policy categories and which domains of language use are of particular relevance for understanding minority languages in the LL. Then, it turns to issues of conflict, contestation, and exclusion by providing examples from a range of geographically and typologically prototypical case studies, including Israel, Canada, Belgium, the Basque Country, and Friesland.
This paper discusses how the regional language of Latgalian in Latvia has benefitted from societal discourse on the antagonism between speakers of Latvian and Russian in Latvia. Triggered by the 2012 referendum on Russian as a possible second state language of Latvia, Latvian politics (exemplified by politicians' statements since 2012 as well as by 2014 election manifestoes) as well as society at large (displayed by e.g. increased attention in the educational sector and the media) have started to devote considerably more attention to the region of Latgale, including its cultural and linguistic heritage. The paper thereby argues that speakers of Latgalian have gained a noteworthy increase in voice, even though the future of the variety is still considered to be uncertain.
Research on language politics, policy, and planning is of importance to contact linguistics, since political relations between groups of language users, the way in which the use of language(s) is organized, and how language issues are politicized fundamentally shape the political and social conditions under which language varieties are in contact. This chapter first provides a short sketch of how language policy, planning, and politics have so far been conceptualized. Major subfields will be discussed, and then relevant actors and factors in these processes will be introduced. At the end, these aspects will be discussed from a contact linguistic perspective and summarized in a graphic visualization.
Numerous academics and politicians have in recent years contributed to the description and analysis of language policy for the benefit of smaller languages. The present paper tries to add to these by taking up the notion of yet another aspect of politics and language, exemplified by two case studies. The political aspect is the decentralization of parliamentary power for the benefit of minority languages. The two case studies deal with the relationship between the Scottish Parliament and the Gaelic language on the one hand, and between the Norwegian Sami Parliament, the Sameting, and the Sami language on the other hand. The underlying idea is to consider whether parliamentary bodies may contribute to the empowerment of speakers of minority languages regarding the language of individual choice in as many instances as possible. This applies to any domain of language use, but in particular public bodies, education, and the media, at local, regional and national levels.
This paper deals with different types of verbal complementation of the German verb verdienen. It focuses on constructions that have been undergoing a grammaticalization process and thus express deontic modality, as in Sie verdient geliebt zu werden (ʽShe deserves to be lovedʼ) and Sie verdient zu leben (ʽShe deserves to liveʼ) (Diewald, Dekalo & Czicza 2021). These constructions are connected to parallel complementation types with passive and active infinitives containing a correlate es, as in Sie verdient es, geliebt zu werden and Sie verdient es, zu leben, as well as finite clauses with the subordinator dass with and without correlative es, as in Sie verdient, dass sie geliebt wird and Sie verdient es, dass sie geliebt wird. This paper attempts to show a close comparative investigation of these six types of constructions based on their relevant semantic and syntactic properties in terms of clause linkage (Lehmann 1988). We analyze the relevant data retrieved from the DWDS corpus of the 20th century and present an expanded grammaticalization path for verdienen-constructions. The finite complementation with dass is regarded as an example of a separate structural option called “elaboration”. Concerning the use of correlative es, it is shown that it does not have any substantial effect on the grammaticalization of modal verdienen-constructions.
Lexical data API
(2022)
This API provides data from various dictionary resources of K Dictionaries across 50 languages. It is used by language service providers, app developers, and researchers, and returns data as JSON documents. A basic search result consists of an object containing partial lexical information on entries that match the search criteria, but further in-depth information is also available. Basic search parameters include the source resource, source language, and text (lemma), and the entries are returned as objects within the results array. It is possible to look for words with specific syntactic criteria, specifying the part of speech, grammatical number, gender and subcategorization, monosemous or polysemous entries. When searching by parameters, each entry result contains a unique entry ID, and each sense has its own unique sense ID. Using these IDs, it is possible to obtain more data – such as syntactic and semantic information, multiword expressions, examples of usage, translations, etc. – of a single entry or sense. The software demonstration includes a brief overview of the API with practical examples of its operation.
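The request-and-response pattern described in this abstract can be sketched as follows. This is only an illustration under stated assumptions: the parameter names (`source`, `language`, `text`), the JSON field names (`results`, `id`, `headword`, `senses`), and the sample IDs are hypothetical and are not taken from the API's documentation; the abstract confirms only that results arrive as JSON with a results array and that entries and senses carry unique IDs.

```python
import json
from urllib.parse import urlencode

# Hypothetical response body. All field names and ID values are assumptions
# made for illustration; they are not the API's documented schema.
SAMPLE_RESPONSE = """
{
  "results": [
    {"id": "EN0001", "headword": "house",
     "senses": [{"id": "EN0001_s1"}, {"id": "EN0001_s2"}]},
    {"id": "EN0002", "headword": "house",
     "senses": [{"id": "EN0002_s1"}]}
  ]
}
"""


def build_query(source, language, text):
    """Encode the three basic search parameters as a URL query string.

    The parameter names used here are assumed, not documented.
    """
    return urlencode({"source": source, "language": language, "text": text})


def collect_ids(response_text):
    """Map each entry ID to the list of sense IDs found under that entry."""
    data = json.loads(response_text)
    ids = {}
    for entry in data.get("results", []):
        ids[entry["id"]] = [sense["id"] for sense in entry.get("senses", [])]
    return ids


if __name__ == "__main__":
    print(build_query("global", "en", "house"))
    for entry_id, sense_ids in collect_ids(SAMPLE_RESPONSE).items():
        print(entry_id, sense_ids)
```

The collected entry and sense IDs are what a follow-up request would use to fetch the fuller syntactic and semantic data mentioned in the abstract; how that request is addressed is not specified here and is left open.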
This article examines how the most frequent imperative forms of the verb to show in German (zeig mal) and Czech (ukaž) are deployed in object-centred sequences. Specifically, it focuses on smartphone-based showing activities as these were the main sequential environments of show imperatives in the datasets investigated. In both languages, the imperative form does not merely aim to elicit a responsive action from the smartphone holder (such as making the device available) but projects an individual course of action from the requester’s side in the form of an immediate visual inspection of the digital content. This inspection is carried out as part of a joint course of action, allowing the recipient to provide a more detailed response to a prior action. Therefore, this specific imperative form is proven to be cross-linguistically suited to technology-mediated inspection sequences.
Automatic summarization systems usually are trained and evaluated in a particular domain with fixed data sets. When such a system is to be applied to slightly different input, labor- and cost-intensive annotations have to be created to retrain the system. We deal with this problem by providing users with a GUI which allows them to correct automatically produced imperfect summaries. The corrected summary in turn is added to the pool of training data. The performance of the system is expected to improve as it adapts to the new domain.
Canadian heritage German across three generations: A diary-based study of language shift in action
(2019)
It is well known that migration has an effect on language use and language choice. If the language of origin is maintained after migration, it tends to change in the new contact setting. Often, migrants shift to the new majority language within a few generations. The current paper examines a diary corpus containing data from three generations of one German-Canadian family, ranging from 1867 to 1909, and covering the second to fourth generation after immigration. The paper analyzes changes that can be observed between the generations, with respect to the language system as well as to the individuals’ decision on language choice. The data not only offer insight into the dynamics of acquiring a written register of a heritage language, and the eventual shift to the majority language. They also allow us to identify different linguistic profiles of heritage speakers within one community. It is discussed how these profiles can be linked to the individuals’ family backgrounds and how the combination of these backgrounds may have contributed to giving up the heritage language in favor of the majority language.
Meta-communicative practices are generally reflexive in a fairly obvious sense: Inasmuch as speakers use them to talk about or comment on earlier/subsequent talk, they use language self-reflexively. In this paper, we explore a practice that is reflexive not only in this meta-communicative sense but also in a sequential-interactional one: Prefacing a conversational turn with I was gonna say. We show that the I was gonna say-preface furnishes the following general semantic-pragmatic affordances: (1) It retroactively relates the speaker’s subsequent talk to preceding talk from a co-participant, (2) it embodies a claim to prior, now-preempted, communicative intent with regard to what their co-participant has (just) said/done, (3) it therefore displays its speaker’s orientation to the relevance or the appropriate placement of the action(s) done in their own subsequent talk at an earlier moment in the interaction, and (4) it reflexively re-invokes, or retrieves, this earlier moment as the relevant sequential context for their action(s). We then go on to illustrate how speakers draw on these sequentially reflexive affordances for managing recurrent interactional contingencies in specific sequential environments. The paper ends with a discussion of the role that reflexivity plays in and for the deployment of this practice.
In this paper we present work in developing a computerized grammar for the Latin language. It demonstrates the principles and challenges in developing a grammar for a natural language in a modern grammar formalism. The grammar presented here provides a useful resource for natural language processing applications in different fields. It can be easily adopted for language learning and use in language technology for Cultural Heritage like translation applications or to support post-correction of document digitization.
The article addresses Solution-Oriented Questions (SOQs) as an interactional practice for relationship management in psychodiagnostic interviews. Therapeutic alliance results from the concordance of alignment, as willingness to cooperate regarding common goals, and of affiliation, as relationship based upon trust. SOQs particularly allow for both: They are situated at the end of a troublesome topic area, which is linked to low agency on the patient’s side, and they reveal understanding of and interest in the patient. Following the paradigm of Conversation Analysis and German Gesprächsanalyse this paper analyzes the design and functions of SOQs as a means for securing and enhancing the relationship in the process of therapy. Our data comprise 15 videotaped first interviews following the manual of the Operationalized Psychodynamic Diagnostics. The analyses refer to all SOQs found but will be illustrated by means of a single conversation.
Beyond the stars: exploiting free-text user reviews to improve the accuracy of movie recommendations
(2009)
In this paper we show that the extraction of opinions from free-text reviews can improve the accuracy of movie recommendations. We present three approaches to extract movie aspects as opinion targets and use them as features for the collaborative filtering. Each of these approaches requires different amounts of manual interaction. We collected a data set of reviews with corresponding ordinal (star) ratings of several thousand movies to evaluate the different features for the collaborative filtering. We employ a state-of-the-art collaborative filtering engine for the recommendations during our evaluation and compare the performance with and without using the features representing user preferences mined from the free-text reviews provided by the users. The opinion mining based features perform significantly better than the baseline, which is based on star ratings and genre information only.
We present a supervised machine learning system for author name disambiguation (AND) which tackles semantic similarity between publication titles by means of word embeddings. Word embeddings are integrated as external components, which keeps the model small and efficient, while allowing for easy extensibility and domain adaptation. Initial experiments show that word embeddings can improve the Recall and F score of the binary classification sub-task of AND. Results for the clustering sub-task are less clear, but also promising and overall show the feasibility of the approach.
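For illustration only, comparing two publication titles through word embeddings can be sketched as averaging the word vectors of each title and taking the cosine of the two averages. This is a minimal sketch, not the system described in the abstract: the three-dimensional vectors in `TOY_EMBEDDINGS` are invented toy values, and the function names are hypothetical.

```python
import math

# Invented toy vectors; a real system would load pretrained embeddings.
TOY_EMBEDDINGS = {
    "neural": [0.9, 0.1, 0.0],
    "network": [0.8, 0.2, 0.1],
    "deep": [0.85, 0.15, 0.05],
    "learning": [0.7, 0.3, 0.1],
    "medieval": [0.0, 0.1, 0.9],
    "poetry": [0.1, 0.0, 0.8],
}


def title_vector(title):
    """Average the vectors of all known words in a title (None if none are known)."""
    vectors = [TOY_EMBEDDINGS[w] for w in title.lower().split() if w in TOY_EMBEDDINGS]
    if not vectors:
        return None
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def cosine(u, v):
    """Cosine similarity of two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def title_similarity(title_a, title_b):
    """Similarity of two titles; 0.0 when either title has no known words."""
    u, v = title_vector(title_a), title_vector(title_b)
    if u is None or v is None:
        return 0.0
    return cosine(u, v)
```

A score of this kind could serve as one feature among others in a binary "same author or not" classifier; how the actual system combines its features is not stated in the abstract.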
The demo presents a minimalist, off-the-shelf AND tool which provides a fundamental AND operation, the comparison of two publications with ambiguous authors, as an easily accessible HTTP interface. The tool implements this operation using standard AND functionality, but puts particular emphasis on advanced methods from natural language processing (NLP) for comparing publication title semantics.
In a recent article, Meylan and Griffiths (Meylan & Griffiths, 2021, henceforth, M&G) focus their attention on the significant methodological challenges that can arise when using large-scale linguistic corpora. To this end, M&G revisit a well-known result of Piantadosi, Tily, and Gibson (2011, henceforth, PT&G) who argue that average information content is a better predictor of word length than word frequency. We applaud M&G who conducted a very important study that should be read by any researcher interested in working with large-scale corpora. The fact that M&G mostly failed to find clear evidence in favor of PT&G's main finding motivated us to test PT&G's idea on a subset of the largest archive of German language texts designed for linguistic research, the German Reference Corpus consisting of ∼43 billion words. We only find very little support for the primary data point reported by PT&G.
Lexical resources are often represented in table form, e.g., in relational databases, or in specially marked-up texts, for example, in document-based XML models. This paper describes how lexical structures can be modelled as graphs, how this model can be used to exploit existing lexical resources, and how different types of lexical resources can be combined.
In this contribution we present some work of the European R&D project “LIRICS” and of the ISO/TC 37/SC 4 committee related to the topic of interoperability and re-use of language resources. We introduce some basic mechanisms of the standardization work in ISO and describe in more detail the general approach to the annotation of language data within ISO.
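As a rough illustration of the idea, lexical information from a table-like source and from an XML-like source can both be expressed as labelled edges and then merged into one graph. The sketch below is an assumption-laden toy, not the model proposed in the paper: the class `LexiconGraph`, its methods, and the relation labels are all hypothetical.

```python
from collections import defaultdict


class LexiconGraph:
    """Toy lexicon graph: each node maps to a set of (relation, target) edges."""

    def __init__(self):
        self.edges = defaultdict(set)

    def add(self, source, relation, target):
        """Add one labelled edge; adding the same edge twice has no effect."""
        self.edges[source].add((relation, target))

    def merge(self, other):
        """Copy all edges of another graph into this one (set union per node)."""
        for source, links in other.edges.items():
            self.edges[source] |= links

    def related(self, source, relation):
        """Return the sorted targets reachable from source via relation."""
        return sorted(t for r, t in self.edges.get(source, ()) if r == relation)


# One resource derived from a table, one derived from marked-up text.
from_table = LexiconGraph()
from_table.add("Haus", "pos", "noun")

from_xml = LexiconGraph()
from_xml.add("Haus", "pos", "noun")
from_xml.add("Haus", "translation", "house")

from_table.merge(from_xml)
```

Because edges are stored as sets, information that both resources share (here the part of speech) is kept only once after merging.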
Null subjects (NSs) have been a central research topic in generative syntax ever since the 1980s. This chapter considers the situation of German NSs both from a dialectological and from a diachronic perspective and attempts to reconstruct a direct line concerning the licensing conditions of pro-drop from Old High German (OHG) through Middle High German (MHG) and Early New High German (ENHG) to current dialects of New High German (NHG). Particularly, we will argue that German changed from a consistent, yet asymmetric pro-drop language to a partial, but symmetric one. In order to demonstrate that this development took place and the steps involved, we survey the existing empirical evidence and introduce new data.
Action ascription can be understood from two broad perspectives. On one view, it refers to the ways in which actions constitute categories by which members make sense of their world, and forms a key foundation for holding others accountable for their conduct. On another view, it refers to the ways in which we accountably respond to the actions of others, thereby accomplishing sequential versions of meaningful social experience. In short, action ascription can be understood as a matter of categorisation of prior actions or responding in ways that are sequentially fitted to prior actions, or both. In this chapter, we review different theoretical approaches to action ascription that have developed in the field, as well as the key constituents and resources of action ascription that have been identified in conversation analytic research, before going on to discuss how action ascription can itself be considered a form of social action.
Action ascription is an emergent process of mutual displays of understanding. Usually, the kind of action that is ascribed to a prior turn by a next action remains implicit. Sometimes, however, actions are overtly ascribed, for example, when speakers expose the use of strategies. This happens particularly in conflictual interaction, such as public debates or mediation talks. In these interactional settings, one of the speakers’ goals is to discredit their opponents in front of other participants or an overhearing audience. This chapter investigates different types of overt strategy ascriptions in a public mediation: exposing the opponent’s use of rhetorical devices, exposing the opponent’s use of false premises, and exposing that an opponent is telling only a half-truth. This chapter shows how speakers use ascriptions of acting strategically as accusations to disclose their opponents’ intentions and ‘truths’ that the opponents allegedly conceal and that are detrimental to their position.
In this chapter, I will focus on the phenomenon of drop out, i.e., withdrawal from the turn due to overlapping talk, in order to reflect on the link between “unfinished” turns and participation framework. With the help of a sequential and multimodal analysis inspired by the conversation analytical approach, I will show that dropping out from a turn is strongly linked to the availability displayed by potential recipients of a turn-at-talk. Although conversation analysis has described in detail the systematics of overlapping talk, especially of its onset (Jefferson 1973, 1983, 1986) and its resolution (Schegloff 2000; Jefferson 2004), the phenomenon of withdrawal from a turn due to simultaneous talk has not been investigated in detail. While it seems to be difficult to describe this interactional practice by referring exclusively to syntactic features (incompleteness of the turn), I suggest looking at turn withdrawal from a multimodal perspective (e.g. Goodwin 1980, 1981; Mondada 2007a; Schmitt 2005), taking into account visible resources like gaze or gesture. The problem of continuing or stopping a turn-in-progress in overlapping talk can be closely linked to the participation framework (Goodwin and Goodwin 2004), as speakers do visibly take into account their recipient’s availability and coordinate their turn construction with the dynamic changes of the participation framework and the interactional space.
Drawing on naturalistic video and audio recordings of international meetings, and within the framework of conversation analysis, ethnomethodology and interactional linguistics, this chapter studies how multilingual resources are mobilized in social interactions among professionals, how available linguistic and embodied resources are identified and used by the participants, which solutions are locally elaborated by them when they are confronted with various languages spoken but not shared among them, and which definition of multilingualism they adopt for all practical purposes. Focusing on the multilingual solutions emically elaborated in international professional meetings, we show that the participants orient to a double principle: on the one hand, they orient to the progressivity of the interaction, adopting all the possible resources that enable them to go on within the current activity; on the other hand, they orient to the intersubjectivity of the interaction, treating, preventing and repairing possible troubles and problems of understanding. Specific multilingual solutions can be adopted to keep this difficult balance between progressivity and intersubjectivity; they vary according to the settings, the competences at hand, the linguistic and embodied resources locally defined by the participants as publicly available, the multilingual resources treated as totally or partially shared, as transparent or opaque, and as needing repair or not. The paper begins by sketching the analytical framework, including the methodology and the data collected; it then presents some general findings, before offering an analysis of various ways in which participants keep the balance between progressivity and intersubjectivity in different multilingual interactional contexts.
Based on conference reports and minutes, archive material and official documents, the article explores how the promotion of women’s sports and of women in leadership positions became an important part of the sport policy of two major organizations involved in European sport cooperation: the Council of Europe and the European Sport Conference. During the first, modest discussions in the 1960s and 1970s, it constituted a rather paternalistic project. It was also based on the assumption of an essential difference between men and women concerning the need for participation in sport. This changed only from the beginning of the 1980s, when women took matters into their own hands, challenged the underlying assumptions and created new networks of cooperation.
Since Lerner coined the notion of delayed completion in 1989, this recurrent social practice of continuing one’s speaking turn while disregarding an intermediate co-participant’s utterance has not been investigated with regard to embodied displays and actions. A sequential approach to videotaped mundane conversations in German will explain the occurrence and use of delayed completions. First, especially in multi-party and multi-activity settings, delayed completions can result from reduced monitoring and coordinating activities. Second, recipients can use intra-turn response slots for more extended responsive actions than the current speaker initially projected, leading to delayed completion sequences. Finally, delayed completions are used for blocking possibly misaligned co-participant actions. The investigation of visible action illustrates that delayed completions are a basic practice for retrospectively managing co-participant response slots.
The use of digital resources and tools across humanities disciplines is steadily increasing, giving rise to new research paradigms and associated methods that are commonly subsumed under the term digital humanities. Digital humanities does not constitute a new discipline in itself, but rather a new approach to humanities research that cuts across different existing humanities disciplines. While digital humanities extends well beyond language-based research, textual resources and spoken language materials play a central role in most humanities disciplines.
The ISOcat registry reloaded
(2012)
The linguistics community is building a metadata-based infrastructure for the description of its research data and tools. At its core is the ISOcat registry, a collaborative platform to hold a (to be standardized) set of data categories (i.e., field descriptors). Descriptors have definitions in natural language and few explicit interrelations. With the registry growing to many hundreds of entries, authored by many, it is becoming increasingly apparent that the rather informal definitions and their glossary-like design make it hard for users to grasp, exploit and manage the registry’s content. In this paper, we take a large subset of the ISOcat term set and reconstruct from it a tree structure, following in the footsteps of schema.org. Our ontological re-engineering yields a representation that gives users a hierarchical view of linguistic, metadata-related terminology. The new representation adds to the precision of all definitions by making explicit information which is only implicitly given in the ISOcat registry. It also helps to uncover and address potential inconsistencies in term definitions as well as gaps and redundancies in the overall ISOcat term set. The new representation can serve as a complement to the existing ISOcat model, providing additional support for authors and users in browsing, (re-)using, maintaining, and further extending the community’s terminological metadata repertoire.
The transfer of research data management from one institution to another infrastructural partner is anything but trivial, but can be required, for instance, when an institution faces reorganization or closure. In a case study, we describe the migration of all research data, identify the challenges we encountered, and discuss how we addressed them. The case study shows that moving research data management to another institution is a feasible, but potentially costly enterprise. Being able to demonstrate the feasibility of research data migration supports the stance of data archives that users can expect high levels of trust and reliability when it comes to data safety and sustainability.
The chapter on formats and models for lexicons deals with different available data formats of lexical resources. It elaborates on their structure and possible uses. Motivated by the restrictions in merging different lexical resources based on widely spread formalisms and international standards, a formal lexicon model for lexical resources is developed which is related to graph structures in annotations. For lexicons this model is termed the Lexicon Graph. Within this model the concepts of lexicon entries and lexical structures frequently described in the literature are formally defined and examples are given. The article addresses the problem of ambiguity in those formal terms. An implementation based on XML and XML technology such as XQuery for the defined structures is given. The relation to international standards is included as well.
This paper investigates the long-term diachronic development of the perfect and preterite tenses in German and provides a novel analysis by supplementing Reichenbach’s (1947) classical theory of tense by the notion of underspecification. Based on a newly compiled parallel corpus spanning the entire documented history of German, we show that the development in question is cyclic: It starts out with only one tense form (preterite) compatible with both current relevance and narrative past readings in (early) Old High German and, via three intermediate stages, arrives at only one tense form again (perfect) compatible with the same readings in modern Upper German dialects. We propose that in order to capture all attested stages we must allow tenses to be unspecified for R (reference time), with R merely being inferred pragmatically. We then propose that the transitions between the different stages can be explained by the interplay between semantics and pragmatics.
Mock fiction is a genre of humorous, fictional narratives. It is pervasive in adolescents’ peer-group interaction. Building on a corpus of informal peer-group interaction among 14 to 17 year-old German adolescents, it is shown how mock fiction is used to sanction identity-claims of peer-group co-members that are taken to be inadequate by the teller of a mock fiction. Mock fiction exposes and ridicules those claims by fictional exaggeration. Mock fiction is an indirect, yet sometimes even highly abusive means for criticizing and negotiating identities and statuses of peer-group members. The analysis shows how mock fiction is collaboratively produced, how it is used to convey criticism and to negotiate social norms indirectly, and how, in addition, it allows for performative self-positioning of the tellers as skilled, entertaining tellers and socio-psychological diagnosticians.
In this paper, we present an overview of freely available web applications providing online access to spoken language corpora. We explore and discuss various solutions with which the corpus providers and corpus platform developers address the needs of researchers who are working with spoken language. The paper aims to contribute to the long-overdue exchange and discussion of methods and best practices in the design of online access to spoken language corpora.
Between January 2020 and summer 2021, many new words and phrases contributed to the expansion of the German vocabulary in order to enable communication under the new conditions during the corona pandemic. This rapid expansion of vocabulary has most notably affected lexicography as a discipline of applied linguistics. General language dictionaries or terminological dictionaries have quickly reflected on how the German lexicon evolved during the corona pandemic: new entries were added, others were revised. This paper, however, focuses on the ways in which a German (specialized) neologism dictionary project, the "Neologismenwörterbuch" of the Leibniz Institute for the German Language, Mannheim (published online, see https://www.owid.de/docs/neo/start.jsp), has chosen to capture and document lexicographic information in a timely manner. Neologisms are (following the definition applied here) lexical units or senses/meanings which emerge in a language community over a specific period of time of language development, which diffuse, are generally accepted as language norms, and which the majority of speakers perceive as new for some time. Thus, the "Neologismenwörterbuch" used to record neologisms only retrospectively, that is, after their lexicalization. As a consequence, users of the dictionary were often not able to obtain details on words that were particularly conspicuous at a particular time in a specific discourse, thus raising questions concerning their meaning, correct spelling, etc. This, however, did not imply that the lexicographers of the project had not already collected these words with some preliminary information in a list of candidates for inclusion in an internal database. Therefore, the project started to publish online an index of monitored words including lexical units that had emerged since 2011, for which only time will tell whether they will diffuse and manifest as language norms.
Since April 2020, this list format has also been used to issue a compilation of corona-related neologisms as part of the "Neologismenwörterbuch". In October 2021, this inventory included more than 1,800 corona-related neologisms, and more than 700 further candidates in an internal database still awaited lexicographic description and inclusion in the online index (see https://www.owid.de/docs/neo/listen/corona.jsp). In this paper, many examples are presented to illustrate how new words, new senses and new uses in the context of the Covid-19 pandemic are reflected in the dictionary.
We present zu-excessive structures like Otto ist zu schwer ‘Otto is too heavy’ as instantiations of comparatives that have been reflexivized. Comparatives express asymmetric relations between distinguished referents, but reflexivization identifies argument places (or reduces two argument places to one), leading to a symmetric relation. Reflexivization is thus in conflict with the asymmetry property of comparatives and leads to an intermediate semantic representation that is contradictory. Two experiments substantiate that zu-excessives share this property with privative adjective and animal-for-statue constructions that similarly give rise to contradictory semantics. The processing of any of the constructions mentioned yields a positivity in the event-related-potential signature characteristic of conceptual reorganization; however, the observed positivity occurs earlier in the case of zu-excessives than in the other cases. We propose this difference is due to zu signalling the mandatory preparation for an ensuing repair rather than reflecting the repair operation itself that involves manipulating the standard of comparison, coded elsewhere in the string (if at all).
In this paper we examine the composition and interactional deployment of suspended assessments in ordinary German conversation. We define suspended assessments as lexicosyntactically incomplete assessing TCUs that share a distinct cluster of prosodic-phonetic features which auditorily makes them come off as 'left hanging' rather than cut-off (e.g., Schegloff/Jefferson/Sacks 1977; Jasperson 2002) or trailing-off (e.g., Local/Kelly 1986; Walker 2012). Using CA/IL methodology (Couper-Kuhlen/Selting 2018) and drawing on a large body of video-recorded face-to-face conversations, we highlight the verbal, vocal and bodily-visual resources participants use to render such unfinished assessing TCUs recognizably incomplete and identify six recurrent usage types. Overall, the suspension of assessing TCUs appears to either serve as a practice for circumventing the production of assessments that are interactionally inapposite, or as a practice for coping with local contingencies that render the very doing of an assessment problematic for the speaker. Data are in German with English translations.
Digital humanities research under United States and European copyright laws. Evolving frameworks
(2021)
This chapter summarizes the current state of copyright laws in the United States and European Union that most affect Digital Humanities research, namely the fair use doctrine in the US and research exceptions in Europe, including the Directive on Copyright in the Digital Single Market, which was finally adopted in 2019. This summary begins with a description of recent copyright advances most relevant to DH research, and finishes with an analysis of a significant remaining legal hurdle which DH researchers face: how do fair use and research exceptions deal with the critical issue of circumventing technological protection measures (TPM, a.k.a. DRM)? Our discussion of the lawful means of obtaining TPM-protected material may contribute to both current DH research and planning decisions and inform future stakeholders and lawmakers of the need to allow TPM circumvention for academic research.
The General Data Protection Regulation (GDPR) on personal data protection in the European Union entered into application on 25 May 2018. With its 173 recitals and 99 articles, it may be one of the most ambitious pieces of EU legislation to date. Rather than a guide to GDPR compliance for Digital Humanities researchers, this chapter looks at the use of personal data in DH projects from the data subject’s perspective, and examines to what extent the GDPR kept its promise of enabling the data subject to “take control of his data”. The chapter provides an overview of the right to privacy and the right to data protection, a discussion of the relation between the concept of data control and privacy and data protection law, an introduction to the GDPR, and an explanation of its relevance for scientific research in general and DH in particular. The main section of the chapter analyses two types of data control mechanisms (consent and data subject rights) and their impact on DH research.
Information theory can be used to assess how efficiently a message is transmitted on the basis of different symbolic systems. In this paper, I estimate the information-theoretic efficiency of written language for parallel text data in more than 1000 different languages, both on the level of characters and on the level of words as information encoding units. The main results show that (i) the median efficiency is ∼29% on the character level and ∼45% on the word level, (ii) efficiency on both levels is strongly correlated with each other and (iii) efficiency tends to be higher for languages with more speakers.
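As an illustration only, one textbook way to express the efficiency of a symbol sequence is the ratio of its empirical Shannon entropy to the maximum entropy attainable with the same inventory of symbols. The sketch below assumes this simple unigram definition, which is not necessarily the estimator used in the paper; the function names `entropy_bits` and `efficiency` are hypothetical.

```python
import math
from collections import Counter


def entropy_bits(symbols):
    """Empirical Shannon entropy (bits per symbol) of a sequence."""
    counts = Counter(symbols)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def efficiency(symbols):
    """Entropy divided by the maximum entropy for the observed inventory.

    Assumption: unigram frequencies only, with no context taken into account.
    """
    symbols = list(symbols)
    inventory_size = len(set(symbols))
    if inventory_size < 2:
        raise ValueError("need at least two distinct symbols")
    return entropy_bits(symbols) / math.log2(inventory_size)


# Character level: pass a string. Word level: pass a list of tokens.
char_level = efficiency("aaab")
word_level = efficiency("the cat saw the dog".split())
```

A value of 1.0 means that every symbol in the inventory is used equally often; lower values indicate redundancy in the unigram distribution.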
Mobile live video streaming with smartphones is an everyday media practice in which the participants are in a specific multimodal constellation and streamers and viewers have access to various semiotic resources for interactionally establishing alignment. Based on the multimodal sequence analysis of a concise episode of a journalist's livestream coverage of a political event on the streaming platform Periscope, I will address the question of how participation and involvement in live video streams are achieved and organised by the participants. I will show that hosts in the media practice of live video streaming act in an interaction-dominant manner and involve the viewers in the situation through asymmetrical participation coordination via footing shifts.
This study builds on a large body of work on the use of linguistic forms for requests in social interaction. Using Conversation Analysis / Interactional Linguistics, this study explores the use of two recurrent linguistic formats for requesting in spoken German – simple interrogatives ('do you do ..?') and kannst du VP? ('can you do..?') interrogatives. Based on a corpus of video-recorded, naturally occurring mundane interaction, this study demonstrates one of the interactional factors that is relevant for the choice between alternative interrogative request formats in spoken German – the recipient's embodied availability before and during the request initiation. It is shown that simple interrogatives are used to request an action from a recipient who is either available or involved in their own project, which, however, does not have to be suspended or interrupted in order to comply with the request. In contrast, kannst du VP? interrogatives occur in environments in which the recipient is already engaged in a project that must be suspended in order to grant the request.
This chapter focuses on the contributions of German scholars to two of the three main research questions that have defined EU studies. Leaving aside the debate on the drivers of European integration, i.e. European integration theory, we will discuss the «governance turn» that Fritz Scharpf, Beate Kohler-Koch, Arthur Benz, Ingeborg Tömmel and others promoted in studying EU institutions, as well as the more policy-oriented approaches by Adrienne Héritier and, again, Fritz Scharpf and their students. We will then address the ever-growing literature on Europeanization, that is, on how EU policies, institutions and political processes have been affecting the domestic structures of member states, membership candidates, as well as neighborhood and third countries. In this context, German scholars also contributed to EU studies in methodological rather than substantive terms. Whereas Thomas König, Gerald Schneider, and others promoted the application of quantitative approaches, scientists like Bernhard Ebbinghaus and Markus Haverland dealt with general questions of research design such as case selection and causal inference. Finally, we will also discuss German contributions to diffusion research. The European Union as a most likely case for the diffusion of policies has attracted considerable attention from scholars dealing with the question of when and how policies spread across time and space. So it comes as no surprise that EU studies and diffusion research have benefitted from each other. In this regard, German scientists like Katharina Holzinger, Christoph Knill, Tanja Börzel, Thomas Plümper, Thomas Risse and others played a prominent role, too.