It is a ubiquitous phenomenon of everyday interaction that participants confront their co-participants for behaviour that they assess as undesirable or in some other way untoward. In a set of video data of informal interaction from the PECII corpus (Parallel European Corpus of Informal Interaction), cases of such sanctions have been collected in English, German, Italian and Polish data. This study presents work in progress and focuses on interrogatively formatted sanctions, in particular on non-polar interrogatives. It has already been shown that interrogatives can do much more than ask questions (Huddleston 1994). They can also function as directives (Lindström et al. 2017) or, more specifically, as requests (Curl/Drew 2008), as invitations (Margutti/Galatolo 2018) or reproaches (Klattenberg 2021), among others. What makes them interesting for cross-linguistic comparison is that the four languages that are considered provide different morphological and (morpho-)syntactical resources for the realization of interrogative phrases. For example, German provides the option of building in the modal particle denn that reveals a previous lack of clarity and obliges the co-participant(s) to deliver the missing information (Deppermann 2009). Of course, the other three languages have modal particles, too (e.g. allora in Italian or though in English), but they do not seem to convey the same semantic and interactional qualities as denn. From an interactional point of view, one could think that interrogatives are a typical and effective way of soliciting accounts, since formally they open up a conditionally relevant space for an answer or a reaction. But as the data shows, this does not guarantee that they are actually responded to. Another relevant aspect in the context of sanctions is that the interrogative format seems to carry a certain ‘openness’ that might be seen as a mitigating effect and thus provides an interesting point of comparison with other mitigating devices. This study uses the methods of conversation analysis and interactional linguistics. It is based on a collection of 148 interrogative sanctions (out of which 84 are non-polar interrogatives) covering the four languages. I draw on coded data from roughly 1000 cases to get a first overall idea of how the interrogative format might differ from other formats, and how it might interrelate with specific features, for example whether an account is subsequently delivered. Going more into depth, the interrogative sanctions will then be analyzed with respect to their formal design (e.g. polar questions vs. content questions vs. tag questions, Rossano 2010; Hayano 2013) and to their pragmatic implications. I also analyze reactions to such sanctions, both formally (cf. Enfield et al. 2019, 279) and, again, from an interactional perspective (e.g. acceptance/compliance vs. challenging/defiance; Kent 2012; Cekaite 2020). A more detailed zooming in on the sequential unfolding of some particularly interesting instances of sanctioning interrogatives will make the picture complete.
In G, E, I, and H there are constructions with accusative NPs being the external argument of an infinitival, (1) to (4). In P these accusative NPs can only co-occur with an adjectival participle, (5), a construction also occurring in E, (6). The talk compares the syntactic and semantic structure of these constructions focussing on the syntactic category of the nonfinite clause, the status of the accusative NP, the status of the infinitive, restructuring effects, and embedding predicates (including aspect).
i. As to G, E, I, and H, the infinitival clause is regarded as a TP, i.e., a small clause. Its accusative NP and infinitival predicate form a unit – [4], [12], [8]. The AcI denotes, according to [4], an eventuality, which prevents it from being negated. Its subject is case marked by the matrix predicate, either by ECM or subject-to-object raising – [9] and [10]. AcI-constructions can show clause union effects, (7). H additionally allows Dative subjects in infinitive clauses, the latter only being licensed by impersonal predicates and co-occurring with an agreeing infinitive, (8a), – [3]. In case there is no agreeing infinitive, the Dative NP is the experiencer of the matrix clause, (8b). As for Italian, it allows Nominative subject NPs in the infinitive clause, (9a, b).
ii. As to P, small clause constructions differ structurally from E, G, I and H ones – [6], [7]. P small clauses are realizable by copula constructions with verbal być ‘be’, pronominal to ‘it’, (10), or “dual” copula elements (co-occurrence of a pronominal and a verbal element, [1]), varying with respect to selectional restrictions (part of speech or case within complement phrases, extraction possibilities, [1]). The P counterpart to the AcI-constructions is the secondary predication over an accusative object via an adjectival present participle, (5), (11) and (12). The adjectival participle construction is systematically paraphrasable via clauses introduced by jak ‘how’ (11’) and (12’). In Polish, adjectival phrases like recytującego wiersz ‘reciting’, (11), and wracającego z podróży ‘returning’, (12), clearly function as adjuncts of the accusative object go ‘him’. In our talk, we will compare this P view to languages with typical AcI-constructions, where the AcI-clause is standardly analyzed as a complement of a matrix verb.
The issue: We discuss (declarative) prepositional object clauses (PO-clauses) in the West Germanic languages Dutch (NL), German (DE), and English (EN). In Dutch and German, PO-clauses occur with a prepositional proform (=PPF, Dutch: ervan, erover, etc.; German: drauf/darauf, drüber/darüber, etc.). This proform is optional with some verbs (1). In English, by contrast, P embeds a clausal complement in the case of gerunds or indirect questions (2), but P is obligatorily absent when the embedded CP is a that-clause in its base position (3a). However, when the that-clause is passivized or topicalized, the stranded P is obligatory (3b). Given this scenario, we will address the following questions: i) Are there structural differences between PO-clauses with a P/PPF and those in which the P/PPF is optionally or obligatorily omitted? ii) In particular, do PO-clauses without P/PPF structurally coincide with direct object (=DO) clauses? iii) To what extent are case and nominal properties of clauses relevant? We use wh-extraction as a relevant test for such differences.
Previous research: Based on pronominalization and topicalization data in German and Dutch, PO-clauses are different from DO-clauses independent of the presence of the PPF (see, e.g., Breindl 1989; Zifonun/Hoffmann/Strecker 1997; Berman 2003; Broekhuis/Corver 2015 and references therein) (4,5). English pronominalization and topicalization data (3b) appear to point in the same direction (Fischer 1997; Berman 2003; Delicado Cantero 2013). However, the obligatory absence of P before that-clauses in base position indicates a convergence with DO-clauses.
Experimental evidence: To provide further evidence bearing on these questions, we tested PO-clauses in all three languages for long wh-extraction, which is usually possible for DO-clauses in English and Dutch, and in German for southern regional varieties. For German and Dutch we conducted rating studies using the thermometer method (Featherston 2008). Each study contained two sets of sentences: the first set tested long wh-extraction with regular DO-clauses (6). The second set tested wh-extraction from PO-clauses with and without PPFs (7), respectively. The results show no significant difference in extraction with PO-clauses whether or not the PPF was present, even for those speakers who otherwise accept long-distance extraction in German. This supports a uniform analysis of PO-clauses with and without the PPF in contrast to DO-clauses. For English we tested extraction with verbs that select for PP-objects in two configurations: V+that-clause and V+P-gerund (8) in comparison to sentences without extraction. Participants rated sentences on a scale of 1 (unnatural) to 7 (natural). We included the gerund for English as this is a regular alternative for such objects. The results show that extraction is licit in both configurations. This suggests that English PO-clauses are different from German and Dutch PO-clauses: they behave rather like DO-clauses in allowing extraction. Note, though, that the availability of extraction from P+gerund also shows that PPs are not islands for extraction in English. Overall, this shows that there is a split between English and German/Dutch PO-clauses when the P/PPF is absent. While these clauses behave like PO-clauses in the latter languages, extraction does not show a difference between DO- and PO-clauses in English. We will discuss the results in relation to the questions i)–iii) above.
This study investigates other-initiated repair and its embodied dimension in casual English as lingua franca (ELF) conversations, thereby contributing to the further understanding of multimodal repair practices in social interaction. Using multimodal conversation analysis, we focus on two types of restricted other-initiation of repair (OIR): partial repeats preceded or followed by the question word what (i.e., what X?/X what?) and copular interrogative clauses (i.e., what is X). Partial repeats with what produced with rising final intonation are consistently accompanied by a head poke and treated as relating to troubles in hearing, with the repair usually consisting of a repeat. In contrast to these partial repeats, copular interrogative clauses are produced with downward final intonation and accompanied by face-related embodied conduct. The what is X OIRs primarily target code-switched lexical items, the understanding of which is critical for maintaining the repair initiator’s involvement in the ongoing sequence. This study also contributes some general reflections on the possible complexity of OIR and repair practices from a multimodal perspective.
In many European languages, propositional arguments (PAs) can be realized as different types of structures. Cross-linguistically, complex structures with PAs show a systematic correlation between the strength of the semantic bond and the syntactic union (cf. Givón 2001; Wurmbrand/Lohninger 2023). Also, different languages show similarities with respect to the (lexical) licensing of different PAs (cf. Noonan 1985; Givón 2001; Cristofaro 2003 on different predicate types). However, on a more fine-grained level, a variation across languages can be observed both with respect to the syntactic-semantic properties of PAs as well as to their licensing and usage. This presentation takes a multi-contrastive view of different types of PAs as syntactic subjects and objects by looking at five European languages: EN, DE, IT, PL and HU. Our goal is to identify the parameters of variation in the clausal domain with PAs and by this to contribute to a better understanding of the individual language systems on the one hand and the nature of the linguistic variation in the clausal domain on the other hand. Phenomena and Methodology: We investigate the following types of PAs: direct object (DO) clauses (1), prepositional object (PO) clauses (2), subject clauses (3), and nominalizations (4, 5). Additionally, we discuss clause union phenomena (6, 7). The analyzed parameters include among others finiteness, linear position of the PA, (non) presence of a correlative element, (non) presence of a complementizer, lexical-semantic class of the embedding verb. The phenomena are analyzed based on corpus data (using mono- and multilingual corpora), experimental data (acceptability judgement surveys) or introspective data.
Our current era of globalization is characterized above all by increased mobility, namely by the increasing mobility of people and the development of new communication technologies, including the mobility of linguistic signs and resources. This process raises new theoretical and methodological questions in linguistics, which has led in recent years to the development of a new sociolinguistics of globalization (Blommaert 2010). One of the most obvious ways to trace this new and dynamic development is to analyze individual language repertoires, especially those of migrants. In this essay, I examine aspects of the communicative repertoire of a refugee who fled to Germany in 2015 to escape the civil war in Syria. I draw on two interviews I conducted with him (in the following I refer to him by the pseudonym “Baran”). The first interview with Baran was recorded in 2016, a few months after his arrival in Germany. The second interview is from 2023, seven years later. In both recordings, German was the dominant language of interaction. I will analyze and show the characteristics of his German at the beginning of his immigration, how he resorts to practices of language mixing between German, Turkish and English (which has recently also been referred to as translanguaging) and how his German has developed over the course of the past seven years.
The present paper examines the rise and fall of Modern High German loanwords in English from 1600 until 2000, principally making use of the record of borrowing documented by the Oxford English Dictionary (OED) in its Third Edition (online version, in revision 2000-). Groups of loanwords are analysed by century, with reference to the changing social and cultural landscape characterising relationships between the relevant nations over this period. This is not a simple picture: each language grows over the period in different ways, and the speakers of English look to German at different times for different types of borrowing, as the political and intellectual balance alters.
This chapter explores the Linguistic Landscape of six medium-size towns in the Baltic States with regard to languages of tourism and to the role of English and Russian as linguae francae. A quantitative analysis of signs and of tourism web sites shows that, next to the state languages, English is the most dominant language. Yet, interviews reveal that underneath the surface, Russian still stands strong. Therefore, possible claims that English might take over the role of the main lingua franca in the Baltic States cannot be maintained. English has a strong position for attracting international tourists, but only alongside Russian which remains important both as a language of international communication and for local needs.
Picnick and Sauerkraut: German–English intra-writer variation in script and language (1867–1900)
(2023)
Intra-writer variation is a widespread phenomenon that nevertheless has received only limited research attention so far. Different addressees, bi- and multilingualism, or changing life phases are among the factors that contribute to such variation. In a study of diary entries by one writer covering three decades (1867–1900), this chapter investigates patterns of intra-writer variation between German and English (language and script) in nineteenth-century Canada, with a special focus on single word borrowings, person reference and place names. The long-term perspective provides a unique insight into the dynamics of a bilingual writer’s emerging sociolinguistic competence as reflected by the flexible yet structured use of his resources within the social space of a bilingual community.
The present research unites two emergent trends in the area of language attitudes: (a) research on perceptions of nonnative speakers by nonnative listeners and (b) the search for general, basic mechanisms underlying the evaluation of nonnative accented speakers. In three experiments featuring an employment situation, German participants listened to a presentation given in English by a German speaker with a strong versus native-like accent (in Studies 1–3) versus a native speaker of English (in Study 1). They evaluated candidates with a strong accent worse than candidates with a native(-like) pronunciation—even to the degree that the quality of arguments was of no relevance (Study 1). Study 2 introduces an effective intervention to reduce these discriminatory tendencies. Across studies, affect and competence emerged as major mediators of hirability evaluations. Study 3 further revealed sequential indirect influences, which advance our understanding of previous inconsistent findings regarding disfluency and warmth perceptions.
This paper seeks to apply the principles of the famous 3-Circle-Model devised for the description of the ecolinguistic position of English world-wide to the position of German around the world.
On the one hand, the 3-Circle-Model for English with its "Inner", "Outer" and "Extended/Expanding" Circles was invented by Kachru in the 1980s and has since then been adopted, refined and criticised by numerous authors. The situation of German world-wide, on the other hand, has only been scarcely discussed in the past 20 years. While the global extension of German is obviously by far weaker than that of English, there are also a number of noteworthy similarities in terms of historical spread and the current position of these two languages.
This paper therefore discusses the analogies of global English and German by establishing three circles for German: the Inner Circle for the core German-speaking area, i.e. Germany, Austria and Switzerland; the Outer Circle including a number of German minority areas (mostly in Europe), and finally the Extended Circle which may be denoted as "Crumbling" rather than "Expanding". The latter comprises traditional German diaspora communities in different parts of the world which either result from migration or reflect the previous functions of German as a language of culture and as a lingua franca in regions like Eastern Europe. The paper argues that there are some striking structural similarities, but also shows the limits of this comparison.
This chapter introduces readers to the context and concept of this volume. It starts by providing an historical overview of languages and multilingualism in Lithuania, Estonia and Latvia, highlighting the 100th anniversary of statehood which the three Baltic states are celebrating in 2018. Then, the chapter briefly presents important strands of research on multilingualism in the region throughout the past decades; in particular, questions about language policies and the status of the national languages (Estonian, Latvian and Lithuanian) and Russian. It also touches on debates about languages in education and the roles of other languages such as the regional languages of Latgalian and Võro and the changing roles of international languages such as English and German. The chapter concludes by providing short summaries of the contributions to this book.
Thesauri have long been recognized as valuable structured resources aiding Information Retrieval systems. A thesaurus provides a precise and controlled vocabulary which serves to coordinate data indexing and retrieval. The paper presents a bilingual Greek and English specialized thesaurus that is being developed as the backbone of a platform aimed at enhancing and enriching the cultural experiences of visitors in Eastern Macedonia and Thrace, Greece. The cultural component of the intended platform comprises textual data, images of artifacts and living entities (animals and plants in the area), as well as audio and video. The thesaurus covers the domains of Archaeology, Literature, Mythology, and Travel; therefore, it can be viewed as a set of inter-linked thesauri. Where applicable, terms and names in the database are also geo-referenced.
In English, past tense stative clauses embedded under a past-marked attitude verb, like Eric thought that Kalina was sick, can receive two interpretations, differing on when the state of the complement is understood to hold, i.e. Kalina’s sickness precedes the time of Eric’s thinking (backward-shifted reading), or Kalina is sick at the time of Eric’s thinking (simultaneous reading). As is well known, the availability of the simultaneous reading—also called Sequence of tense (SOT)—is subject to cross-linguistic variation. Non-SOT languages only allow for the backward-shifted interpretation. This cross-linguistic variation has been analysed in two main ways in the literature: a structural approach, connecting the availability of the simultaneous reading in a language to a syntactic mechanism that allows the embedded past not to be interpreted; and an implicature approach, which links the absence of such a reading to the presence of a “cessation” implicature associated with past tense. We report a series of experiments on Polish, which is commonly classified as a non-SOT language. First, we investigate the interpretation of complement clauses embedded under past-marked attitude verbs in Polish and English. This investigation revealed a difference between these two languages in the availability of simultaneous interpretations for past-under-past complement clauses, albeit not as large as a binary distinction between SOT and non-SOT languages would lead us to expect. We then address the question of whether the lower acceptability we observe for simultaneous readings in Polish might be due to an embedded cessation implicature. On the way to address this question, we show that in simple matrix clauses, Polish gives rise to the same cessation inference as English. Then we investigate Polish past-under-past sentences in positive and negative contexts, comparing their potential cessation implicature to the exclusive implicature of disjunction. 
In our results, we found that the latter was endorsed more often in positive than in negative contexts, as expected, while the cessation implicature was endorsed overall very little, with no difference across contexts. The disanalogy between the disjunction and the temporal cases, and the insensitivity of the latter to monotonicity, are a challenge for the implicature approach, and cast doubts on associating SOT phenomena with implicatures.
In this article we examine moments in which parents or other caregivers overtly invoke rules during episodes in which they take issue with, intervene against, and try to change a child’s ongoing behavior or action(s). Drawing on interactional data from four different languages (English, Finnish, German, Polish) and using Conversation Analytic methods, we first illustrate the variety of ways in which parents may use such overt rule invocations as part of their behavior modification attempts, showing them to be functionally versatile interactional objects. Their interactional flexibility notwithstanding, we find that parents typically invoke rules when, in the course of the intervention episode, they encounter trouble with achieving an acceptable compliant outcome. To get at the distinct import of rule formulations in this context, we then compare them to two sequential alternatives: parental expressions of an experienced negative affective state, and parental threats. While the former emphasize aspects of social solidarity, the latter seek to enforce compliance by foregrounding a power asymmetry between the parent and the child. Rule formulations, by contrast, are designedly impersonal and appear to be directed at what the parents construe as shortcomings in common-sense practical reasoning on the child’s part. Reflexively, the child is thereby cast as not having properly applied common-sense ‘practical reason’ when engaging in what is treated as the problematic behavior or action. Overt rule invocations can, therefore, be understood as indexical appeals to practical reason.
In the present contribution, I investigate if and how the English and French editions of the Wiktionary collaborative dictionary can be used as a corpus for real time neology watch. This option is envisaged as a stopgap, when no satisfactory corpus is available. Wiktionary can also prove useful in addition to standard corpus analysis, to minimize the risk of overlooking new coinages and new senses. Since the collaborative dictionary’s quest for exhaustiveness makes the manual inspection of the new additions unreasonable (more than 31,000 English lemmas and 11,000 French lemmas entered the nomenclature in 2020), identifying the possibly relevant headwords is an issue. The solution proposed here is to use Wiktionary revision history to detect the (new or existing) entries that received the greatest number of modifications. The underlying hypothesis is that the most heavily edited pages can help identify the vocabulary related to “hot topics”, assuming that, in 2020, the pandemic-related vocabulary ranks high. I used two measures introduced by Lih (2004), whose aim was to estimate the quality of Wikipedia articles: the so-called rigour (number of edits per page) and diversity (number of unique contributors per page). In the present study, I propose to adapt the rigour and diversity metrics to Wiktionary in order to identify the pages that generated a particular stir, rather than to estimate the quality of the articles. I do not subscribe to the idea that – in Wiktionary – more revisions necessarily produce quality articles (more revisions often produce complete articles). I therefore adopt Lih’s notion of diversity to refer to the number of distinct contributors, but leave out the name rigour when it comes to the number of revisions. Wolfer and Müller-Spitzer (2016) used the two metrics to describe the dynamics of the German and English editions of Wiktionary. One of their findings was that the number of edits per page is correlated with corpus word frequencies. 
The variation in number of page edits should therefore reflect to some extent the variation of corpus word frequencies. Renouf (2013) established a relationship between the fluctuation of word frequencies in a diachronic corpus and various neological processes. In particular, she illustrated how specific events generate sudden frequency spikes for words previously unseen in the corpus. For instance, Eyjafjallajökull, the – existing – name of an Icelandic glacier, appeared in the corpus when the underlying volcano erupted in 2010 and disrupted air traffic in Europe. In order to check if the same phenomenon occurs when using Wiktionary edits instead of corpus frequencies, I manually annotated the most frequently revised entries (according to various ranking scores) with the binary tag: “related to Covid-19” (yes/no). The annotations were then used to test the ability of various configurations to detect relevant headwords from the English and French Wiktionary, namely Covid-19 neologisms and related existing words that deserve updates.
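The two per-page measures described above, number of edits and number of distinct contributors (Lih's diversity), can be illustrated with a minimal sketch. It assumes the revision history has already been reduced to (page title, contributor) pairs; the function names `edit_metrics` and `rank_pages` and the sample data are hypothetical and are not taken from the study, whose actual ranking scores are not specified here.

```python
from collections import defaultdict

def edit_metrics(revisions):
    """Return {page: (number_of_edits, number_of_distinct_contributors)}.

    `revisions` is an iterable of (page_title, contributor) pairs,
    one pair per revision (an assumed, simplified input format).
    """
    edits = defaultdict(int)
    contributors = defaultdict(set)
    for page, user in revisions:
        edits[page] += 1
        contributors[page].add(user)
    return {page: (edits[page], len(contributors[page])) for page in edits}

def rank_pages(metrics, key="edits"):
    """Rank pages by edit count or by diversity, highest first.

    Ties are broken alphabetically by page title.
    """
    idx = 0 if key == "edits" else 1
    return sorted(metrics, key=lambda page: (-metrics[page][idx], page))

# Hypothetical revision log for illustration only.
sample = [
    ("COVID-19", "alice"), ("COVID-19", "bob"), ("COVID-19", "alice"),
    ("lockdown", "carol"), ("lockdown", "carol"),
    ("glacier", "dave"),
]
metrics = edit_metrics(sample)
```

Ranking the same pages once by edits and once by diversity shows how the two measures can diverge: a page edited repeatedly by a single contributor scores high on edits but low on diversity.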
This paper presents the main issues connected with the creation of a trilingual Hungarian-Italian-English dictionary of the COVID-19 pandemic using Lexonomy. My aim is not only to create a coronacorpus (in Hungarian, I propose my own corona-neologism or ‘coroneologism’: koronakorpusz) and a dictionary of equivalents, but also to understand how the different waves and phases of the COVID-19 pandemic are changing the Hungarian language, detect the Corona-, COVID-, pandemic-, virus-, mask-, quarantine-, and vaccine-related neologisms, and offer an overview of the most frequent or linguistically interesting Hungarian neologisms and multiword units related to COVID-19.
Since the beginning of 2020, the Covid-19 pandemic has dominated public discourse and introduced a wealth of words and expressions to the general vocabulary of English and other world languages. The lexical adaptation necessitated by this global health crisis has been unprecedented in speed and scope, and in response, the Oxford English Dictionary (OED) has continually revised its coverage, publishing special updates of Covid-19-related words in 2020 outside of its usual quarterly publication cycle. This article describes how OED lexicographers have analysed language corpora and other text databases to monitor the development of pandemic-related words and provide a linguistic and historical context to their usage.
The shortening of linguistic expressions naturally involves some sort of correspondence between short forms and (some portion of) the respective full forms. Based mostly on data from English and Hebrew this article explores the hypothesis that such correspondence concerns necessary sameness of symbolic form, referring either to graphemic or to a specific level of phonological representation. That level indicates a degree of abstractness defined by language-specific contrastiveness (i.e. “phonemic”). Reference to written form can be shown to be highly systematic in certain contexts, including cases where full forms consist of multiple stems. Specific asymmetries pertaining to the targeting of material by correspondence (e.g. initial vs. non-initial position) appear to be alike for both types of representation, a claim supported by a study based on a nomenclature strictly confined to writing (chemical element symbols).
Dictionaries have been part and parcel of literate societies for many centuries. They assist in communication, particularly across different languages, to aid in understanding, creating, and translating texts. Communication problems arise whenever a native speaker of one language comes into contact with a speaker of another language. At the same time, English has established itself as a lingua franca of international communication. This marked tendency gives lexicography of English a particular significance, as English dictionaries are used intensively and extensively by huge numbers of people worldwide.
The thesis describes a fully automatic system for the resolution of the pronouns 'it', 'this', and 'that' in English unrestricted multi-party dialog. Referential relations considered include both normal NP-antecedence as well as discourse-deictic pronouns. The thesis contains a theoretical part with a comprehensive empirical study, and a practical part describing machine learning experiments.
Over the past decade, conducting empirical research in linguistics has become increasingly popular. The first of its kind, this book provides an engaging and practical introduction to this exciting, versatile field, providing a comprehensive overview of research aspects in general, and covering a broad range of subdiscipline-specific methodological approaches. Subfields covered include language documentation and descriptive linguistics, language typology, corpus linguistics, sociolinguistics and anthropological linguistics, cognitive linguistics and psycholinguistics, and neurolinguistics. The book reflects on the strengths and weaknesses of each single approach and on how they interact with one another across the study of language in its many diverse facets. It also includes exercises, example student projects and recommendations for further reading, along with additional online teaching materials. Providing hands-on experience, and written in an engaging and accessible style, this unique and comprehensive guide will give students the inspiration they need to develop their own research projects in empirical linguistics.
The teaching slides accompany the following textbook:
Svenja Völkel & Franziska Kretzschmar (2021): Introducing linguistic research. Cambridge: Cambridge University Press.
The slides follow the structure of the book chapters and can be used for teaching in class. They include the basic information per chapter and exercises to work on in class or as homework. More detailed information, additional exercises, suggestions for research projects and recommendations for further reading can be found in the textbook.
We present zu-excessive structures like Otto ist zu schwer ‘Otto is too heavy’ as instantiations of comparatives that have been reflexivized. Comparatives express asymmetric relations between distinguished referents, but reflexivization identifies argument places (or reduces two argument places to one), leading to a symmetric relation. Reflexivization is thus in conflict with the asymmetry property of comparatives and leads to an intermediate semantic representation that is contradictory. Two experiments substantiate that zu-excessives share this property with privative adjective and animal-for-statue constructions that similarly give rise to contradictory semantics. The processing of any of the constructions mentioned yields a positivity in the event-related-potential signature characteristic of conceptual reorganization; however, the observed positivity occurs earlier in the case of zu-excessives than in the other cases. We propose this difference is due to zu signalling the mandatory preparation for an ensuing repair rather than reflecting the repair operation itself that involves manipulating the standard of comparison, coded elsewhere in the string (if at all).
Taking the use of the esthetic term wabi sabi (Japanese compound noun) in a series of German- and English-language theater rehearsals as an example, this article studies the emergence of shared meanings and uses of an expression over an interactional history. We track how shared understandings and uses of wabi sabi develop over the course of a series of theater rehearsals. We focus on the practices by which understandings of wabi sabi are displayed, adopted, and negotiated. We discuss complexities and intransparencies of the manifestation of common ground in multiparty interactions and its relationship to the emergence of routine uses of the expression. Data are in English and German with English translation.
We present empirical evidence of the communicative utility of conventionalization, i.e., convergence in linguistic usage over time, and diversification, i.e., linguistic items acquiring different, more specific usages/meanings. From a diachronic perspective, conventionalization plays a crucial role in language change as a condition for innovation and grammaticalization (Bybee, 2010; Schmid, 2015) and diversification is a cornerstone in the formation of sublanguages/registers, i.e., functional linguistic varieties (Halliday, 1988; Harris, 1991). While it is widely acknowledged that change in language use is primarily socio-culturally determined pushing towards greater linguistic expressivity, we here highlight the limiting function of communicative factors on diachronic linguistic variation showing that conventionalization and diversification are associated with a reduction of linguistic variability. To be able to observe effects of linguistic variability reduction, we first need a well-defined notion of choice in context. Linguistically, this implies the paradigmatic axis of linguistic organization, i.e., the sets of linguistic options available in a given or similar syntagmatic contexts. Here, we draw on word embeddings, weakly neural distributional language models that have recently been employed to model lexical-semantic change and allow us to approximate the notion of paradigm by neighbourhood in vector space. Second, we need to capture changes in paradigmatic variability, i.e. reduction/expansion of linguistic options in a given context. As a formal index of paradigmatic variability we use entropy, which measures the contribution of linguistic units (e.g., words) in predicting linguistic choice in bits of information. 
Using entropy provides us with a link to a communicative interpretation, as it is a well-established measure of communicative efficiency with implications for cognitive processing (Linzen and Jaeger, 2016; Venhuizen et al., 2019); also, entropy is negatively correlated with distance in (word embedding) spaces which in turn shows cognitive reflexes in certain language processing tasks (Mitchel et al., 2008; Auguste et al., 2017). In terms of domain we focus on science, looking at the diachronic development of scientific English from the 17th century to modern time. This provides us with a fairly constrained yet dynamic domain of discourse that has witnessed a powerful systematization throughout the centuries and developed specific linguistic conventions geared towards efficient communication. Overall, our study confirms the assumed trends of conventionalization and diversification shown by diachronically decreasing entropy, interspersed with local, temporary entropy highs pointing to phases of linguistic expansion pertaining primarily to introduction of new technical terminology.
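To make the entropy index concrete, here is a minimal sketch that computes Shannon entropy in bits over the words competing for one slot. It is a simplification: the study approximates paradigms by neighbourhoods in embedding space, whereas this sketch assumes the competing words and their counts are already given. The function name and the toy counts are hypothetical.

```python
import math
from collections import Counter

def entropy_bits(counts):
    """Shannon entropy (in bits) of the distribution implied by raw counts."""
    total = sum(counts.values())
    probs = [c / total for c in counts.values() if c > 0]
    return -sum(p * math.log2(p) for p in probs)

# Hypothetical counts of words filling the same slot in two periods.
early = Counter({"shew": 25, "show": 25, "demonstrate": 25, "prove": 25})
late = Counter({"show": 85, "demonstrate": 10, "prove": 5})

# Four equally likely options give 2 bits; a dominant option lowers entropy.
print(entropy_bits(early), entropy_bits(late))
```

Lower entropy for the later period would correspond to the reduced paradigmatic variability that the abstract associates with conventionalization.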
Linguistic Variation and Change in 250 Years of English Scientific Writing: A Data-Driven Approach
(2020)
We trace the evolution of Scientific English through the Late Modern period to modern time on the basis of a comprehensive corpus composed of the Transactions and Proceedings of the Royal Society of London, the first and longest-running English scientific journal established in 1665. Specifically, we explore the linguistic imprints of specialization and diversification in the science domain which accumulate in the formation of “scientific language” and field-specific sublanguages/registers (chemistry, biology etc.). We pursue an exploratory, data-driven approach using state-of-the-art computational language models and combine them with selected information-theoretic measures (entropy, relative entropy) for comparing models along relevant dimensions of variation (time, register). Focusing on selected linguistic variables (lexis, grammar), we show how we deploy computational language models for capturing linguistic variation and change and discuss benefits and limitations.
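Relative entropy, one of the measures named above, can be illustrated with a small sketch that compares two unigram count distributions, for instance two time slices or two registers. The add-alpha smoothing, the function name and the toy counts are assumptions made for this example and are not taken from the study.

```python
import math
from collections import Counter

def relative_entropy_bits(p_counts, q_counts, alpha=1.0):
    """KL divergence D(P || Q) in bits over a shared vocabulary.

    Add-alpha smoothing (an assumption of this sketch) keeps every
    probability non-zero so that the logarithm is always defined.
    """
    vocab = set(p_counts) | set(q_counts)
    p_total = sum(p_counts.values()) + alpha * len(vocab)
    q_total = sum(q_counts.values()) + alpha * len(vocab)
    divergence = 0.0
    for word in vocab:
        p = (p_counts.get(word, 0) + alpha) / p_total
        q = (q_counts.get(word, 0) + alpha) / q_total
        divergence += p * math.log2(p / q)
    return divergence

# Hypothetical word counts from two registers.
chemistry = Counter({"acid": 40, "solution": 30, "observe": 10})
biology = Counter({"cell": 45, "observe": 20, "solution": 5})
print(relative_entropy_bits(chemistry, biology))
```

The measure is asymmetric, so comparing register A to B and B to A generally yields different values.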
This study investigates the question of whether the processing of complex anaphors requires more cognitive effort than the processing of NP-anaphors. Complex anaphors refer to abstract objects which are not introduced as a noun phrase and bring about the creation of a new discourse referent. This creation is called “complexation process”. We describe ERP findings which provide converging support for the assumption that the cognitive cost of this complexation process is higher than the cognitive cost of processing NP-anaphors.
Construction-based language models assume that grammar is meaningful and learnable from experience. Focusing on five of the most elementary argument structure constructions of English, a large-scale corpus study of child-directed speech (CDS) investigates exactly which meanings/functions are associated with these patterns in CDS, and whether they are indeed specially indicated to children by their caretakers (as suggested by previous research, cf. Goldberg, Casenhiser and Sethuraman 2004). Collostructional analysis (Stefanowitsch and Gries 2003) is employed to uncover significantly attracted verb-construction combinations, and attracted pairs are classified semantically in order to systematise the attested usage patterns of the target constructions. The results indicate that the structure of the input may aid learners in making the right generalisations about constructional usage patterns, but such scaffolding is not strictly necessary for construction learning: not all argument structure constructions are coherently semanticised to the same extent (in the sense that they designate a single schematic event type of the kind envisioned in Goldberg’s [1995] ‘scene encoding hypothesis’), and they also differ in the extent to which individual semantic subtypes predominate in learners’ input.
In an earlier publication it was claimed that there is no useful relationship between Swahili-English dictionary look-up frequencies and the occurrence frequencies for the same wordforms in Swahili-English corpora, at least not beyond the top few thousand wordforms. This result was challenged using data for German by a different team of researchers using an improved methodology. In the present article the original Swahili-English data is revisited, using ten years’ worth of it rather than just two, and using the improved methodology. We conclude that there is indeed a positive relationship. In addition, we show that online dictionary look-up behaviour is remarkably similar across languages, even when, as in our case, one is dealing with languages from very dissimilar language families. Furthermore, online dictionaries turn out to have minimum look-up success rates, below which they simply cannot go. These minima are language-sensitive and vary depending on the regularity of the searched-for entries, but are otherwise constant no matter the size of randomly sampled dictionaries. Corpus-informed sampling always improves on any random method. Lastly, from the point of view of the graphical user interface, we argue that the average user of an online bilingual dictionary is better served with a single search box, rather than separate search boxes for each dictionary side.
Since 2013 representatives of several French and German CMC corpus projects have developed three customizations of the TEI-P5 standard for text encoding in order to adapt the encoding schema and models provided by the TEI to the structural peculiarities of CMC discourse. Based on the three schema versions, a fourth version has been created which takes into account the experiences from encoding our corpora and which is specifically designed for the submission of a feature request to the TEI council. On our poster we present the structure of this schema and its relations (commonalities and differences) to the previous schemas.
This paper discusses new perspectives for a usage-based paremiology from a corpus-linguistic point of view. Using the example of proverb patterns, it shows different degrees of fixedness and proverb quality in German-English contrast. An interesting insight is that proverb similarities and differences can also be described by restrictions of semi-abstract schemes.
This paper discusses a specific subclass of English it-clefts posited in the theoretical literature, so-called predicational clefts. The main point of the paper is to show that there is no need to postulate such a separate class. Predicational clefts look special because of the narrow focus on the adjective within an indefinite pivot, but their special properties can all be derived from this narrow focus in a focus analysis in which it-clefts express contrasting focus. Contrasting focus means that besides the assertion of the proposition expressed in the cleft, there is one contrasting proposition which is excluded. The focus on the adjective in apparent predicational clefts gives rise to a narrow set of relevant alternatives, all of which differ only in the adjectival property within the pivot. The analysis developed here can account for many of the observations for apparent predicational clefts. Other properties are shown to be not conclusive. Thus, predicational clefts need not be considered a special subclass beyond their special focus characteristics.
Language shift after migration has been reported to occur within three generations. While this pattern holds in many cases, there is also some counterevidence. In this paper, family documents from a German immigration community in Canada are investigated to trace individual decisions of language choice that contributed to an extended process of shift taking four generations and more than a century.
The puzzle we consider in this paper is that Merchant (2004) judges certain elliptical utterances in context to be ungrammatical, while Culicover and Jackendoff (2005) judge similar examples to be grammatical. The main difference between the examples appears to be that Merchant’s are introduced by no, while Culicover and Jackendoff’s are introduced by yes. We propose that the different judgments do not reflect grammaticality, but complexity associated with ambiguity. First, there is an ambiguity with respect to the reference of noun phrases in discourse: the relationship of the fragment to the preceding discourse is ambiguous. Second, there is an ambiguity with respect to the discourse function of an utterance, and in particular, whether it is an affirmation triggered by yes or a denial triggered by no. In the case of the denial, it needs to be established which part of the preceding statement has to be corrected, while in the case of the affirmation, no such ambiguity arises. The interactions between these two interpretive functions may under certain circumstances render particular sentences in discourse difficult to interpret. Interpretive difficulty has the subjective flavor of ‘ungrammaticality’; in the case that we discuss here, these judgments form the basis for a particular linguistic analysis. But, we argue, manipulation of the discourse context can simplify discourse interpretation by resolving the ambiguity, which removes the interpretive difficulty. The conclusion that we draw is that the phenomenon in question is not a matter of linguistic structure, but of discourse interpretation.
The sentiment polarity of a phrase does not only depend on the polarities of its words, but also on how these are affected by their context. Negation words (e.g. not, no, never) can change the polarity of a phrase. Similarly, verbs and other content words can also act as polarity shifters (e.g. fail, deny, alleviate). While individually more sparse, they are far more numerous. Among verbs alone, there are more than 1200 shifters. However, sentiment analysis systems barely consider polarity shifters other than negation words. A major reason for this is the scarcity of lexicons and corpora that provide information on them. We introduce a lexicon of verbal polarity shifters that covers the entirety of verbs found in WordNet. We provide a fine-grained annotation of individual word senses, as well as information for each verbal shifter on the syntactic scopes that it can affect.
We present evidence for the analysis of the vowels in English <say> and <so> as biphonemic diphthongs /ɛi/ and /əu/, based on neutralization patterns, regular alternations, and foot structure. /ɛi/ and /əu/ are hence structurally on a par with the so-called “true diphthongs” /ɑi/, /ɐu/, /ɔi/, but also share prosodic organization with the monophthongs /i/ and /u/. The phonological evidence is supported by dynamic measurements based on the American English TIMIT database.
Calculations of F2-slopes proved to be especially suited to distinguish the relevant groups in accordance with their phonologically motivated prosodic organizations.
The goal of the MULI (MUltiLingual Information structure) project is to empirically analyse information structure in German and English newspaper texts. In contrast to other projects in which information structure is annotated and investigated (e.g. in the Prague Dependency Treebank, which mirrors the basic information about the topic-focus articulation of the sentence), we do not annotate theory-biased categories like topic-focus or theme-rheme. Trying to be as theory-independent as possible, we annotate those features which are relevant to information structure and on the basis of which typical patterns, co-occurrences or correlations can be determined. We distinguish between three annotation levels: syntax, discourse and prosody. The data is based on the TIGER Corpus for German and the Penn Treebank for English, since the existing information on part-of-speech and syntactic structure can be re-used for our purposes. The actual annotation of an English example sequence illustrates our choice of categories on each level. Their combination offers the possibility to investigate how information structure is realised and can be interpreted.
This paper is concerned with a novel methodology for generating phonetic questions used in tree-based state tying for speech recognition. In order to implement a speech recognition system, language-dependent knowledge which goes beyond annotated material is usually required. The approach presented here generates phonetic questions for decision trees based on a feature table that summarizes the articulatory characteristics of each sound. On the one hand, this method allows better language-specific triphone models to be defined given only a feature table as linguistic input. On the other hand, the feature-table approach facilitates efficient definition of triphone models for other languages since again only a feature table for this language is required. The approach is exemplified with speech recognition systems for English and Thai.
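The idea of deriving phonetic questions from a feature table can be sketched as follows. In tree-based state tying, a question is essentially a set of phones that share some property, so one simple way to generate questions is to collect, for every feature/value pair, the phones that carry it. The miniature table, the feature names and the function below are hypothetical and are not the paper's actual procedure.

```python
from collections import defaultdict

# Hypothetical miniature feature table: phone -> articulatory features.
FEATURES = {
    "p": {"manner": "stop", "place": "bilabial", "voiced": "no"},
    "b": {"manner": "stop", "place": "bilabial", "voiced": "yes"},
    "m": {"manner": "nasal", "place": "bilabial", "voiced": "yes"},
    "n": {"manner": "nasal", "place": "alveolar", "voiced": "yes"},
    "s": {"manner": "fricative", "place": "alveolar", "voiced": "no"},
}

def generate_questions(table):
    """Return {question_name: sorted phones} for every feature/value pair."""
    classes = defaultdict(set)
    for phone, feats in table.items():
        for feature, value in feats.items():
            classes[f"{feature}={value}"].add(phone)
    # A class covering every phone cannot split a tree node, so drop it.
    return {name: sorted(phones) for name, phones in classes.items()
            if len(phones) < len(table)}

questions = generate_questions(FEATURES)
for name, phones in sorted(questions.items()):
    print(name, phones)
```

Under this sketch, porting the question set to another language only requires supplying a new feature table, which is the portability argument the abstract makes.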
Preface
(2015)
This paper describes the lexical database tool LOLA (Linguistic-Oriented Lexical database Approach) which has been developed for the construction and maintenance of lexicons for the machine translation system LMT. First, the requirements such a tool should meet are discussed, then LMT and the lexical information it requires, and some issues concerning vocabulary acquisition are presented. Afterwards the architecture and the components of the LOLA system are described and it is shown how we tried to meet the requirements worked out earlier. Although LOLA was originally designed and implemented for the German-English LMT prototype, it aimed from the beginning at a representation of lexical data that can be reused for other LMT or MT prototypes or even other NLP applications. A special point of discussion will therefore be the adaptability of the tool and its components as well as the reusability of the lexical data stored in the database for the lexicon development for LMT or for other applications.
CoMParS is a resource under construction in the context of the long-term project German Grammar in European Comparison (GDE) at the IDS Mannheim. The principal goal of GDE is to create a novel contrastive grammar of German against the background of other European languages. Alongside German, which is the central focus, the core languages for comparison are English, French, Hungarian and Polish, representing different typological classes. Unlike traditional contrastive grammars available for German, which usually cover language pairs and are based on formal grammatical categories, the new GDE grammar is developed in the spirit of functionalist typology. This implies that, instead of formal criteria, cognitively motivated functional domains in terms of Givón (1984) are used as tertia comparationis. The purpose of CoMParS is to document the empirical basis of the theoretical assumptions of GDE-V and to illustrate the otherwise rather abstract content of grammar books by as many naturally occurring and adequately presented multilingual examples as possible, including information on their use in specific contexts and registers. These examples come from existing parallel corpora, and our presentation will focus on the legal aspects and consequences of this choice of language data.
This article describes a series of ongoing efforts at the Stanford Literary Lab to manage a large collection of literary corpora (~40 billion words). This work is marked by a tension between two competing requirements – the corpora need to be merged together into higher-order collections that can be analyzed as units; but, at the same time, it’s also necessary to preserve granular access to the original metadata and relational organization of each individual corpus. We describe a set of data management practices that try to accommodate both of these requirements – Apache Spark is used to index data as Parquet tables on an HPC cluster at Stanford. Crucially, the approach distinguishes between what we call “canonical” and “combined” corpora, a variation on the well-established notion of a “virtual corpus” (Kupietz et al., 2014; Jakubíček et al., 2014; van Uytvanck, 2010).
This paper outlines the broad research context and rationale for a new international comparable corpus (ICC). The ICC is to be largely modelled on the text categories and their quantities of the International Corpus of English with only a few changes. The corpus will begin with nine European languages but others may join in due course. The paper reports on those and other agreements made at the inaugural planning meeting in Prague on 22-23 June 2017. It also sets out the project’s goals for its first two years.
We investigate whether non-configurational languages, which display more word order variation than configurational ones, require more training data for a phenomenon to be parsed successfully. We perform a tightly controlled study comparing the dative alternation for English (a configurational language), German, and Russian (both non-configurational). More specifically, we compare the performance of a dependency parser when only canonical word order is present with its performance on data sets when all word orders are present. Our results show that for all languages, not only is canonical data easier to parse, but there is also no direct correspondence between the size of training sets containing free(er) word order variation and performance.
In this paper, I argue against the analyses of the there-construction by Moro (1997) and Hoekstra & Mulder (1990) and for an analysis in the framework of Williams (1994) and Hazout (2004) from two angles. First of all, Moro and Hoekstra & Mulder do not correctly predict the behaviour of the there-construction under wh-movement; second, from a semantic point of view, the predicate in the small clause structure is the postverbal DP and not there. Alternatively, I follow the proposal by Williams (1994) in which there is the subject of predication and I will point out a direction to analyse the problematic wh-movement data within this framework.
We present two collections of lexical items with idiosyncratic distribution. The collections document the behavior of German and English bound words (BW, such as English “headway”), i.e., words which can only occur in one expression (“make headway”). BWs are a problem for both general and idiomatic dictionaries since it is unclear whether they have an independent lexical status and to what extent the expressions in which they occur are typical idiomatic expressions. We propose a system which allows us to document the information about BWs from dictionaries and linguistic literature, together with corpus data and example queries for major text corpora. We present our data structure and point to other phraseologically oriented collections. We will also show differences between the German and the English collection.
In this paper, an exploratory data-driven method is presented that extracts word-types from diachronic corpora that have undergone the most pronounced change in frequency of occurrence in a given period of time. Combined with statistical methods from time series analysis, the method is able to find meaningful patterns and relationships in diachronic corpora, an idea that is still uncommon in linguistics. This indicates that the approach can facilitate an improved understanding of diachronic processes.
The classification of verbs in Levin's (1993) English Verb Classes and Alternations: A Preliminary Investigation, on the basis of both intuitive semantic grouping and their participation in valence alternations, is often used by the NLP community as evidence of the semantic similarity of verbs (Jing & McKeown 1998; Lapata & Brew 1999; Kohl et al. 1998). In this paper, we compare the Levin classification with the work of the FrameNet project (Fillmore & Baker 2001), where words (not just verbs) are grouped according to the conceptual structures (frames) that underlie them and their combinatorial patterns are inductively derived from corpus evidence. This means that verbs grouped together in FrameNet (FN) might be semantically similar but have different (or no) alternations, and that verbs which share the same alternation might be represented in two different semantic frames.
This paper presents an annotation scheme for English modal verbs together with sense-annotated data from the news domain. We describe our annotation scheme and discuss problematic cases for modality annotation based on the inter-annotator agreement during the annotation. Furthermore, we present experiments on automatic sense tagging, showing that our annotations do provide a valuable training resource for NLP systems.
This study presents the results of a large-scale comparison of various measures of pitch range and pitch variation in two Slavic (Bulgarian and Polish) and two Germanic (German and British English) languages. The productions of twenty-two speakers per language (eleven male and eleven female) in two different tasks (read passages and number sets) are compared. Significant differences between the language groups are found: German and English speakers use lower pitch maxima, narrower pitch span, and generally less variable pitch than Bulgarian and Polish speakers. These findings support the hypothesis that linguistic communities tend to be characterized by particular pitch profiles.
Based on specific linguistic landmarks in the speech signal, this study investigates pitch level and pitch span differences in English, German, Bulgarian and Polish. The analysis is based on 22 speakers per language (11 males and 11 females). Linear mixed models were computed that include various linguistic measures of pitch level and span, revealing characteristic differences across languages and between language groups. Pitch level appeared to have significantly higher values for the female speakers in the Slavic than the Germanic group. The male speakers showed slightly different results, with only the Polish speakers displaying significantly higher mean values for pitch level than the German males. Overall, the results show that the Slavic speakers tend to have a wider pitch span than the German speakers. But for the linguistic measure, namely for span between the initial peaks and the non-prominent valleys, we only find the difference between Polish and German speakers. We found a flatter intonation contour in German than in Polish, Bulgarian and English male and female speakers and differences in the frequency of the landmarks between languages. Concerning “speaker liveliness” we found that the speakers from the Slavic group are significantly livelier than the speakers from the Germanic group.
This study investigates cross-language differences in pitch range and variation in four languages from two language groups: English and German (Germanic) and Bulgarian and Polish (Slavic). The analysis is based on large multi-speaker corpora (48 speakers for Polish, 60 for each of the other three languages). Linear mixed models were computed that include various distributional measures of pitch level, span and variation, revealing characteristic differences across languages and between language groups. A classification experiment based on the relevant parameter measures (span, kurtosis and skewness values for pitch distributions for each speaker) succeeded in separating the language groups.
Freezing in it-clefts
(2013)
The paper contributes to the raising vs. control debate with respect to modals through (A) novel data; (B) the investigation of a domain in which it has proven particularly problematic: volitional modality. We analyze oblique arguments of experiencer verbs embedded under German wollen ‘want’ and propose that they support both generalized raising and the abandonment of the classical version of the Theta Criterion. Byproducts of the analysis include a syntactic account involved in a class of datives in the language together with the initial characterization of a related modal in German which is expressed through the same item as volition and which we term weak.
In recent years, theoretical and computational linguistics have paid much attention to linguistic items that form scales. In NLP, much research has focused on ordering adjectives by intensity (tiny < small). Here, we address the task of automatically ordering English adverbs by their intensifying or diminishing effect on adjectives (e.g. extremely small < very small). We experiment with 4 different methods: 1) using the association strength between adverbs and adjectives; 2) exploiting scalar patterns (such as not only X but Y); 3) using the metadata of product reviews; 4) clustering. The method that performs best is based on the use of metadata and ranks adverbs by their scaling factor relative to unmodified adjectives.
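One way to picture the metadata-based method is the sketch below. It assumes that each observation pairs an adjective (optionally modified by an adverb) with the star rating of the review it occurs in, and it estimates an adverb's scaling factor as the ratio between the rating offset of the modified adjective and that of the bare adjective. The neutral midpoint of 3 stars, the data and all names are assumptions for illustration; the paper's actual estimator may differ.

```python
from collections import defaultdict
from statistics import mean

NEUTRAL = 3.0  # assumed midpoint of a 1-5 star scale

def scaling_factors(observations):
    """observations: iterable of (adverb_or_None, adjective, stars)."""
    ratings = defaultdict(list)
    for adverb, adjective, stars in observations:
        ratings[(adverb, adjective)].append(stars)
    ratios = defaultdict(list)
    for (adverb, adjective), stars in ratings.items():
        base = ratings.get((None, adjective))
        if adverb is None or not base:
            continue  # skip bare adjectives and those without a baseline
        base_offset = mean(base) - NEUTRAL
        if base_offset == 0:
            continue  # a neutral adjective gives no usable ratio
        ratios[adverb].append((mean(stars) - NEUTRAL) / base_offset)
    return {adverb: mean(values) for adverb, values in ratios.items()}

# Hypothetical review data.
reviews = [
    (None, "good", 4), ("very", "good", 5), ("slightly", "good", 3.5),
    (None, "bad", 2), ("very", "bad", 1), ("slightly", "bad", 2.5),
]
factors = scaling_factors(reviews)
ranking = sorted(factors, key=factors.get)  # weakest to strongest adverb
print(ranking, factors)
```

A factor above 1 marks an intensifier and a factor below 1 a diminisher, so sorting by the factor yields the kind of adverb ordering the task asks for.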
American English and German AI, AU observed in cognates such as Wein, wine, Haus, house are usually treated on a par, represented with the same initial vowel (cf. [ai], [au] for Am. Engl. and German [1]). Yet, acoustic measurements indicate differences as the relevant trajectories characteristically cross in Am. Engl. but not in German. These data may indicate consistency with the same initial target for these diphthongs in German, supporting the choice of the same symbol /a/ in phonemic representation, as opposed to distinct targets (and distinct initial phonemes) in American English.
Languages vary in whether or not their future markers are compatible with non-future modal readings (Tonhauser, 2011b). The present paper proposes that this variation is determined by the aspectual architecture of a given language, more precisely if and how aspects can be stacked. Building on recent accounts of the temporal interpretation of modals (Matthewson, 2012, 2013; Kratzer, 2012; Chen et al., ta), the paper first sketches an analysis of the temporal readings of the English future marker will and then provides cross-linguistic comparison with a selected, typologically diverse set of languages (Medumba, Hausa, Gitksan, and Greek).
This book analyses requests for action on the basis of natural video-recorded data of everyday interaction in British English and Polish families. Jörg Zinken describes in his analyses the features of interactional context that people across cultures might be sensitive to in designing a request, as well as aspects of cultural diversity.
When a noise verb is used to indicate verbal communication, factors from both the source domain of the verb (perception) and the target domain (communication) play a role in determining the argument structure of the sentence. While the target domain supplies a syntactic structure, the source domain’s semantics constrain the degree to which that syntactic structure can be exploited. This can be determined by comparing noise verbs in this use with manner-of-communication verbs, which are superficially similar, but native to communication. Data for these two classes of verbs were drawn from the British National Corpus. The data were annotated with frame-semantic markup, as described in the Berkeley FrameNet Project. We compared the presence, type of syntactic realization, and position of the semantically annotated arguments for both classes of verbs. We found that noise and manner verbs show statistically significant differences in these three areas. For instance, noise verbs are more focused on the form of the message than manner verbs: noise verbs appear more frequently with a quoted message. In addition, there are differences other than the complementation patterns: certain noise verbs are biased with respect to speakers’ genders, message types, and even orthography in quoted messages.
Gaps in Word Formation
(1996)
Trubetzkoy's recognition of a delimitative function of phonology, serving to signal boundaries between morphological units, is expressed in terms of alignment constraints in Optimality Theory, where the relevant constraints require specific morphological boundaries to coincide with phonological structure (Trubetzkoy 1936, 1939, McCarthy & Prince 1993). The approach pursued in the present article is to investigate the distribution of phonological boundary signals to gain insight into the criteria underlying morphological analysis. The evidence from English and Swedish suggests that necessary and sufficient conditions for word-internal morphological analysis concern the recognizability of head constituents, which include the rightmost members of compounds and head affixes. The claim is that the stability of word-internal boundary effects in historical perspective cannot in general be sufficiently explained in terms of memorization and imitation of phonological word form. Rather, these effects indicate a morphological parsing mechanism based on the recognition of word-internal head constituents. Head affixes can be shown to contrast systematically with modifying affixes with respect to syntactic function, semantic content, and prosodic properties. That is, head affixes, which cannot be omitted, often lack inherent meaning and have relatively unmarked boundaries, which can be obscured entirely under specific phonological conditions. By contrast, modifying affixes, which can be omitted, consistently have inherent meaning and have stronger boundaries, which resist prosodic fusion in all phonological contexts. While these correlations are hardly specific to English and Swedish, it remains to be investigated to what extent they hold cross-linguistically. The observation that some of the constituents identified on the basis of prosodic evidence lack inherent meaning raises the issue of compositionality.
I will argue that certain systematic aspects of word meaning cannot be captured with reference to the syntagmatic level, but require reference to the paradigmatic level instead. The assumption is then that there are two dimensions of morphological analysis: syntagmatic analysis, which centers on the criteria for decomposing words in terms of labelled constituents, and paradigmatic analysis, which centers on the criteria for establishing relations among (whole) words in the mental lexicon. While meaning is intrinsically connected with paradigmatic analysis (e.g. base relations, oppositeness), it is not essential to syntagmatic analysis.
Complex common names such as Indian elephant or green tea denote a certain type of entity, viz. kinds. Moreover, those kinds are always subkinds of the kind denoted by their head noun. Establishing such subkinds is essentially the task of classifying modifiers that are a defining trait of endocentrically structured complex common names. Examining complex common names of different lexico-syntactic types (NN compounds, N+N syntagmas, NP/PP syntagmas, A+N syntagmas) and from different languages (particularly English, German and French), it can be shown that complex common names are subject to language-independent formal and semantic constraints. In particular, complex common names qualify as name-like expressions in that they tend to be deficient in terms of formal complexity and semantic compositionality.
Centering on German self-motion verbs, this paper demonstrates the advantages of free-sorting over creating and delineating word fields with more traditional methods. In particular, I draw a comparison to Snell-Hornby’s (1983) work on German descriptive verbs, which produces lexical fields with the help of dictionary entries, a thesaurus, a small corpus of written text and limited speaker feedback. While these methods have benefits, they are limited in their ability to represent the average organization of semantic fields in the mind of everyday speakers. Free-sorting, by contrast, does not rely on academic resources, corpora or singular speaker judgments. In sorting, a group of informants creates visible sets of items according to perceived similarity. Psycholinguists have used the method to quantitatively explore the perception of color terms across cultures (cf. Roberson et al. 2005). With a sufficiently large number of informants, one can generate lexical sorting data that is apt for cluster analysis, the results of which are represented by dendrograms. The experiment I conducted involved 33 school children from a middle class neighborhood in Braunschweig, Northern Germany. My experiment shows that Snell-Hornby’s (1983) representation of the self-motion field can be improved by integrating further dimensions of meaning, such as body-space relations and sound, that young speakers find salient in the grouping procedure.
In this paper, a method for measuring synchronic corpus (dis-)similarity put forward by Kilgarriff (2001) is adapted and extended to identify trends and correlated changes in diachronic text data, using the Corpus of Historical American English (Davies 2010a) and the Google Ngram Corpora (Michel et al. 2010a). This paper shows that this fully data-driven method, which extracts word types that have undergone the most pronounced change in frequency in a given period of time, is computationally very cheap and that it allows interpretations of diachronic trends that are both intuitively plausible and motivated from the perspective of information theory. Furthermore, it demonstrates that the method is able to identify correlated linguistic changes and diachronic shifts that can be linked to historical events. Finally, it can help to improve diachronic POS tagging and complement existing NLP approaches. This indicates that the approach can facilitate an improved understanding of diachronic processes in language change.
Word-formation rules differ from syntactic rules in that they, apart from obeying morphological and semantic constraints, can also be, and often are, restricted phonologically. The present article includes an overview of the relevant phenomena in English and discusses the consequences for the representation of words in the mental lexicon and for grammar.
Patterns pertaining to 'strong' DMPs and scope in presentational there-sentences (henceforth: PTSs) have received much attention, and many attempts have been made to derive them. Building on the account of Heim 1987, this paper proposes a novel account based on temporal reference encoding and general assumptions concerning the nature of the interface between the computational system of syntax (CS) and the systems of sound and meaning (Chomsky 1999).
As the nature of negative polarity items (NPIs) and their licensing contexts is still under much debate, a broad empirical basis is an important cornerstone to support further insights in this area of research. The work discussed in this paper is intended as a contribution to realizing this objective. The authors briefly introduce the phenomenon of NPIs and outline major theories about their licensing and also various licensing contexts before discussing their two major topics. Firstly, a corpus-based retrieval method for NPI candidates is described that ranks the candidates according to their distributional dependence on the licensing contexts. The method extracts single-word candidates and is extended to also capture multi-word candidates. The basic idea for automatically collecting NPI candidates from a large corpus is that an NPI behaves like a kind of collocate to its licensing contexts. Manual inspection and interpretation of the candidate lists identify the actual NPIs. Secondly, an online repository for NPIs and other items that show distributional idiosyncrasies is presented, which offers an empirical database for further (theoretical) research on these items in a sustainable way.
Should events be conceived of as primitive or should they be decomposed into more basic elements with a certain syntax? This talk presents new evidence for the latter view: If events are represented as contradictory propositional meanings representing their pre- and post-states, a uniform analysis of certain eventive and certain too-comparative constructions is possible; this is wanted given striking parallels between the two types of structure. The analysis goes some way, among other things, toward explaining ‘repetitive/restitutive’ asymmetries familiar from eventive constructions (von Stechow 1996) but similarly arising in too-comparative constructions.
The effects of different forms of predication have been insightfully (and almost exclusively) studied for 'simple' cases of predication, of which the 'presentational sentence' is maybe the paradigm instantiation. It is the aim of this paper to show that the same kind of effects as well as in fact the same kind of structures are present at embedded levels in thematically and otherwise more complex structures. Beyond presentational sentences, 'unaccusative' experiencing constructions involving a dative subject, 'double object constructions' and, to a lesser extent, spray/load constructions are discussed. For all of these, it is argued that they comprise a predication encoding the ascription of a transient temporal property to a location. On this basis, a proposal is made as to how the scope asymmetry between the two arguments involved in the constructions can be explained. Furthermore, a proposal is made as to how what has been called 'argument shift' is motivated.
The principal claim of this dissertation is that there is a unique structural core shared by Double Object, Dative Experiencer and Existential/Presentational constructions. This core is argued to take the form of a Cipient Predication structure, 'cipient' covering traditional notions like (affected) source/goal, recipient, indirect object or dative experiencer. Central questions arising in defining Cipient Predication are: How are cipients thematically licensed, and what is the role of there in argument-structural terms? What is the structural locus of cipients/there? What is the role and nature of dative case? How can the possessive interpretation, the blocking and definiteness effects associated with the above-mentioned constructions be explained? Cipients are presented as external arguments and logical subjects (location individuals) of predicates derived from a propositional meaning embedded in the VP, the predicate formed by a lower tense head 'little t' that is overtly realized as there. Little t is argued to encode a distinction at the reference time level, structural dative hinging on a tense property like structural nominative. The cipient relates as a whole to a part to a VP-internal location argument that together with the theme furnishes the propositional meaning ('possession'). As logical subjects, cipients anchor the predicate to the utterance context, forcing its interpretation in extralinguistic terms ('blocking effects'). It is proposed that lacking structurally encoded subjects, Existential/Presentational constructions are not saturated expressions in syntax, precluding the interpretation of certain quantifiers (most/every, vide 'definiteness effects').
Cipient Predication, couched in terms of the Minimalist Program (in particular, Chomsky 1999) and a semantics relying on tense and the ontological distinction of locations as well as scalar and part-whole structure, should be of interest to scholars working on datives, argument structure, and the syntax/semantics/pragmatics interface more generally.
The book investigates the diachronic dimension of contact-induced language change based on empirical data from Pennsylvania German (PG), a variety of German in long-term contact with English. Written data published in local print media from Pennsylvania (USA) between 1868 and 1992 are analyzed with respect to semantic changes in the argument structure of verbs, the use of impersonal constructions, word order changes in subordinate clauses and in prepositional phrase constructions.
The research objective is to trace language change based on diachronic empirical data, and to assess whether existing models of language contact make provisions to cover the long-term developments found in PG. The focus of the study is thus twofold: first, it provides a detailed analysis of selected semantic and syntactic changes in Pennsylvania German, and second, it links the empirical findings to theoretical approaches to language contact.
Previous investigations of PG have drawn a more or less static, rather than dynamic, picture of this contact variety. The present study explores how the dynamics of language contact can bring about language mixing, borrowing, and, eventually, language change, taking into account psycholinguistic processes in (the head of) the bilingual speaker.
This article is concerned with the way in which different types of speech act evaluations are lexicalized by speech act verbs and speech act idioms. The authors first distinguish different types of explicit and implicit evaluations which may be lexicalized by speech act verbs. The meanings of speech act verbs in German, English and Dutch are compared to examine which types of evaluations are lexicalized in each of these languages. Having established an inventory of evaluation types lexicalized by speech act verbs, they compare the evaluations lexicalized by speech act verbs with those lexicalized by speech act idioms. In particular, the authors ask whether certain types of evaluations may be lexicalized by idioms rather than by verbs, and if so, whether this phenomenon also holds cross-linguistically. They also examine whether those evaluations typically expressed by speech act idioms are the same in German, English and Dutch.
The Lemmatisation of Idioms
(2005)
The question of how idioms should be lemmatised is a fundamental issue in the lexicographic treatment of idioms and has been the focus of much debate ever since the first International Symposium on Lexicography. Several proposals for a systematic lexicographic treatment of idioms have been put forward (e.g. Cowie 1981, Burger 1983, Braasch 1988, Schemann 1991, Burger 1998 etc.). In this paper, we examine how semi- and non-literal idioms are lemmatised in some of the most widely-known dictionaries of German, English and Dutch. In what follows, we confine ourselves to the treatment of idioms in mono- and bilingual general dictionaries which are alphabetically ordered. Since the lexical status of idioms is relevant to the way in which idioms should be lemmatised, we shall first be concerned with the status of idioms as units of the lexicon.
Research on syntactic ambiguity resolution in language comprehension has shown that subjects' processing decisions are influenced by a variety of heterogeneous factors such as syntactic complexity, semantic fit and the discourse frequency of the competing structures. The present paper investigates a further potentially relevant factor in such processes: effects of syntagmatic lexical chunking (or matching to a complex memorized prefab) whose occurrence would be predicted from usage-based assumptions about linguistic categorisation. Focusing on the widely studied so-called DO/SC-ambiguity in which a post-verbal NP is syntactically ambiguous between a direct object and the subject of an embedded clause, potentially biasing collocational chunks of the relevant type are identified in a number of corpus-linguistic pretests and then investigated in a self-paced reading experiment. The results show a significant increase in processing difficulty from a collocationally neutral over a lexically biasing to a strongly biasing condition. This suggests that syntagmatically complex and partially schematic templates of the kind envisioned in usage-based Construction Grammar may impinge on speakers' online processing decisions during sentence comprehension.
The authors compare the use of two formats for requesting an object in informal everyday interaction: imperatives, common in their Polish data, and second-person polar questions, common in their English data. Imperatives and polar questions are selected in the same interactional “home environments” across the languages, in which they enact two social actions: drawing on shared responsibility and enlisting assistance, respectively. Speakers across the languages differ in their choice of request format in “mixed” interactional environments that support either. The findings shed light on the orderly ways in which cultural diversity is grounded in invariants of action formation.
Corpus-assisted analyses of public discourse often focus on the level of the lexicon. This article argues in favour of corpus-assisted analyses of discourse, but also in favour of conceptualising salient lexical items in public discourse in a more determined way. It draws partly on non-Anglophone academic traditions in order to promote a conceptualisation of discourse keywords, thereby highlighting how their meaning is determined by their use in discourse contexts. It also argues in favour of emphasising the cognitive and epistemic dimensions of discourse-determined semantic structures. These points will be exemplified by means of a corpus-assisted, as well as a frame-based analysis of the discourse keyword financial crisis in British newspaper articles from 2009. Collocations of financial crisis are assigned to a generic matrix frame for ‘event’ which contains slots that specify possible statements about events. By looking at which slots are more and which are less filled with collocates of financial crisis, we will trace semantic presence as well as absence, and thereby highlight the pragmatic dimensions of lexical semantics in public discourse. The article also advocates the suitability of discourse keyword analyses for systematic contrastive analyses of public/political discourse and for lexicographical projects that could serve to extend the insights drawn from corpus-guided approaches to discourse analysis.
The authors present a multilingual electronic database of lexical items with idiosyncratic occurrence patterns. Currently, the database consists of: (1) a collection of 444 bound words in German; (2) a collection of 77 bound words in English; (3) a collection of 58 negative polarity items in Romanian; (4) a collection of 84 negative polarity items in German; and (5) a collection of 52 positive polarity items in German. The database is encoded in XML and is available via the Internet, offering dynamic and flexible access.
Is it possible to undo or reverse language attrition? In other words, has there been, in the case of attrition, a permanent change with respect to the speaker's L1 knowledge, or do we only see temporary effects on the control of that knowledge? It is proposed here that the concept of attrition should include the temporary loss of language skills since it is, so far, not clear whether or to what extent once-acquired linguistic abilities can be permanently lost at all, particularly with respect to an L1. A reversal in the development of attrition after renewed contact with the L1 can support the claim that a decrease in L1 proficiency can be TEMPORARY, and that it is the ACCESSIBILITY of items and structures that is affected by attrition rather than the L1 knowledge (competence) itself. Our primary research interest in the present study is to analyze what skills and features are recoverable and what phenomena persist, (possibly) indicating permanent loss.
Psychological research has emphasized the importance of narrative for a person’s sense of self. Building a coherent narrative of past events is one objective of psychotherapy. However, in guided self-help therapy the patient has to develop this narrative autonomously. Identifying patients’ narrative skills in relation to psychological distress could provide useful information about their suitability for self-help. The aim of this study was to explore whether the syntactic integration of clauses into narrative in texts written by prospective psychotherapy patients was related to mild to moderate psychological distress. Cross-clausal syntax of texts by 97 people who had contacted a primary care mental health service was analyzed. Severity of symptoms associated with mental health difficulties was assessed by a standardized scale (Clinical Outcomes in Routine Evaluation outcome measure). Cross-clausal syntactic integration was negatively correlated with the severity of symptoms. A multiple regression analysis confirmed that the use of simple sentences, finite complement clauses, and coordinated clauses was associated with symptoms (R² = .26). The results suggest that the analysis of cross-clausal syntax can provide information on patients’ narrative skills in relation to distressing events and can therefore provide additional information to support treatment decisions.
Language resources are often compiled for the purpose of variational analysis, such as studying differences between genres, registers, and disciplines, regional and diachronic variation, influence of gender, cultural context, etc. Often the sheer number of potentially interesting contrastive pairs can get overwhelming due to the combinatorial explosion of possible combinations. In this paper, we present an approach that combines well-understood techniques for visualization (heatmaps and word clouds) with intuitive paradigms for exploration (drill-down and side-by-side comparison) to facilitate the analysis of language variation in such highly combinatorial situations. Heatmaps assist in analyzing the overall pattern of variation in a corpus, and word clouds allow for inspecting variation at the level of words.
This paper presents ongoing work on a multilingual (English, French, German) lexical resource of soccer language. The first part describes how lexicographic descriptions based on frame-semantic principles are derived from a partially aligned multilingual corpus of soccer match reports. The remainder of the paper then discusses how different types of ontological knowledge are linked to this resource in order to provide an access structure to the resulting dictionary. It is argued that linking lexical resources and ontologies in such a way provides dictionary users with novel ways of navigating a domain vocabulary.
Language attitudes may be differentiated into attitudes towards speakers and attitudes towards languages. However, to date, no systematic and differentiated instrument exists that measures attitudes towards language. Accordingly, we developed, validated, and applied the Attitudes Towards Languages (AToL) scale in four studies. In Study 1, we selected 15 items for the AToL scale, which represented the three dimensions of value, sound, and structure. The following studies replicated and validated the three-factor structure and differential mean profiles along the three dimensions for different languages (a) in a more diverse German sample (Study 2), (b) in different countries (Study 3), and (c) when participants based their evaluations on speech samples (Study 4). Moreover, we investigated the relation between the AToL dimensions and stereotypic speaker evaluations. Results confirm the reliability, validity, and generalizability of the AToL scale and its incremental value to mere speaker evaluations.
In this paper, I argue that the main question that arises in the process of making a dictionary of political metaphors, that of identifying live conceptual metaphors in a corpus of text, may be solved on the basis of a pragmatic approach, taking into account the reflections in a text of cognitive processes in the minds of its author and its reader. Certainly, this goal cannot be attained without a further fine-grained semantic analysis of presumably metaphoric expressions in their linguistic and cultural context.