Refine
Year of publication
Document Type
- Part of a Book (1761) (remove)
Language
- German (1465)
- English (268)
- Multiple languages (9)
- French (6)
- Spanish (6)
- Portuguese (2)
- Russian (2)
- Italian (1)
- Dutch (1)
- Polish (1)
Keywords
- Deutsch (735)
- Korpus <Linguistik> (195)
- Wörterbuch (99)
- Konversationsanalyse (85)
- Gesprochene Sprache (70)
- Lexikographie (67)
- Grammatik (65)
- Sprachvariante (63)
- Kommunikation (55)
- Neologismus (53)
Publicationstate
- Veröffentlichungsversion (1761) (remove)
Reviewstate
- (Verlags)-Lektorat (1481)
- Peer-Review (193)
- Verlags-Lektorat (60)
- Peer-review (8)
- (Verlags-)Lektorat (4)
- Verlagslektorat (3)
- Peer Review (2)
- (Verlags)Lektorat (1)
- (Verlags-) Lektorat (1)
Publisher
- de Gruyter (520)
- Narr (182)
- Lang (97)
- IDS-Verlag (86)
- Niemeyer (84)
- Institut für Deutsche Sprache (66)
- De Gruyter (50)
- Stauffenburg (37)
- Westdeutscher Verlag (26)
- Heidelberg University Publishing (24)
A central goal of linguistics is to understand the diverse ways in which human language can be organized (Gibson et al. 2019; Lupyan/Dale 2016). In our contribution, we present results of a large scale cross-linguistic analysis of the statistical structure of written language (Koplenig/Wolfer/Meyer 2023) we approach this question from an information-theoretic perspective. To this end, we conduct a large scale quantitative cross-linguistic analysis of written language by training a language model on more than 6,500 different documents as represented in 41 multilingual text collections, so-called corpora, consisting of ~3.5 billion words or ~9.0 billion characters and covering 2,069 different languages that are spoken as a native language by more than 90% of the world population. We statistically infer the entropy of each language model as an index of un. To this end, we have trained a language model on more than 6,500 different documents as represented in 41 parallel/multilingual corpora consisting of ~3.5 billion words or ~9.0 billion characters and covering 2,069 different languages that are spoken as a native language by more than 90% of the world population or ~46% of all languages that have a standardized written representation. Figure 1 shows that our database covers a large variety of different text types, e.g. religious texts, legalese texts, subtitles for various movies and talks, newspaper texts, web crawls, Wikipedia articles, or translated example sentences from a free collaborative online database. Furthermore, we use word frequency information from the Crúbadán project that aims at creating text corpora for a large number of (especially under-resourced) languages (Scannell 2007). We statistically infer the entropy rate of each language model as an information-theoretic index of (un)predictability/complexity (Schürmann/Grassberger 1996; Takahira/Tanaka-Ishii/Dębowski 2016). Equipped with this database and information-theoretic estimation framework, we first evaluate the so-called ‘equi-complexity hypothesis’, the idea that all languages are equally complex (Sampson 2009). We compare complexity rankings across corpora and show that a language that tends to be more complex than another language in one corpus also tends to be more complex in another corpus. This constitutes evidence against the equi-complexity hypothesis from an information-theoretic perspective. We then present, discuss and evaluate evidence for a complexity-efficiency trade-off that unexpectedly emerged when we analysed our database: high-entropy languages tend to need fewer symbols to encode messages and vice versa. Given that, from an information theoretic point of view, the message length quantifies efficiency – the shorter the encoded message the higher the efficiency (Gibson et al. 2019) – this indicates that human languages trade off efficiency against complexity. More explicitly, a higher average amount of choice/uncertainty per produced/received symbol is compensated by a shorter average message length. Finally, we present results that could point toward the idea that the absolute amount of information in parallel texts is invariant across different languages.
In G, E, I, and H there are constructions with accusative NPs being the external argument of an infinitival, (1) to (4). In P these accusative NPs can only co-occur with an adjectival participle, (5), a construction also occurring in E, (6). The talk compares the syntactic and semantic structure of these constructions focussing on the syntactic category of the nonfinite clause, the status of the accusative NP, the status of the infinitive, restructuring effects, and embedding predicates (including aspect).
i. As to G, E, I, and H, the infinitival clause is regarded as a TP, i.e., a small clause. Its accusative NP and infinitival predicate form a unit – [4], [12], [8]. The AcI denotes, according to [4], an eventuality, which prevents it from being negated. Its subject is case marked by the matrix predicate, either by ECM or subject-to-object raising – [9] and [10]. AcI-constructions can show clause union effects, (7). H additionally allows Dative subjects in infinitive clauses, the latter only being licensed by impersonal predicates and co-occurring with an agreeing infinitive, (8a), – [3]. In case there is no agreeing infinitive, the Dative NP is the experiencer of the matrix clause, (8b). As for Italian, it allows Nominative subject NPs in the infinitive clause, (9a, b).
ii. As to P, small clause constructions differ structurally from E, G, I and H ones – [6], [7]. P small clauses are realizable by copula constructions with verbal być ‘be’ pronominal to ‘it’, (10), or “dual” copula elements, (cooccurrence of a pronominal and a verbal element, [1]), varying with respect to selectional restrictions (part of speech or case within complement phrases, extraction possibilities, [1]). The P counterpart to the AcI-constructions is the secondary predication over an accusative object via an adjectival present participle, (5), (11) and (12). The adjectival participle construction is systematically paraphrasable via clauses introduced by jak ‘how’ (11’) and (12’). In Polish, adjectival phrases like recytującego wiersz ‘reciting’, (11), and wracającego z podróży ‘returning’, (12), clearly function as adjuncts of the accusative object go ‘him’. In our talk, we will compare this P view to languages with typical AcI-constructions, where the AcI-clause is standardly analyzed as a complement of a matrix verb.
Interactants who encounter co-participant conduct which they find to be socio-normatively problematic or troublesome are faced with a range of choices. First and foremost, this includes the issue of whether to directly address it, or to simply ‘let it pass’ (at least for now) (Emerson/Messinger 1977). In the case of the former, the issue then becomes how to address it. Across the various ways in which participants can pragmatically engage with what they perceive to be transgressive or untoward behavior (e.g., Pomerantz 1978; Schegloff 1988b; Dersley/Wootton 2000; Günthner 2000; Bolden/Robinson 2011; Potter/Hepburn 2020; see also Rodriguez 2022), they sometimes meta-pragmatically formulate the co-participant’s doings in terms of specific actions. Such action descriptions are necessarily selective (Sacks 1963; Schegloff 1972, 1988a; Sidnell/Barnes 2013): They foreground certain aspects of the co-participant’s conduct, while backgrounding others, and thus contribute to publically construeing the formulated conduct in particular ways (Jayyusi 1993), viz. as socio-normatively problematic, transgressive or untoward, and interactionally accountable (Robinson 2016; Sidnell 2017).
This conversation analytic study compares the use of negation particles in spoken German and Persian, namely nein/nee and na. While these particles have a range of functions in both languages (Ghaderi 2022; Imo 2017), their use in response to news remains understudied. We focus on nein/nee and na in two sequential contexts: (i) after prior disconfirmations (Extract (a)) and (ii) in response to either solicited or unsolicited informings (see Extracts (b) and (c), respectively). In both contexts, nein/nee and na mark unexpectedness and open up an opportunity space for more, but they do so in different ways and with different outcomes. Nein/nee- and na-turns after disconfirming, often minimal responses to first-position confirmable turns mark the prior as unexpected (or even contrasting with the nein/nee/na-speaker’s expectations) and thus as expandable/accountable (cf. Ford 2001; Gubina/Betz 2021). Nein/nee/na-turns after informings (e.g., announcements that display a story teller’s negative emotional stance) differ not only in sequential position but also in prosodic realization. They can be either falling or rising, but all are characterized by marked prosody, i.e., lengthening, very low onset, smiling or breathy voice, or high overall pitch. Through position and turn design features, such nein/nee- and na-turns not only mark a prior turn as counter to (normative) expectations, but may also display the speaker’s affective stance and affiliate with the affective stance of the prior interactant. By comparing the use of nein/nee and na in German and Persian in the two functions illustrated in Extracts (a) and (b/c), we will show (i) how nein/nee- and na-turns shape interactional trajectories after responsive actions and (ii) what role the particles play in managing news and stance-taking as well as epistemic and affective positioning. Apart from revealing similarities in the use of German and Persian negation particles, the results of our crosslinguistic comparison will demonstrate that even if different languages have similar practices for specific actions, the use of these practices is language- and culture-specific. This means that even similar practices in different languages have their own “collateral effects” (Sidnell/Enfield 2012), linguistic and prosodic characteristic features, and, at least sometimes, consequences for social actions accomplished in the specific language (e.g., Dingemanse/Blythe/Dirksmeyer 2014; Evans/Levinson 2009; Floyd/Rossi/Enfield (eds.) 2020; Fox et al. 2009). Our study uses the method of Conversation Analysis (Sidnell/Stivers (eds.) 2013) and draws on more than 80 hours of audio and video recordings of spontaneous interactions (co-present, via video link, and on the telephone) in everyday and institutional contexts.
The issue: We discuss (declarative) prepositional object clauses (PO-clauses) in the West Germanic languages Dutch (NL), German (DE), and English (EN). In Dutch and German, PO-clauses occur with a prepositional proform (=PPF, Dutch: ervan, erover, etc.; German: drauf/darauf, drüber/darüber, etc.). This proform is optional with some verbs (1). In English, by contrast, P embeds a clausal complement in the case of gerunds or indirect questions (2), however, P is obligatorily absent when the embedded CP is a that-clause in its base positionv(3a). However, when the that-clause is passivized or topicalized, the stranded P is obligatory (3b). Given this scenario, we will address the following questions: i) Are there structural differences between PO-clauses with a P/PPF and those in which the P/PPF is optionally or obligatorily omitted? ii) In particular, do PO-clauses without P/PPF structurally coincide with direct object (=DO) clauses? iii) To what extent are case and nominal properties of clauses relevant? We use wh-extraction as a relevant test for such differences.
Previous research: Based on pronominalization and topicalization data in German and Dutch, PO-clauses are different from DO-clauses independent of the presence of the PPF (see, e.g., Breindl 1989; Zifonun/Hoffmann/Strecker 1997; Berman 2003; Broekhuis/Corver 2015 and references therein) (4,5). English pronominalization and topicalization data (3b) appear to point in the same direction (Fischer 1997; Berman 2003; Delicado Cantero 2013). However, the obligatory absence of P before that-clauses in base position indicates a convergence with DO-clauses.
Experimental evidence: To provide further evidence to these questions we tested PO-clauses in all three languages for long wh-extraction, which is usually possible for DO-clauses in English and Dutch, and in German for southern regional varieties. For German and Dutch we conducted rating studies using the thermometer method (Featherston 2008). Each study contained two sets of sentences: the first set tested long wh-extraction with regular DO-clauses (6). The second set tested wh-extraction from PO-clauses with and without PPFs (7), respectively. The results show no significant difference in extraction with PO-clauses whether or not the PPF was present even for those speakers who otherwise accept long-distance extraction in German. This supports a uniform analysis of PO-clauses with and without the PPF in contrast to DO-clauses. For English we tested extraction with verbs that select for PP-objects in two configurations: V+that-clause and V+P-gerund (8) in comparison to sentences without extraction. Participants rated sentences on a scale of 1 (unnatural) to 7 (natural). We included the gerund for English as this is a regular alternative for such objects. The results show that extraction is licit in both configurations. This suggests that English PO-clauses are different from German and Dutch PO-clauses: They rather behave as DO-clauses allowing for extraction. Note though, that the availability of extraction from P+gerund also shows that PPs are not islands for extraction in English. Overall, this shows that there is a split between English vs. German/Dutch PO-clauses when the P/PPF is absent. While these clauses behave like PO-clauses in the latter languages, extraction does not show a difference between DO- and PO-clauses in English. We will discuss the results in relation to the questions i)–iii) above.
Any bilingual dictionary is contrastive by nature, as it documents linguistic information between language pairs. However, the design and compilation of most bilingual dictionaries is often no more than mere lists of lexical or semantic equivalents. In internet forums, one can observe a huge interest in acquiring relevant knowledge about specific lexical items or pairs that are prone to comparison in a more comprehensive manner as they may pose lexical semantic challenges. In particular, these often concern easily confused pairs (e.g. false friends or paronyms) and new terms increasingly travelling between languages in news and social media (Šetka-Čilić/Ilić Plauc 2021). With regard to English and German, the fundamental comparative principles upon which contrastive guides should be build are either absent, or specialised contrastive dictionaries simply do not exist, e.g. comprehensive descriptive resources for false friends, paronyms, protologisms or neologisms (see Gouws/Prinsloo/de Schryver 2004). As a result, users turn to electronic resources such as Google translate, blogs and language forums for help. For example, it is English words such as muscular which have two German translations options.
These are two confusables muskulär and muskulös both of which exhibit a different semantic profile. German sensitiv/sensibel and their English formal counterparts sensitive/sensible are false friends. However, these terms are highly polysemous in both languages and have semantic features in common. Their full meaning spectrum is hardly captured in bilingual dictionaries to allow for a full comparison. Translating protologisms such as German Doppelwumms as well as more established new words is one of the most challenging problems. Currently, German neologisms such as Klimakleber are translated as climate glue (instead of climate activist glueing him-/herself onto objects) by online tools, simply causing mistakes and contextual distortion. Most challenges users face today are well-known (e.g. Rets 2016). New terms are often unregistered in dictionaries and it is often impossible to make appropriate choices between two or more (commonly misused) words between two languages (e.g. Benzehra 2007). These are all relevant problems to translators and language learners alike (e.g González Ribao 2019).
This paper calls for the implication of insights from contrastive lexicology into modern bilingual lexicography. To turn dictionaries into valuable resources and in order to create productive strategies in a learning environment, the practice of writing dictionaries requires a critical re-assessment. Furthermore, the full potential of electronic contrastive resources needs to be recognised and put into practice. After all, monolingual German lexicography has started to reflect on how users’ needs can be accounted for in specific comparative linguistic situations. Some of these ideas can be comfortably extended to bilingual reference guides. On the one hand, this paper will deliver a critical account of some English-German/German-English dictionaries and touch on the shortcomings of contemporary bilingual lexicography. On the other hand, with the help of fictitious resources I will demonstrate contrastive structures as focal points of consultations which answer some of the more frequent language questions more reliably. Among others, I will explain how we need to build user-friendly dictionaries to allow for translating false friends or easily confusable words from the source language into its target language efficiently. With regard to neologisms, I will show how discursive descriptions and definitions that are more elaborate can support language learners to learn about necessary extra-linguistic knowledge. Overall, this could improve the role of specialised dictionaries in the teaching or translating process (cf. Miliç/Sadri/Glušac 2019).
The International Comparable Corpus (ICC) (Kirk/Čermáková 2017; Čermáková et al. 2021) is an open initiative which aims to improve the empirical basis for contrastive linguistics by compiling comparable corpora for many languages and making them as freely available as possible as well as providing tools with which they can easily be queried and analysed. In this contribution we present the first release of written language parts of the ICC which includes corpora for Chinese, Czech, English, German, Irish (partly), and Norwegian. Each of the released corpora contains 400k words distributed over 14 different text categories according to the ICC specifications. Our poster covers the design basics of the ICC, its TEI encoding, a demonstration of using the ICC via different query tools, and an outlook on future plans.
Similar to the European Reference Corpus EuReCo (Kupietz et al. 2020), ICC follows the approach of reusing existing linguistic resources wherever possible in order to cover as many languages as possible with realistic effort in as short a time as possible. In contrast to EuReCo, however, comparable corpus pairs are not defined dynamically in the usage phase, but the compositions of the corpora are fixed in the ICC design. The approaches are thus complementary in this respect. The design principles and composition of the ICC are based on those of the International Corpus of English (ICE) (Greenbaum (ed.) 1996), with the deviation that the ICC includes the additional text category blog post and excludes spoken legal texts (see Čermáková et al. 2021 for details). ICC’s fixed-design approach has the advantage that all single-language corpora in the ICC have the same composition with respect to the selected text types and that this guarantees that the selected broad spectrum of potential influencing variables for linguistic variation is always represented. The disadvantage, however, is that this can only be achieved for quite small corpora and that the generalisability of comparative findings based on the ICC corpora will often need to be checked on larger monolingual corpora or translation corpora (Čermáková/Ebeling/Oksefjell Ebeling forthcoming). Arguing that such issues with comparability and representativeness are inevitable, in one way or the other, and need to be dealt with, our poster will discuss and exemplify the text selections in more detail.
In this presentation I show first results from an ongoing study about syntactic complexity of sanctioning turns in spoken language. This study is part of a larger project on sanctioning of misconduct in social interaction in different European languages (English, German, Italian and Polish). For the study I use video recordings of different everyday settings (family breakfasts, board game interactions and car rides) with three or four participants. These data come from the Parallel European Corpus of Informal Interaction (Kornfeld/Küttner/Zinken 2023; Küttner et al. submitted). I focus on sanctioning turns with more than one turn-constructional unit (see among others for TCUs: Sacks/Schegloff/Jefferson 1974; Clayman 2013). The study asks how often TCUs are linked to each other in the different languages, for what function, and how language diversity enters into this. Note that complex sanctioning turns do not always come as complex sentences.
In many European languages, propositional arguments (PAs) can be realized as different types of structures. Cross-linguistically, complex structures with PAs show a systematic correlation between the strength of the semantic bond and the syntactic union (cf. Givón 2001; Wurmbrand/Lohninger 2023). Also, different languages show similarities with respect to the (lexical) licensing of different PAs (cf. Noonan 1985; Givón 2001; Cristofaro 2003 on different predicate types). However, on a more fine-grained level, a variation across languages can be observed both with respect to the syntactic-semantic properties of PAs as well as to their licensing and usage. This presentation takes a multi-contrastive view of different types of PAs as syntactic subjects and objects by looking at five European languages: EN, DE, IT, PL and HU. Our goal is to identify the parameters of variation in the clausal domain with PAs and by this to contribute to a better understanding of the individual language systems on the one hand and the nature of the linguistic variation in the clausal domain on the other hand. Phenomena and Methodology: We investigate the following types of PAs: direct object (DO) clauses (1), prepositional object (PO) clauses (2), subject clauses (3), and nominalizations (4, 5). Additionally, we discuss clause union phenomena (6, 7). The analyzed parameters include among others finiteness, linear position of the PA, (non) presence of a correlative element, (non) presence of a complementizer, lexical-semantic class of the embedding verb. The phenomena are analyzed based on corpus data (using mono- and multilingual corpora), experimental data (acceptability judgement surveys) or introspective data.
Das nationalsozialistische Mobilisierungsregime war darauf angelegt, Zeitgenoss:innen zu Positionierungshandlungen zu bewegen: vom allfälligen ‚Hitlergruß‘ über die Mitwirkung bei Parteiorganisationen oder Spendensammlungen bis hin zur Anleitung zur Selbstreflexion in Tagebüchern nationalsozialistischer Schulungslager. Allerdings sollte eine solche Aufforderung zur affirmativen Positionierung nicht allein als Zwang verstanden werden, denn dies würde ausblenden, dass viele Zeitgenoss:innen tatsächlich Anhänger:innen des Nationalsozialismus waren oder dem nationalsozialistischen Gesellschaftsprojekt zumindest nicht grundsätzlich oder in allen Punkten ablehnend gegenüberstanden. Demzufolge scheint es treffend, eine je nach Kommunikationssituation und Akteursposition variierende Mischung aus Positionierungsdruck und -bedürfnis für den hier untersuchten historischen Kontext anzunehmen.