Refine
Year of publication
Document Type
- Part of a Book (235) (remove)
Language
- English (174)
- German (58)
- French (2)
- Multiple languages (1)
Has Fulltext
- yes (235)
Keywords
- Deutsch (71)
- Korpus <Linguistik> (55)
- Wörterbuch (21)
- Englisch (17)
- Lexikographie (16)
- Gesprochene Sprache (14)
- Konversationsanalyse (14)
- Annotation (13)
- Neologismus (13)
- Kontrastive Linguistik (11)
Publicationstate
- Veröffentlichungsversion (193)
- Zweitveröffentlichung (34)
- Postprint (13)
Reviewstate
- Peer-Review (235) (remove)
Publisher
- IDS-Verlag (85)
- Peter Lang (14)
- European language resources association (ELRA) (11)
- The Association for Computational Linguistics (7)
- Znanstvena založba Filozofske fakultete Univerze v Ljubljani / Ljubljana University Press, Faculty of Arts (7)
- Benjamins (6)
- European Language Resources Association (6)
- De Gruyter (5)
- Ids-Verlag (5)
- Springer (5)
We report results from an exploratory study of college students’ conceptions of poetry in which we asked them to name three things they expect from a poem. Frequency- and list-based analyses of their responses revealed that they primarily expect poems to rhyme, but they also identified a number of form-, content-, and reception-related genre expectations, which we discuss in relation to relevant previous research. We propose that rhyme’s predominance in college students’ genre expectations reflects its perceptual and cognitive salience during incremental poetry comprehension rather than its frequency in contemporary poetic practice. Our results characterize the genre conceptions of the population that empirical studies of poetry comprehension typically investigate, and thus provide relevant background information for the interpretation of empirical
findings in this field.
This presentation deals with collaborative turn-sequences (Lerner 2004), a syntactically coherent unit of talk that is jointly formulated by at least two speakers, in Czech and German everyday conversations. Based on conversation analysis (e.g., Schegloff 2007) and a multimodal approach to social interaction (e.g., Deppermann/Streeck 2018), we aim at comparing recurrent patterns and action types within co-constructional sequences in both languages. The practice of co-constructing turns-at-talk has been described for typologically different languages, especially for English (e.g., Lerner 1996, 2004), but also for languages such as Japanese (Hayashi 2003) or Finnish (Helasvuo 2004). For German, various forms and functions of co-constructions have already been investigated (e.g., Brenning 2015); for Czech, a detailed, interactionally based description is still pending (but see some initial observations in, e.g., Hoffmannová/Homoláč/Mrázková (eds.) 2019). Although the existence of co-constructions in different languages points to a cross-linguistic conversational practice, few explicitly comparative studies exist (see, e.g., Lerner/Takagi 1999, for English and Japanese). The language pair Czech-German has mainly been studied with respect to language contact and without specifically considering spoken language or complex conversational sequences (e.g., Nekula/Šichová/Valdrová 2013). Therefore, our second aim is to sketch out a first comparison of co-constructional sequences in German and Czech, thereby contributing to the growing field of comparative and cross-linguistic studies within conversation analysis (e.g., Betz et al. (eds.) 2021; Dingemanse/Enfield 2015; Sidnell (ed.) 2009). More specifically, we will present three main sequential patterns of co-constructional sequences, focusing on the type of action a second speaker carries out by completing a first speaker’s possibly incomplete turn-at-talk, and on how the initial speaker then responds to
this suggested completion (Lerner 2004). Excerpts from video recordings of Czech and German ordinary conversations will illustrate these recurrent co-constructional sequence types, i.e., offering help during word searches (see example 1 above), displaying understanding, or claiming independent knowledge. The third objective of this paper is to underline the participants’ orientation to similar interactional problems, solved by specific syntactic and/or lexical formats in Czech and German. Considering the more recent focus on the embodied dimension of co-constructional practices (e.g., Dressel 2020), we will also investigate the multimodal formatting of a started utterance as more or less “permeable” (Lerner 1996) for co-participant completion, the participants’ mutual embodied orientation, and possible embodied responses to others’ turn-completions (such as head nods or eyebrow flashes, cf. De Stefani 2021). More generally, this contribution reflects on the possibilities and challenges of a cross-linguistic comparison of complex multimodal sequences.
Tilo Weber betont in seinem Beitrag die semantische Relevanz von Ereignissen bzw. Ereigniswissen, die er als eine besondere Form von sog. Frames betrachtet. Letztere lassen sich als heterogene und komplexe Wissensrahmen begreifen, die für lexikalische Einheiten und insbesondere für Verben eine besondere Bedeutung haben. Das kognitiv-funktionalistische Frame-Konzept erlaubt zudem, durchaus im Sinne der Tagung, einen interdisziplinären Zugang, insofern es, wie Weber meint, „ein Bindeglied zwischen Sprach-, Literatur- und Kulturwissenschaften sein kann.“
Tense, aspect, and mood are grammatical categories concerned with different notional facets of the event or situation conveyed by a given clause. They are prototypically expressed by the verbal system. Tense can be defined as a category that relates points or intervals in time to one another; in a most basic model, those include the time of the event or situation referred to and the speech time. The former may precede the latter (“past”), follow it (“future”), or be simultaneous with it (or at least overlap with it; “present”). Aspect is concerned with the internal temporal constituency of the event or situation, which may be viewed as a single whole (“perfective”) or with particular reference to its internal structure (“imperfective”), including its being ongoing at a certain point in time (“progressive”). Mood, in a narrow, morphological sense, refers to the inflectional realization of modality, with modality encompassing a large and varying set of sub-concepts such as possibility, necessity, probability, obligation, permission, ability, and volition. In the domain of tense, all Germanic languages make a distinction between non-past and past. In most languages, the opposition can be expressed inflectionally, namely, by the present and preterite (indicative). All modern languages also have a periphrastic perfect as well as periphrastic forms that can be used to refer to future events. Aspect is characteristically absent as a morphological category across the entire family, but most, if not all, modern languages have periphrastic forms for the expression of aspectual categories such as progressiveness. Regarding mood, Germanic languages are commonly described as distinguishing up to three such form paradigms, namely, indicative, imperative, and a third one referred to here as subjunctive. Morphologically distinct subjunctive forms are, however, more typical of earlier stages of Germanic than they are of most present-day languages.
‘Can’ and ‘must’-type modal verbs in the direct sanctioning of misconduct across European languages
(2023)
Deontic meanings of obligation and permissibility have mostly been studied in relation to modal verbs, even though researchers are aware that such meanings can be conveyed in other ways (consider, for example, the contributions to Nuyts/van der Auwera (eds.) 2016). This presentation reports on an ongoing project that examines deontic meaning but takes as its starting point not a type of linguistic structure but a particular kind of social moment that presumably attracts deontic talk: The management of potentially ‚unacceptable‘ or untoward actions (taking the last bread roll at breakfast, making a disallowed move during a board game, etc.). Data come from a multi-language parallel video corpus of everyday social interaction in English, German, Italian, and Polish. Here, we focus on moments in which one person sanctions another’s behavior as unacceptable. Using interactional-linguistic methods (Couper-Kuhlen/Selting 2018), we examine similarities and differences across these four languages in the use of modal verbs as part of such sanctioning attempts. First results suggest that modal verbs are not as common in the sanctioning of misconduct as one might expect. Across the four languages, only between 10%–20% of relevant sequences involve a modal verb. Most of the time, in this context, speakers achieve deontic meaning in other ways (e.g., infinitives such as German nicht so schmatzen, ‚no smacking‘). This raises the question what exactly modal verbs, on those relatively rare occasions when they are used, contribute to the accomplishment of deontic meaning. The reported study pursues this question in two ways: 1) By considering similarities across languages in the ways that modal verbs interact with other (verbal) means in the sanctioning of misconduct.; 2) By considering differences across languages in the use of modal verbs. Here, we find that the relevant modal verbs are used similarly in some activity contexts (enforcing rules during board games), but less so in other activity contexts (mundane situations with no codified rules). In sum, the presented study adds to cross-linguistically grounded knowledge about deontic meaning and its relationships to linguistics structures.
It is well known that the distribution of lexical and grammatical patterns is size- and register-sensitive (Biber 1986, and later publications). This fact alone presents a challenge to many corpus-oriented linguistic studies focusing on a single language. When it comes to cross-linguistic studies using corpora, the challenge becomes even greater due to the lack of high-quality multilingual corpora (Kupietz et al. 2020; Kupietz/Trawiński 2022), which are comparable with respect to the size and the register. That was the motivation for the creation of the European Reference Corpus EuReCo, an initiative started in 2013 at the Leibniz Institute for the German Language (IDS) together with several European partners (Kupietz et al. 2020). EuReCo is an emerging federated corpus, with large virtual comparable corpora across various languages and with an infrastructure supporting contrastive research. The core of the infrastructure is KorAP (Diewald et al. 2016), a scalable open-source platform supporting the analysis and visualisation of properties of texts annotated by multiple and potentially conflicting information layers, and supporting several corpus query languages. Until recently, EuReCo consisted of three monolingual subparts: the German Reference Corpus DeReKo (Kupietz et al. 2018), the Reference Corpus of Contemporary Romanian Language (Barbu Mititelu/Tufiş/Irimia 2018), and the Hungarian National Corpus (Váradi 2002). The goal of the present submission is twofold. On the one hand, it reports about the new component of EuReCo: a sample of the National Corpus of Polish (Przepiórkowski et al. 2010). On the other hand, it presents the results of a new pilot study using the newly extended EuReCo. This pilot study investigates selected Polish collocations involving light verbs and their prepositional / nominal complements (Fig. 1) and extends the collocation analyses of German, Romanian and Hungarian (Fig. 2) discussed in Kupietz/Trawiński (2022).
Our everyday lives in any social community are shaped by rules (e.g., Roughley 2019; Schmidt/Rakoczy 2019). Rules (in a broad sense) are interactionally negotiated, monitored, enforced, and serve as an ‘orientation value‘ in social life. If someone‘s behavior is treated as norm-violating or problematic in certain way, it may be therefore confronted. Confronting interlocutors can immediately stop, modify, or retrospectively reprimand the misconduct of others in a moralizing manner. Such confrontations of a problem behavior occur commonly in informal interactions. On the basis of our corpus, specifically in informal interactions at the table, I observed that, for example, in Polish, German and British English, direct confrontations occur on average at least once every three minutes. Participants design these actions in a variety of ways, but like everything in interaction, the design is not arbitrary (Sacks 1984; Enfield/Sidnell 2019). A recurrent feature of such turns is connecting misconduct to some more general concepts. It is evident from the data that e.g. speakers of German and Polish use ‘generally valid statements’ in problematic moments (cf. Küttner/Vatanen/Zinken 2022) to reach the closure of the problem sequence, also specifically dealing there with distribution of deontic and epistemic rights (Rogowska in prep.). I ask, when and for what purpose generality, that is, abstracting from a concrete behaviour, is used as a tool while confronting others. The focus is on sequential and linguistic features of abstracting in confronting moments in language comparison. What are the methods to achieve abstraction: i) defocusing the confronted, specific agent (cf. Zinken et al. 2021; Siewierska 2008), e.g. nur derjenige der dran ist der darf die bedingungen für den handel stellen (only the one whose turn it is may set the conditions for the trade); using ii) extreme case formulations (Pomerantz 1986), e.g. na siostrę zawsze można liczyć (you can always count on a sister); iii) referring to stable character traits, e.g. Matylda bardzo chetne by podala. (.) Ona jest taka skora do pomocy (Matylda would be very happy to pass (it to you). (.) She is so eager to help); or iv) broader categorizing of the given referent, e.g. do not build (.) do do not build do not build swastikas (when a) German guy is filming us? Sometimes, even several locus of abstraction are combined in the same turn. Can we identify language-specific and cross-linguistic patterns? What are the interactional consequences: enforcing a compliant behavior in the future, eliciting an apology or cognitively simplifying complex problems? From a comparative perspective, I ask whether going beyond the here-and-now while confronting others is a practice that unites speakers across languages and is thus a human cognitive strategy to display normativity. This ongoing study is based on new comparable data from four European languages from informal interaction during activities around the table (Kornfeld/Küttner/Zinken 2023; Küttner et al. in prep.). The phenomenon was coded systematically in each of the four languages as part of a larger, quantitatively oriented study with different questions (Küttner et al. submitted). In the talk, I will show exemplarily Polish and German evidence. I use the methods of Conversation Analysis (Sidnell/Stivers (eds.) 2012) and Interactional Linguistics (Imo/Lanwer 2019).
It is a ubiquitous phenomenon of everyday interaction that participants confront their co-participants for behaviour that they assess as undesirable or in some other way untoward. In a set of video data of informal interaction from the PECII corpus (Parallel European Corpus of Informal Interaction), cases of such sanctions have been collected in English, German, Italian and Polish data. This study presents work in progress and focuses on interrogatively formatted sanctions, in particular on non-polar interrogatives. It has already been shown that interrogatives can do much more than ask questions (Huddleston 1994). They can also function as directives (Lindström et al. 2017) or, more specifically, as requests (Curl/Drew 2008), as invitations (Margutti/Galatolo 2018) or reproaches (Klattenberg 2021), among others. What makes them interesting for cross-linguistic comparison is that the four languages that are considered provide different morphological and (morpho-)syntactical ressources for the realization of interrogative phrases. For example, German provides the option of building in the modal particle denn that reveals a previous lack of clarity and obliges the co-participant(s) to deliver the missing information (Deppermann 2009). Of course, the other three languages have modal particles, too (e.g. allora in Italian or though in English), but they do not seem to convey the same semantic and interactional qualities as denn. From an interactional point of view, one could think that interrogatives are a typical and effective way of solliciting accounts, since formally they open up a conditionally relevant space for an answer or a
reaction. But as the data shows, this does not guarantee that they are actually responded to. Another relevant aspect in the context of sanctions is that the interrogative format seems to carry a certain ‚openness‘ that might be seen as a mitigating effect and thus provides an interesting point of comparison with other mitigating devices. This study uses the methods of conversation analysis and interactional linguistics. It is based on a collection of 148 interrogative sanctions (out of which 84 are non-polar interrogatives) covering the four languages. I draw on coded data from roughly 1000 cases to get a first overall idea of how the interrogative format might differ from other formats, and how it might interrelate with specific features – for example, if subsequently an account is delivered. Going more into depth, the interrogative sanctions will then be analyzed with respect to their formal design (e.g. polar questions vs. content questions vs. tag questions, Rossano 2010; Hayano 2013) and to their pragmatic implications. I also analyze reactions to such sanctions – both formally (cf. Enfield et al. 2019, 279) and, again, from an interactional perspective (e.g. acceptance/compliance vs. challenging/defiance; Kent 2012; Cekaite 2020). A more detailed zooming in on the sequential unfolding of some particularly interesting
instances of sanctioning interrogatives will make the picture complete.
Contrastive analysis of climate-related neologisms registered in GermanN and French Wikipedia
(2023)
Neologisms represent new social norms, tendencies, controversies and attitudes. They denote new or changed concepts which are constantly being negotiated between different members of the discourse community (Wodak 2022 and Catalano/Waugh (eds.) 2020). Neologisms help to identify new communicative patterns and narratives which illustrate different strings of discourse in everyday life. In recent years, many neologisms relating to the subject of the environment and climate have been emerging around the world mainly due to dominant discussions on climate change and the movement “Fridays for Future”. In German, for example, neologisms such as Klimakleber, klimaresilient and globaler Streik and in French neologisms such as éco-anxiété, justice climatique and écocitoyen could be observed. These neologisms occur in many domains of life, for example in politics, media and also in advertising, which means that “l’importance croissante des enjeux environnementaux dans les discours politiques, médiatiques et publicitaires” (Balnat/Gérard 2022, p. 22) can be identified. However, it is not only the occurrence of environment- or climate-related topics that is increasing, but also the rising polarisation of the public debate. The polarisation within public discourse is based on the fact that there are opposing positions which are represented by new or recently relevant terms such as activistes du climat (or Klimaaktivisten) and climatosceptiques (or Klimaskeptiker) (Balnat/Gérard 2022, p. 22). Due to different identifications with one or the other side, one can also speak of an “affrontement idéologique” (Balnat/Gérard 2022, p. 23). 1 The explosive nature and the high complexity of the debate on climate and the environmental issues mean that many words are naturally unfamiliar to people. This is especially true with regard to neologisms. In addition, it is often not only the new word itself but also the signified concept that is initially unknown. When people then look up words, they often do so on the Internet. Wikipedia as a “free encyclopedia” (Wikipedia 2023) is particularly well suited as an object of study with regard to neologisms, since factual knowledge is given special attention there. Furthermore, this reference guide is perceived as a regular source of agreed and common knowledge on all sorts of subjects. Hence, the descriptions found here represent social agreement on controversial terms and discussions to some degree. In this paper, German and French neologisms from the subject area of climate and environment will be examined primarily in Wikipedia, but also in the neighbouring resource Wiktionary,2 which is “a collaborative project to produce a free-content multilingual dictionary” (Wiktionary 2023). Since Wikipedia and Wiktionary are available in French and in German, 21010. International Contrastive Linguistics Conference (ICLC) both are equally suitable for the contrastive analysis. Thus, Wikipedia articles which are accessible in both languages (e.g. Klimanotstand and État d›urgence climatique) or Wikipedia articles about similar events and phenomena (e.g. Letzte Generation and Dernière Rénovation) will be compared. For example, we will have a closer look at other new terms specifying different thematic aspects of the discourse of climate and environment. We will mainly refer to those lexical items which can be found in the respective articles in both languages. Special emphasis will be on overlaps and differences, thematic foci, speaker’s positions and evaluative terms.
A central goal of linguistics is to understand the diverse ways in which human language can be organized (Gibson et al. 2019; Lupyan/Dale 2016). In our contribution, we present results of a large scale cross-linguistic analysis of the statistical structure of written language (Koplenig/Wolfer/Meyer 2023) we approach this question from an information-theoretic perspective. To this end, we conduct a large scale quantitative cross-linguistic analysis of written language by training a language model on more than 6,500 different documents as represented in 41 multilingual text collections, so-called corpora, consisting of ~3.5 billion words or ~9.0 billion characters and covering 2,069 different languages that are spoken as a native language by more than 90% of the world population. We statistically infer the entropy of each language model as an index of un. To this end, we have trained a language model on more than 6,500 different documents as represented in 41 parallel/multilingual corpora consisting of ~3.5 billion words or ~9.0 billion characters and covering 2,069 different languages that are spoken as a native language by more than 90% of the world population or ~46% of all languages that have a standardized written representation. Figure 1 shows that our database covers a large variety of different text types, e.g. religious texts, legalese texts, subtitles for various movies and talks, newspaper texts, web crawls, Wikipedia articles, or translated example sentences from a free collaborative online database. Furthermore, we use word frequency information from the Crúbadán project that aims at creating text corpora for a large number of (especially under-resourced) languages (Scannell 2007). We statistically infer the entropy rate of each language model as an information-theoretic index of (un)predictability/complexity (Schürmann/Grassberger 1996; Takahira/Tanaka-Ishii/Dębowski 2016). Equipped with this database and information-theoretic estimation framework, we first evaluate the so-called ‘equi-complexity hypothesis’, the idea that all languages are equally complex (Sampson 2009). We compare complexity rankings across corpora and show that a language that tends to be more complex than another language in one corpus also tends to be more complex in another corpus. This constitutes evidence against the equi-complexity hypothesis from an information-theoretic perspective. We then present, discuss and evaluate evidence for a complexity-efficiency trade-off that unexpectedly emerged when we analysed our database: high-entropy languages tend to need fewer symbols to encode messages and vice versa. Given that, from an information theoretic point of view, the message length quantifies efficiency – the shorter the encoded message the higher the efficiency (Gibson et al. 2019) – this indicates that human languages trade off efficiency against complexity. More explicitly, a higher average amount of choice/uncertainty per produced/received symbol is compensated by a shorter average message length. Finally, we present results that could point toward the idea that the absolute amount of information in parallel texts is invariant across different languages.
In G, E, I, and H there are constructions with accusative NPs being the external argument of an infinitival, (1) to (4). In P these accusative NPs can only co-occur with an adjectival participle, (5), a construction also occurring in E, (6). The talk compares the syntactic and semantic structure of these constructions focussing on the syntactic category of the nonfinite clause, the status of the accusative NP, the status of the infinitive, restructuring effects, and embedding predicates (including aspect).
i. As to G, E, I, and H, the infinitival clause is regarded as a TP, i.e., a small clause. Its accusative NP and infinitival predicate form a unit – [4], [12], [8]. The AcI denotes, according to [4], an eventuality, which prevents it from being negated. Its subject is case marked by the matrix predicate, either by ECM or subject-to-object raising – [9] and [10]. AcI-constructions can show clause union effects, (7). H additionally allows Dative subjects in infinitive clauses, the latter only being licensed by impersonal predicates and co-occurring with an agreeing infinitive, (8a), – [3]. In case there is no agreeing infinitive, the Dative NP is the experiencer of the matrix clause, (8b). As for Italian, it allows Nominative subject NPs in the infinitive clause, (9a, b).
ii. As to P, small clause constructions differ structurally from E, G, I and H ones – [6], [7]. P small clauses are realizable by copula constructions with verbal być ‘be’ pronominal to ‘it’, (10), or “dual” copula elements, (cooccurrence of a pronominal and a verbal element, [1]), varying with respect to selectional restrictions (part of speech or case within complement phrases, extraction possibilities, [1]). The P counterpart to the AcI-constructions is the secondary predication over an accusative object via an adjectival present participle, (5), (11) and (12). The adjectival participle construction is systematically paraphrasable via clauses introduced by jak ‘how’ (11’) and (12’). In Polish, adjectival phrases like recytującego wiersz ‘reciting’, (11), and wracającego z podróży ‘returning’, (12), clearly function as adjuncts of the accusative object go ‘him’. In our talk, we will compare this P view to languages with typical AcI-constructions, where the AcI-clause is standardly analyzed as a complement of a matrix verb.
Interactants who encounter co-participant conduct which they find to be socio-normatively problematic or troublesome are faced with a range of choices. First and foremost, this includes the issue of whether to directly address it, or to simply ‘let it pass’ (at least for now) (Emerson/Messinger 1977). In the case of the former, the issue then becomes how to address it. Across the various ways in which participants can pragmatically engage with what they perceive to be transgressive or untoward behavior (e.g., Pomerantz 1978; Schegloff 1988b; Dersley/Wootton 2000; Günthner 2000; Bolden/Robinson 2011; Potter/Hepburn 2020; see also Rodriguez 2022), they sometimes meta-pragmatically formulate the co-participant’s doings in terms of specific actions. Such action descriptions are necessarily selective (Sacks 1963; Schegloff 1972, 1988a; Sidnell/Barnes 2013): They foreground certain aspects of the co-participant’s conduct, while backgrounding others, and thus contribute to publically construeing the formulated conduct in particular ways (Jayyusi 1993), viz. as socio-normatively problematic, transgressive or untoward, and interactionally accountable (Robinson 2016; Sidnell 2017).
This conversation analytic study compares the use of negation particles in spoken German and Persian, namely nein/nee and na. While these particles have a range of functions in both languages (Ghaderi 2022; Imo 2017), their use in response to news remains understudied. We focus on nein/nee and na in two sequential contexts: (i) after prior disconfirmations (Extract (a)) and (ii) in response to either solicited or unsolicited informings (see Extracts (b) and (c), respectively). In both contexts, nein/nee and na mark unexpectedness and open up an opportunity space for more, but they do so in different ways and with different outcomes. Nein/nee- and na-turns after disconfirming, often minimal responses to first-position confirmable turns mark the prior as unexpected (or even contrasting with the nein/nee/na-speaker’s expectations) and thus as expandable/accountable (cf. Ford 2001; Gubina/Betz 2021). Nein/nee/na-turns after informings (e.g., announcements that display a story teller’s negative emotional stance) differ not only in sequential position but also in prosodic realization. They can be either falling or rising, but all are characterized by marked prosody, i.e., lengthening, very low onset, smiling or breathy voice, or high overall pitch. Through position and turn design features, such nein/nee- and na-turns not only mark a prior turn as counter to (normative) expectations, but may also display the speaker’s affective stance and affiliate with the affective stance of the prior interactant. By comparing the use of nein/nee and na in German and Persian in the two functions illustrated in Extracts (a) and (b/c), we will show (i) how nein/nee- and na-turns shape interactional trajectories after responsive actions and (ii) what role the particles play in managing news and stance-taking as well as epistemic and affective positioning. Apart from revealing similarities in the use of German and Persian negation particles, the results of our crosslinguistic comparison will demonstrate that even if different languages have similar practices for specific actions, the use of these practices is language- and culture-specific. This means that even similar practices in different languages have their own “collateral effects” (Sidnell/Enfield 2012), linguistic and prosodic characteristic features, and, at least sometimes, consequences for social actions accomplished in the specific language (e.g., Dingemanse/Blythe/Dirksmeyer 2014; Evans/Levinson 2009; Floyd/Rossi/Enfield (eds.) 2020; Fox et al. 2009). Our study uses the method of Conversation Analysis (Sidnell/Stivers (eds.) 2013) and draws on more than 80 hours of audio and video recordings of spontaneous interactions (co-present, via video link, and on the telephone) in everyday and institutional contexts.
The issue: We discuss (declarative) prepositional object clauses (PO-clauses) in the West Germanic languages Dutch (NL), German (DE), and English (EN). In Dutch and German, PO-clauses occur with a prepositional proform (=PPF, Dutch: ervan, erover, etc.; German: drauf/darauf, drüber/darüber, etc.). This proform is optional with some verbs (1). In English, by contrast, P embeds a clausal complement in the case of gerunds or indirect questions (2), however, P is obligatorily absent when the embedded CP is a that-clause in its base positionv(3a). However, when the that-clause is passivized or topicalized, the stranded P is obligatory (3b). Given this scenario, we will address the following questions: i) Are there structural differences between PO-clauses with a P/PPF and those in which the P/PPF is optionally or obligatorily omitted? ii) In particular, do PO-clauses without P/PPF structurally coincide with direct object (=DO) clauses? iii) To what extent are case and nominal properties of clauses relevant? We use wh-extraction as a relevant test for such differences.
Previous research: Based on pronominalization and topicalization data in German and Dutch, PO-clauses are different from DO-clauses independent of the presence of the PPF (see, e.g., Breindl 1989; Zifonun/Hoffmann/Strecker 1997; Berman 2003; Broekhuis/Corver 2015 and references therein) (4,5). English pronominalization and topicalization data (3b) appear to point in the same direction (Fischer 1997; Berman 2003; Delicado Cantero 2013). However, the obligatory absence of P before that-clauses in base position indicates a convergence with DO-clauses.
Experimental evidence: To provide further evidence to these questions we tested PO-clauses in all three languages for long wh-extraction, which is usually possible for DO-clauses in English and Dutch, and in German for southern regional varieties. For German and Dutch we conducted rating studies using the thermometer method (Featherston 2008). Each study contained two sets of sentences: the first set tested long wh-extraction with regular DO-clauses (6). The second set tested wh-extraction from PO-clauses with and without PPFs (7), respectively. The results show no significant difference in extraction with PO-clauses whether or not the PPF was present even for those speakers who otherwise accept long-distance extraction in German. This supports a uniform analysis of PO-clauses with and without the PPF in contrast to DO-clauses. For English we tested extraction with verbs that select for PP-objects in two configurations: V+that-clause and V+P-gerund (8) in comparison to sentences without extraction. Participants rated sentences on a scale of 1 (unnatural) to 7 (natural). We included the gerund for English as this is a regular alternative for such objects. The results show that extraction is licit in both configurations. This suggests that English PO-clauses are different from German and Dutch PO-clauses: They rather behave as DO-clauses allowing for extraction. Note though, that the availability of extraction from P+gerund also shows that PPs are not islands for extraction in English. Overall, this shows that there is a split between English vs. German/Dutch PO-clauses when the P/PPF is absent. While these clauses behave like PO-clauses in the latter languages, extraction does not show a difference between DO- and PO-clauses in English. We will discuss the results in relation to the questions i)–iii) above.
Any bilingual dictionary is contrastive by nature, as it documents linguistic information between language pairs. However, the design and compilation of most bilingual dictionaries is often no more than mere lists of lexical or semantic equivalents. In internet forums, one can observe a huge interest in acquiring relevant knowledge about specific lexical items or pairs that are prone to comparison in a more comprehensive manner as they may pose lexical semantic challenges. In particular, these often concern easily confused pairs (e.g. false friends or paronyms) and new terms increasingly travelling between languages in news and social media (Šetka-Čilić/Ilić Plauc 2021). With regard to English and German, the fundamental comparative principles upon which contrastive guides should be build are either absent, or specialised contrastive dictionaries simply do not exist, e.g. comprehensive descriptive resources for false friends, paronyms, protologisms or neologisms (see Gouws/Prinsloo/de Schryver 2004). As a result, users turn to electronic resources such as Google translate, blogs and language forums for help. For example, it is English words such as muscular which have two German translations options.
These are two confusables muskulär and muskulös both of which exhibit a different semantic profile. German sensitiv/sensibel and their English formal counterparts sensitive/sensible are false friends. However, these terms are highly polysemous in both languages and have semantic features in common. Their full meaning spectrum is hardly captured in bilingual dictionaries to allow for a full comparison. Translating protologisms such as German Doppelwumms as well as more established new words is one of the most challenging problems. Currently, German neologisms such as Klimakleber are translated as climate glue (instead of climate activist glueing him-/herself onto objects) by online tools, simply causing mistakes and contextual distortion. Most challenges users face today are well-known (e.g. Rets 2016). New terms are often unregistered in dictionaries and it is often impossible to make appropriate choices between two or more (commonly misused) words between two languages (e.g. Benzehra 2007). These are all relevant problems to translators and language learners alike (e.g González Ribao 2019).
This paper calls for the implication of insights from contrastive lexicology into modern bilingual lexicography. To turn dictionaries into valuable resources and in order to create productive strategies in a learning environment, the practice of writing dictionaries requires a critical re-assessment. Furthermore, the full potential of electronic contrastive resources needs to be recognised and put into practice. After all, monolingual German lexicography has started to reflect on how users’ needs can be accounted for in specific comparative linguistic situations. Some of these ideas can be comfortably extended to bilingual reference guides. On the one hand, this paper will deliver a critical account of some English-German/German-English dictionaries and touch on the shortcomings of contemporary bilingual lexicography. On the other hand, with the help of fictitious resources I will demonstrate contrastive structures as focal points of consultations which answer some of the more frequent language questions more reliably. Among others, I will explain how we need to build user-friendly dictionaries to allow for translating false friends or easily confusable words from the source language into its target language efficiently. With regard to neologisms, I will show how discursive descriptions and definitions that are more elaborate can support language learners to learn about necessary extra-linguistic knowledge. Overall, this could improve the role of specialised dictionaries in the teaching or translating process (cf. Miliç/Sadri/Glušac 2019).
The International Comparable Corpus (ICC) (Kirk/Čermáková 2017; Čermáková et al. 2021) is an open initiative which aims to improve the empirical basis for contrastive linguistics by compiling comparable corpora for many languages and making them as freely available as possible as well as providing tools with which they can easily be queried and analysed. In this contribution we present the first release of written language parts of the ICC which includes corpora for Chinese, Czech, English, German, Irish (partly), and Norwegian. Each of the released corpora contains 400k words distributed over 14 different text categories according to the ICC specifications. Our poster covers the design basics of the ICC, its TEI encoding, a demonstration of using the ICC via different query tools, and an outlook on future plans.
Similar to the European Reference Corpus EuReCo (Kupietz et al. 2020), ICC follows the approach of reusing existing linguistic resources wherever possible in order to cover as many languages as possible with realistic effort in as short a time as possible. In contrast to EuReCo, however, comparable corpus pairs are not defined dynamically in the usage phase, but the compositions of the corpora are fixed in the ICC design. The approaches are thus complementary in this respect. The design principles and composition of the ICC are based on those of the International Corpus of English (ICE) (Greenbaum (ed.) 1996), with the deviation that the ICC includes the additional text category blog post and excludes spoken legal texts (see Čermáková et al. 2021 for details). ICC’s fixed-design approach has the advantage that all single-language corpora in the ICC have the same composition with respect to the selected text types and that this guarantees that the selected broad spectrum of potential influencing variables for linguistic variation is always represented. The disadvantage, however, is that this can only be achieved for quite small corpora and that the generalisability of comparative findings based on the ICC corpora will often need to be checked on larger monolingual corpora or translation corpora (Čermáková/Ebeling/Oksefjell Ebeling forthcoming). Arguing that such issues with comparability and representativeness are inevitable, in one way or the other, and need to be dealt with, our poster will discuss and exemplify the text selections in more detail.
In this presentation I show first results from an ongoing study about syntactic complexity of sanctioning turns in spoken language. This study is part of a larger project on sanctioning of misconduct in social interaction in different European languages (English, German, Italian and Polish). For the study I use video recordings of different everyday settings (family breakfasts, board game interactions and car rides) with three or four participants. These data come from the Parallel European Corpus of Informal Interaction (Kornfeld/Küttner/Zinken 2023; Küttner et al. submitted). I focus on sanctioning turns with more than one turn-constructional unit (see among others for TCUs: Sacks/Schegloff/Jefferson 1974; Clayman 2013). The study asks how often TCUs are linked to each other in the different languages, for what function, and how language diversity enters into this. Note that complex sanctioning turns do not always come as complex sentences.
Our paper discusses family language policies among multilingual families in Latvia with Russian as home language. The presentation is based on three case studies, i.e. interviews conducted with Russophones who have chosen to send their children to Latvian-medium pre-schools and schools. The main aim is to understand practices and regards among such families “from below,” i.e. which family-internal and family-external factors influenced the choice of Latvian-medium education and what impact this choice has on linguistic practices.
The paper shows that there have been critical events which both encouraged and discouraged the choice of Latvian-medium education. The wish to integrate into mainstream society has been met by obstacles both from ethnic Russians and Latvians. Yet, the three families consider their choices to be the right ones for the future development of their children in a multiethnic Latvia in which Latvian serves as the unifying language of society.
In many European languages, propositional arguments (PAs) can be realized as different types of structures. Cross-linguistically, complex structures with PAs show a systematic correlation between the strength of the semantic bond and the syntactic union (cf. Givón 2001; Wurmbrand/Lohninger 2023). Also, different languages show similarities with respect to the (lexical) licensing of different PAs (cf. Noonan 1985; Givón 2001; Cristofaro 2003 on different predicate types). However, on a more fine-grained level, a variation across languages can be observed both with respect to the syntactic-semantic properties of PAs as well as to their licensing and usage. This presentation takes a multi-contrastive view of different types of PAs as syntactic subjects and objects by looking at five European languages: EN, DE, IT, PL and HU. Our goal is to identify the parameters of variation in the clausal domain with PAs and by this to contribute to a better understanding of the individual language systems on the one hand and the nature of the linguistic variation in the clausal domain on the other hand. Phenomena and Methodology: We investigate the following types of PAs: direct object (DO) clauses (1), prepositional object (PO) clauses (2), subject clauses (3), and nominalizations (4, 5). Additionally, we discuss clause union phenomena (6, 7). The analyzed parameters include among others finiteness, linear position of the PA, (non) presence of a correlative element, (non) presence of a complementizer, lexical-semantic class of the embedding verb. The phenomena are analyzed based on corpus data (using mono- and multilingual corpora), experimental data (acceptability judgement surveys) or introspective data.
Das nationalsozialistische Mobilisierungsregime war darauf angelegt, Zeitgenoss:innen zu Positionierungshandlungen zu bewegen: vom allfälligen ‚Hitlergruß‘ über die Mitwirkung bei Parteiorganisationen oder Spendensammlungen bis hin zur Anleitung zur Selbstreflexion in Tagebüchern nationalsozialistischer Schulungslager. Allerdings sollte eine solche Aufforderung zur affirmativen Positionierung nicht allein als Zwang verstanden werden, denn dies würde ausblenden, dass viele Zeitgenoss:innen tatsächlich Anhänger:innen des Nationalsozialismus waren oder dem nationalsozialistischen Gesellschaftsprojekt zumindest nicht grundsätzlich oder in allen Punkten ablehnend gegenüberstanden. Demzufolge scheint es treffend, eine je nach Kommunikationssituation und Akteursposition variierende Mischung aus Positionierungsdruck und -bedürfnis für den hier untersuchten historischen Kontext anzunehmen.
Sprachpolitik war in der Bundesrepublik Deutschland seit 1949 nie ein größeres Thema in Wahlkämpfen. Seit der Bundestagswahl 2017 hat sich dies jedoch geändert. Damals waren unter dem Eindruck des großen Migrationsandrangs im Jahr 2016 von einigen Parteien Positionen zu sprachlicher Integration in die Wahlprogramme aufgenommen worden. Unter Positionen sei hier der explizite sprachliche Ausdruck einer Haltung zu einem politischen Thema bzw. Themenbereich zu verstehen, der unter anderem im Rahmen von parteilichen Grundsatz- und Wahlprogrammen Orientierung hinsichtlich des (zukünftig zu erwartenden) politischen Handelns parteilicher Akteur/-innen bieten soll. Und auch die zunehmende Diversität der deutschen Gesellschaft führte schon bei der Wahl im Jahr 2017 zu einer Berücksichtigung von Themen der sprachlichen Bildung in der Programmatik der Parteien. Dieser Beitrag untersucht somit die Grundsatz- und Wahlprogramme der größten Parteien anhand der sprachpolitischen Ausdrucksweise.
Sich und andere politisch zu positionieren, ist eine elementare sprachliche und soziale Praxis. Dies zeigen etwa Diskussionen um europäische Identität in Zeiten des britischen EU-Austritts und einer umstrittenen EU-Grenzpolitik oder die Haltung zu Waffenlieferungen in Krisengebiete im Zuge des Kriegs in der Ukraine, der 2022 ausbrach, ebenso wie wiederkehrende Auseinandersetzungen um Themen wie Alltagsrassismus, Sexismus und Diskriminierung. Diese Beispiele, die aktuelle politische Ereignisse ebenso umfassen, wie fortlaufende, immer wieder neu aufflammende gesellschaftliche Debatten um grundlegende Fragen des Zusammenlebens, verdeutlichen: Wo und wie wir uns in der Gesellschaft verorten, ist eine alltägliche Frage. Politische Positionierungen werden nicht nur ständig vorgenommen, sie werden, wie auch Nicht-Positionierungen, ebenso kontinuierlich thematisiert und kontrovers diskutiert. Diese Einführung in das Band soll in die Thematik des politischen Positionierens durch Klärung des Termins und einem Beispiel aus der Praxis einführen.
“Die Sprach-Checker” (Eng. “Language Checkers”) are young citizen scientists from Mannheim’s highly diverse district Neckarstadt-West. Together with linguists, they investigate a tremendous treasure: their own multilingualism. They are exploring and (re)discovering their own languages and the other languages used in their environment while documenting and reflecting on their everyday experiences in and with different linguistic practices. Our aim is to raise awareness of their strengths and to promote appreciation for their language biographies, thus fostering a sense of identification with one’s own linguistic surroundings. Such a joint research endeavour offers empirical opportunities to address (linguistic) issues of societal relevance by collecting authentic data from the multicultural district and involving its residents and local stakeholders. In this paper, we will provide insights regarding the project’s background, conception, and outcomes. We address everyone who is planning or conducting a citizen science project with young people, especially children and adolescents, or who works at the interface between science and society.
Der Beitrag widmet sich dem Thema der kommunikativen Deviationen in Interviews im Ukrainischen und Deutschen. Dabei werden die Deviationen sowohl in den Presseinterviews als auch in den populärsten Videointerviews auf YouTube untersucht. Die Deviationen werden in die von der Position des Adressanten, des Adressaten sowie des Zuschauers aufgeteilt. Die Aufmerksamkeit wird der Sprach- und der kommunikativen Kompetenz der Kommunikanten als der Hauptursache der Deviationen in den Interviews gelenkt. Die Deviationen werden als eine der Voraussetzungen der erfolgreichen Kommunikation bestimmt.
Simultandolmetschen ist eine komplexe und kognitive Aktivität, bei der verschiedene Prozesse gleichzeitig ablaufen. Neben monolingualer Textverarbeitung braucht man auch dolmetschspezifische Strategien, die erworben werden müssen. Die Notstrategien werden erst dann angewendet, wenn die Kapazitätsgrenze des Dolmetschers erreicht ist.
For many reasons, Mennonite Low German is a language whose documentation and investigation is of great importance for linguistics. To date, most research projects that deal with this language and/ or its speakers have had a relatively narrow focus, with many of the data cited being of limited relevance beyond the projects for which they were collected. In order to create a resource for a broad range of researchers, especially those working on Mennonite Low German, the dataset presented here has been transformed into a structured and searchable corpus that is accessible online. The translations of 46 English, Spanish, or Portuguese stimulus sentences into Mennonite Low German by 321 consultants form the core of the MEND-corpus (Mennonite Low German in North and South America) in the Archive for Spoken German. In addition to describing the origin of this corpus and discussing possibilities and limitations for further research, we discuss the technical structure and search possibilities of the Database for Spoken German. Among other things, this database allows for a structured search of metadata, a context-sensitive token search, and the generation of virtual corpora that can be shared with others. Moreover, thanks to its text-sound alignment, one can easily switch from a particular text section of the corpus to the corresponding audio section. Aside from the desire to equip the reader with the technical knowledge necessary to use this corpus, a further goal of this paper is to demonstrate that the corpus still offers many possibilities for future research.
Conventional terminology resources reach their limits when it comes to automatic content classification of texts in the domain of expertlayperson communication. This can be attributed to the fact that (non-normalized) language usage does not necessarily reflect the terminological elements stored in such resources. We present several strategies to extend a terminological resource with term-related elements in order to optimize automatic content classification of expert-layperson texts.
This paper presents an extended annotation and analysis of interpretative reply relations focusing on a comparison of reply relation types and targets between conflictual pages and neutral pages of German Wikipedia (WP) talk pages. We briefly present the different categories identified for interpretative reply relations to analyze the relationship between WP postings as well as linguistic cues for each category. We investigate referencing strategies of WP authors in discussion page postings, illustrated by means of reply relation types and targets taking into account the degree of disagreement displayed on a WP talk page. We provide richly annotated data that can be used for further analyses such as the identification of interactional relations on higher levels, or for training tasks in machine learning algorithms.
Kontrastiv-multilingual angelegte empirische Studien erfordern eine vergleichbare Datengrundlage. Je nachdem, welche Forschungsfragen im Zentrum der sprachvergleichenden Untersuchungen stehen, bieten sich entweder Parallelkorpora oder vergleichbare einzelsprachliche Korpora als Datengrundlage an. Dieser Beitrag verfolgt hauptsächlich das Ziel, die Herausforderungen aufzuzeigen, die die Arbeit mit vergleichbaren Korpora im multilingualen Sprachvergleich aufwirft. Dabei soll u.a. das Prinzip der Vergleichbarkeit von Korpora thematisiert und methodologische Vorschläge für konkrete empirisch angelegte sprachvergleichende Analysen vorgelegt werden. Die Möglichkeiten und Grenzen der empirisch basierten quantitativen und qualitativen Analysearbeit werden durch die Präsentation einiger exemplarischer Forschungsfragen und -ergebnisse aufgezeigt. Einige Desiderata für zukünftige korpusbasierte Studien auf der Basis von vergleichbaren Korpora im multilingualen Raum schließen den Beitrag ab.
Picnick and Sauerkraut: German–English intra-writer variation in script and language (1867–1900)
(2023)
Intra-writer variation is a wide-spread phenomenon that nevertheless has received only limited research attention so far. Different addressees, bi- and multilingualism, or changing life phases are among the factors that contribute to such variation. In a study of diary entries by one writer covering three decades (1867–1900), this chapter investigates patterns of intra-writer variation between German and English (language and script) in nineteenth-century Canada, with a special focus on single word borrowings, person reference and place names. The long-term perspective provides a unique insight into the dynamics of a bilingual writer’s emerging sociolinguistic competence as reflected by the flexible yet structured use of his resources within the social space of a bilingual community.
In this article, we examine the current situation of data dissemination and provision for CMC corpora. By that we aim to give a guiding grid for future projects that will improve the transparency and replicability of research results as well as the reusability of the created resources. Based on the FAIR guiding principles for research data management, we evaluate the 20 European CMC corpora listed in the CLARIN CMC Resource family, individuate successful strategies among the existing corpora and establish best practices for future projects. We give an overview of existing approaches to data referencing, dissemination and provision in European CMC corpora, and discuss the methods, formats and strategies used. Furthermore, we discuss the need for community standards and offer recommendations for best practices when creating a new CMC corpus.
In this paper, the author studies the role of the dictionary in the first language acquisition, highlighting its didactic value. Based on two Romanian lexicographical works of the 19th century, Lexiconul de la Buda (Buda, 1825) [the Lexicon of Buda] et Vocabularu romano-francesu (Bucarest, 1870) [the Romanian-French Vocabulary], the author analyses the normative information recorded in the articles in order to observe which level of language (i. e. phonetical, morphological, syntactical and lexical) is concerned. Such an approach allows to distinguish between the possible changings both at the level of the perception or at the grammatical, lexical and semantical description, i. e. the settlement of the word in the first language, and at a technical level, i. e. the making of article and of dictionary.
This paper presents the decisions behind the design of a maths dictionary for primary school children. We are aware that there has been a considerable problem regarding Mexican children’s performance in maths dragging on for a long time, and far from getting better, it is getting worse. One of the probable causes seems to be the lack of coordination between maths textbooks and teaching methods. Most maths textbooks used in primary schools include lots of activities and problem-solving techniques, but hardly any conceptual information in the form of definitions or explanations. Consequently, many children learn to do things, but have difficulty understanding mathematical concepts and applying them in different contexts. To help solve this problem, at least partially, the project of the dictionary was launched aiming at helping children to grasp and understand maths concepts learned during those first six years of their formal education. The dictionary is a corpus-based terminographical product whose macrostructure, microstructure, typography, and additional information were specifically designed to help children understand mathematical concepts.
This paper deals with the lexicographic treatment of the evidently plenty and pervasive scatological vocabulary, that is vocabulary concerning the process and products of bodily excretion (especially feces), in the synchronic Early New High German Dictionary (FWB = Frühneuhochdeutsches Wörterbuch) from a dictionary user’s view. Initially, different cultural concepts of scatology by Norbert Elias, Michail Bachtin and Mary Douglas among others and the term taboo are reflected. Subsequently, selected lexical items such as words with a primary scatological meaning (e. g. drek, kot, scheisse), concealing expressions (euphemisms, periphrases, metaphors, e. g. sitzen, seine notdurft tun, bauernveiel), and certain aspects within the polysemy of the verb scheissen are discussed, the latter on the one hand referring to a physical process with uncontrollable aspects and on the other hand denoting a deliberate action and functionalized as a fighting word during the reformation. Focussing on different positions of lexicographical information within the microstructure of the FWB, the surveillance shows that in a synchronic perspective Early New High German scatological vocabulary is a heterogeneous and complex phenomenon due to speaker, context and respectively semantic and pragmatic purposes
To effectively design online tools and develop sophisticated programs, for the teaching of Ancient Greek language, there is a clear need for lexical resources that provide semantic links with Modern Greek. This paper proposes a microstructure for an online Ancient Greek to Modern Greek thesaurus (AMGthes) that serves educational purposes. The terms of this bilingual thesaurus have been selected from reference Ancient Greek texts, taught and studied during lower and upper secondary education in Greece. The main objective here is to build a semantic map that helps students find relevant and semanti- cally related terms (synonyms and antonyms) in Ancient Greek, and then provide a rich set of suitable translations and definitions in Modern Greek. Designed to be an online resource, the thesaurus is being developed using web technologies, and thus will be available to every school and university student that pursues a degree in digital humanities.
The paper presents the results of empirical research conducted with students from the Faculty of Translation studies of Ventspils University of Applied Sciences (VUAS) in Latvia. The study investigates the habits and practices concerning the use of dictionaries on the part of translation students, as well as types of dictionaries used, frequency of use, etc. The study also presents an insight into the evaluation of the usefulness of dictionaries by Latvian students. The research describes the advantages and disadvantages of dictionaries used by the respondents, the importance of the preface and the explanation of the terms and abbreviations used in dictionaries. The research conducted, as well as the insights, results and recommendations presented, will be relevant for the lexicographic community, as it reflects the experience of one Latvian University to improve the teaching of dictionary use and lexicographic culture in this country and to complement dictionary use research with the Latvian experience.
Learning from students. On the design and usability of an e-dictionary of mathematical graph theory
(2022)
We created a prototype of an electronic dictionary for the mathematical domain of graph theory. We evaluate our prototype and compare its effectiveness in task-based tests with that of Wikipedia. Our dictionary is based on a corpus; the terms and their definitions were automatically extracted and annotated by experts (cf. Kruse/Heid 2020). The dictionary is bilingual, covering German and English; it gives equivalents, definitions and semantically related terms. For the implementation of the dictionary, we used LexO (Bellandi et al. 2017). The target group of the dictionary are students of mathematics who attend lectures in German and work with English resources. We carried out tests to understand which items the students search for when they work on graph-theoretical tasks. We ran the same test twice, with comparable student groups, either allowing Wikipedia as an information source or our dictionary. The dictionary seems to be especially helpful for students who already have a vague idea of a term because they can use the resource to check if their idea is right.
This paper describes the results of an empirical investigation carried out within the project Lessico Multilingue dei Beni Culturali (LBC), whose aim is to create a multilingual online dictionary of the lexicon of the Italian artistic heritage. The dictionary, whose lexicographic process has already started, is intended for linguists and specialist translators as well as for professionals in the tourism sector and students of Foreign Languages and Literatures. The investigation conducted through a questionnaire submitted to undergraduate students at the University of Milan and at the University of Florence has a double aim: to research the habits in the use of lexicographic tools by possible users of the dictionary (Italian Learners of German Language), and to identify preferences regarding macro-, medio- and microstructural features of the future LBC-dictionary to realize a user-friendly tool. After a brief introduction on the state of the art of the survey in the field of Dictionary Users Studies, the article describes the questionnaire and the results obtained from the pilot study. A summary and a discussion on the future developments of the project conclude the work.
This paper gives an insight into a cross-media publishing process on different stages: from a printed bilingual syntagmatic dictionary for GFL to an online learner’s dictionary of German collocations to a German learner’s dictionary portal. On the basis of an sql database specially developed for a corpus-guided dictionary of German collocations, the bilingual syntagmatic learner’s dictionary KolleX was published in 2014. The first part of the article describes this lexicographic process, focusing the most relevant aspects of the dictionary concept, e. g. dictionary type, subject matter, corpus guided data selection and microstructure. The second part introduces the first online version of KolleX from 2016 and the profound changes in the editing system – from a desktop version (2005) to a web-based editing system (2016) –, which resulted successively in a prototype of a German learner’s dictionary portal, called E-KolleX DaF (2018–). Focusing on the aspects of dynamism and integration of different resources from a learner’s perspective the paper shows the innovative features of this new online reference work. The contribution presents the solutions for the integration of new datatypes in the database of KolleX and the linking to different data in German monolingual dictionary platforms. The paper outlines the web design, functioning and technical improvements of E-KolleX DaF. The conclusions provide an outlook to the forthcoming challenges.
Dieser Beitrag beschreibt die Motivation und Ziele hinter der Initiative Europäisches Referenzkorpus EuReCo. Ausgehend von den Desiderata, die sich aufgrund der Defizite verfügbarer Forschungsdaten wie monolinguale Korpora, Parallelkorpora und Vergleichskorpora für den Sprachvergleich ergeben, werden die bisherigen und die laufenden Arbeiten im Rahmen von EuReCo präsentiert und anhand vergleichender deutsch-rumänischer Kookkurrenzanalysen neue Perspektiven für kontrastive Korpuslinguistik, die die EuReCo-Initiative öffnet, skizziert.
Kontrastive Korpuslinguistik versteht sich als eine Bezeichnung für sprachvergleichende Studien, deren Ergebnisse mit Analysen sprachlicher Daten erreicht und empirisch fundiert sind. Die Bezeichnung contrastive corpus linguistics für eine neue, sich entwickelnde Disziplin wurde 1996 von Karin Aijmer und Bengt Altenberg (Schmied 2009: 1142) eingeführt. Der Einsatz der sprachlichen Korpora bei der Beschreibung kontrastiver Studien bedeutet in den 1990er-Jahren für die kontrastive Linguistik eine Wiederbelebung, nachdem die weit gesteckten Ziele und Hoffnungen in den 50er- und 60er-Jahren, die mit der Fremdsprachendidaktik zusammenhingen, vor etwa 50 Jahren aufgegeben wurden.
Kontrastive Korpuslinguistik
(2022)
This paper seeks to apply the principles of the famous 3-Circle-Model devised for the description of the ecolinguistic position of English world-wide to the position of German around the world.
On the one hand, the 3-Circle-Model for English with its "Inner", "Outer" and "Extended/Expanding" Circles was invented by Kachru in the 1980s and has since then been adopted, refined and criticised by numerous authors. The situation of German world-wide, on the other hand, has only been scarcely discussed in the past 20 years. While the global extension of German is obviously by far weaker than that of English, there are also a number of noteworthy similarities in terms of historical spread and the current position of these two languages.
This paper therefore discusses the analogies of global English and German by establishing three circles for German: the Inner Circle for the core German-speaking area, i.e. Germany, Austria and Switzerland; the Outer Circle including a number of German minority areas (mostly in Europe), and finally the Extended Circle which may be denoted as "Crumbling" rather than "Expanding". The latter comprises traditional German diaspora communities in different parts of the world which either result from migration, but also reflect the previous functions of German as a language of culture and as a lingua franca in regions like Eastern Europe. The paper argues that there are some striking structural similarities, but also shows the limits of this comparison.
There is a growing interest in pedagogical lexicography, and more specifically in the study of dictionary users’ abilities and strategies (Prichard 2008; Gavriilidou 2010, 2011; Gavriilidou/Mavrommatidou/Markos 2020; Gavriilidou/Konstantinidou 2021; Chatjipapa et al. 2020). Τhe purpose of this presentation is to investigate dictionary use strategy and the effect of an explicit and integrated dictionary awareness intervention program on upper elementary pupils’ dictionary use strategies according to gender and type of school. A total of 150 students from mainstream and intercultural schools, aged 10–12 years old, participated in the study. Data were collected before and after the intervention through the Strategy Inventory for Dictionary Use (SIDU) (Gavriilidou 2013). The results showed a significant effect of the intervention program on Dictionary Use Strategies employed by the experimental group and support the claim that increased dictionary use can be the outcome of explicit strategy instruction. In addition, the effective application of the program suggests that a direct and clear presentation of DUS is likely to be more successful than an implicit presentation. The present study contributes to the discussion concerning both the ‘teachability’ of dictionary use strategies and skills and the effective forms of intervention programs raising dictionary use awareness and culture.
In this paper, we propose a controlled language for authoring technical documents and report the status of its development, while maintaining a specific focus on the Japanese automotive domain. To reduce writing variations, our controlled language not only defines approved and unapproved lexical elements but also prescribes their preferred location in a sentence. It consists of components of a) case frames, b) case elements, c) adverbial modifiers, d) sentence-ending functions, and e) connectives, which have been developed based on the thorough analyses of a large-scale text corpus of automobile repair manuals. We also present our prototype of a writing assistant tool that implements word substitution and reordering functions, incorporating the constructed controlled language.
The focus of this paper will be on lexical information systems and the framework guidelines for the definition of the curricula within the educational system of the Autonomous Province of Bolzano/ Bozen (Italy). In Italy, the competences to be achieved at different school levels are published in the form of general guidelines. On this basis each school has to specify the general competency goals and to spell them out in a concrete curriculum. In this paper I will examine to what extent lexical information systems are represented in the framework guidelines within the German and the Italian educational system of the Autonomous Province, these being separate systems. In a second step, I will check the representations of the resources against the “Villa Vigoni Theses on Lexicography“. Finally, I will discuss the results and give an outlook for further research.
Thesauri have long been recognized as valuable structured resources aiding Information Retrieval systems. A thesaurus provides a precise and controlled vocabulary which serves to coordinate data indexing and retrieval. The paper presents a bilingual Greek and English specialized thesaurus that is being developed as the backbone of a platform aimed at enhancing and enriching the cultural experiences of visitors in Eastern Macedonia and Thrace, Greece. The cultural component of the intended platform comprises textual data, images of artifacts and living entities (animals and plants in the area), as well as audio and video. The thesaurus covers the domains of Archaeology, Literature, Mythology, and Travel; therefore, it can be viewed as a set of inter-linked thesauri. Where applicable, terms and names in the database are also geo-referenced.
This chapter starts out by giving a brief overview of the main priorities of international and German studies in the area of linguistic landscape research. The contributions to this volume are then embedded in current debates and developments in the field. Finally, we outline important desiderata of linguistic landscape research that focus on German and address challenges of knowledge transfer and application as well as possible contributions to international lines of research.
This paper discusses how the regional language of Latgalian in Latvia has benefitted from societal discourse on the antagonism between speakers of Latvian and Russian in Latvia. Triggered by the 2012 referendum on Russian as a possible second state language of Latvia, Latvian politics (exemplified by politicians' statements since 2012 as well as by 2014 election manifestoes) as well as society at large (displayed by e.g. increased attention in the educational sector and the media) have started to devote considerably more attention to the region of Latgale, including its cultural and linguistic heritage. The paper thereby argues that speakers of Latgalian have gained a noteworthy increase in voice, even though the future of the variety is still considered to be uncertain.
Sprachliche Zeichen im öffentlichen Raum (Linguistic Landscape - LL) tragen neben ihrer primären Bedeutung und Funktion wie Auskunft und Werbung auch sekundäre Informationen zur Sprachenhierarchie, zur Repräsentation von Minderheitensprachen, zur sprachlichen Toleranz gegenüber der Mehrsprachigkeit in diesem Raum, etc. Diese Vielschichtigkeit macht die sprachlichen Zeichen im öffentlichen Raum zu wertvollen Lernobjekten, an denen die im Berufsleben so bedeutende diskursive Lesefähigkeit der Studierenden trainiert werden kann. Der Beitrag öffnet Perspektiven auf die Möglichkeiten der Verknüpfung der LL-Analyse mit den Inhalten der traditionellen germanistischen Curricula wie auch benachbarter Fachbereiche und verweist auf bisherige Studien in diesem Bereich.
Vorwort
(2021)
Lexicographers working with minority languages face many challenges. When the language in question is also a sign language, circumstances specific to the visual-spatial modality have to be taken into consideration as well. In this paper, we aim to show and discuss which challenges we encounter while compiling the Digitales Wörterbuch der Deutschen Gebärdensprache (DW-DGS), the first corpus-based dictionary of German Sign Language (DGS). Some parallel the challenges minority language lexicographers of spoken languages encounter, e. g. few resources, no written tradition, and having to create one dictionary for all potential user groups, while others are specific to sign languages, e. g. representation of visual-spatial language and creating access structures for the dictionary.
This chapter discusses functions of the German language in the Linguistic Landscape (LL) of the Baltic states, with a focus on the Latvian capital Riga. For this end, it applies the "Spot German" approach (cf. Heimrath 2017) in the context of debates on the international role of German (cf. Ammon 2015). It argues that German is an "additional language of society" (cf. Marten 2017b), i.e. it is not a dominant language in the Baltics but can regularly be found in a variety of functions. These relate both to the historical role of German in the region (including its contemporary commodification) and to current relations between the Baltics and the German-speaking countries. These include tourism, business, or educational and political institutions, but also point to, e.g., discourses on the quality assigned to products from the German-speaking region. In this sense, the Baltic states are part of what may, in accordance with Kachru's (1985) 3-circle-model for English, be labelled as "extended circle" of German. At the same time, the chapter discusses how conclusions from Linguistic Landscape research can be used for understanding marketing both in and for the German language: On the one hand, German carries the potential of persuading customers to opt for a certain product. On the other hand, the abundance of situations where German can be "spotted" suggests that the LL may successfully be used for language-marketing purposes, as exemplified by a brochure and a poster created by the DAAD Information Centre for the Baltic states in Riga.
The EMLex Dictionary of Lexicography (= EMLexDictoL) is a plurilingual subject field dictionary (in German, English, Afrikaans, Galician, Italian, Polish and Spanish) that contains the basic subject field terminology of lexicography and dictionary research, in which the dictionary article texts are presented in a sophisticated but comprehensible form. The articles are supplemented by a complex crossreferencing system and the current subject field literature of the respective national languages. Following the lemma position, the dictionary articles contain items regarding morphology, synonymy, the position of the definiens, additional explanations, the cross-reference position, the position for literature, the equivalent terms in the other six languages of the dictionary as well as the names of the authors.
This paper focusss on the first Slavonic-Romanian lexicons, compiled in the second half of the 17th century and their use(rs), proposing a method of investigating the manner in which lexical information available in the above corpus relates, if at all, to the vocabulary of texts from the same period. We chose to investigate their relation to an anonymous Old Testament translation made from Church Slavonic, also from the second half of the 17th century, which was supposed to be produced in the same geographical area, in the same Church Slavonic school or even by the same author as the lexicons. After applying a lemmatizer on both the Biblical text (Books of Genesis and Daniel) and the Romanian material from the lexicons, we analyse the results and double the statistical analysis with a series of case studies, focusing on some common lexemes that might be an indicator of the relatedness of the texts. Even if the analysis points out that the lexicons might not have been compiled as a tool for the translation of religious texts, it proves to be a useful method that reveals interesting data and provides the basis for more extensive approaches.
Given the relevance of interoperability, born-digital lexicographic resources as well as legacy retro-digitised dictionaries have been using structured formats to encode their data, following guidelines such as the Text Encoding Initiative or the newest TEI Lex-0. While this new standard is being defined in a stricter approach than the original TEI dictionary schema, its reuse of element names for several types of annotation as well as the highly detailed structure makes it difficult for lexicographers to efficiently edit resources and focus on the real content. In this paper, we present the approach designed within LeXmart to facilitate the editing of TEI Lex-0 encoded resources, guaranteeing consistency through all editing processes.
An ongoing academic and research program, the “Vocabula Grammatica” lexicon, implemented by the Centre for the Greek Language (Thessaloniki, Greece), aims at lemmatizing all the philological, grammatical, rhetorical, and metrical terms in the written texts of scholars (philologists and scholiasts) who curated the ancient Greek literature from the beginning of the Hellenistic period (4th/3rd c. BC) until the end of the Byzantine era (15th c. AD). In particular, it aspires to fill serious gaps (a) in the study of ancient Greek scholarship and (b) in the lexicography of the ancient Greek language and literature. By providing specific examples, we will highlight the typical and methodological features of the forthcoming dictionary.
Basnage’s revision (1701) of Furetiere’s Dictionnaire universel is profoundly different from Furetiere’s work in several regards. One of the most noticeable features of the dictionary lies in his in- creased use of usage labels. Although Furetiere already made use of usage labels (see Rey 1990), Basnage gives them a prominent role. As he states in the preface to his edition, a dictionary that aspires to the title of “universal” should teach how to speak in a polite way (“poliment”), right (“juste”) and making use of specific terminology for each art. He specifies, lemma by lemma, the diaphasic dimension by indicating the word’s register and context of use, the diastratic one by noting the differences in the use of the language within the social strata, the diachronic evolution by indicating both archaisms and neologisms, the diame- sic aspect by highlighting the gaps between oral and written language, the diatopic one by specifying either foreign borrowings or regionalisms.
After extracting the entries containing formulas such as “ce mot est...”, “ce terme est...” and similar ones, we compare the number of entries and the type of information provided by the two lexicographers1. In this paper, we will focus on Basnage’s innovative contribution. Furthermore, we will try to identify the lexi- cographer’s sources, i. e. we will try to establish on which grammars, collections of linguistic remarks or contemporary dictionaries Basnage relies his judgements.
Wortgeschichte digital (‘digital word history’) is a new historical dictionary of New High German, the most recent period of German reaching from approximately 1600 AD up to the present. By contrast to many historical dictionaries, Wortgeschichte digital has a narrated text – a “word history” – at the core of its entries. The motivation for choosing this format rather than traditional microstructures is
briefly outlined. Special emphasis it put on the way these word histories interact with other components of the dictionary, notably with the quotation section. As Wortgeschichte digital is an online only project, visualizations play an important role for the design of the dictionary. Two examples are presented: first, the “quotation navigator” which is relevant for the microstructure of the entries, and, second, a timeline (“Zeitstrahl”) which is part of the macrostructure as it gives access to the lemma inventory from a diachronic point of view.
Within a rapidly digitalising society, it is important to understand how the learning and teaching of digital skills play out in situ, particularly amongst older adults who acquire these skills later in life. This paper focuses on participants engaged in the process of learning digital skills in adult education courses. Using video recordings from adult education centres in Finland and Germany, we explore how students mobilise their teachers’ assistance when encountering problems with their smartphones, laptops or tablets. Prior research on social interaction has shown that assistance can be recruited through a variety of verbal and embodied formats. In this specific educational setting, participants can use complaints about their digital skills or mobile devices to obtain assistance. Utilising multimodal conversation analysis, we describe two basic sequence types involving students’ complaints, discuss their cross-linguistic characteristics, and reflect on their connection to this educational setting and digital devices.
Since the beginning of the Covid-19 pandemic, about 2000 new lexical units have entered the German lexicon. These concern a multitude of coinings and word formations (Kuschelkontakt, rumaerosolen, pandemüde) as well as lexical borrowings mainly from English (Lockdown, Hotspot, Superspreader). In a special way, these neologisms function as keywords and lexical indicators sketching the development of the multifaceted corona discourse in Germany. They can be detected systematically by corpus-linguistic investigations of reports and debates in contemporary public communication. Keyword analyses not only exhibit new vocabulary, they also reveal discursive foci, patterns of argumentation and topicalisations within the diverse narratives of the discourse. With the help of quickly established and dominant neologisms, this paper will outline typical contexts and thematic references, but it will also identify speakers' attitudes and evaluations.
In the currently ongoing process of retro-digitization of Serbian dialectal dictionaries, the biggest obstacle is the lack of machine readable versions of paper editions. Therefore, one essential step is needed before venturing into the dictionary-making process in the digital environment – OCRing the pages with the highest possible accuracy. Successful retro-digitization of Serbian dialectal dictionaries, currently in progress, has shown a dire need for one basic yet necessary step, lacking until now – OCRing the pages with the highest possible accuracy. OCR processing is not a new technology, as many opensource and commercial software solutions can reliably convert scanned images of paper documents into digital documents. Available software solutions are usually efficient enough to process scanned contracts, invoices, financial statements, newspapers, and books. In cases where it is necessary to process documents that contain accented text and precisely extract each character with diacritics, such software solutions are not efficient enough. This paper presents the OCR software called “SCyDia”, developed to overcome this issue. We demonstrate the organizational structure of the OCR software “SCyDia” and the first results. The “SCyDia” is a web-based software solution that relies on the open-source software “Tesseract” in the background. “SCyDia” also contains a module for semi-automatic text correction. We have already processed over 15,000 pages, 13 dialectal dictionaries, and five dialectal monographs. At this point in our project, we have analyzed the accuracy of the “SCyDia” by processing 13 dialectal dictionaries. The results were analyzed manually by an expert who examined a number of randomly selected pages from each dictionary. The preliminary results show great promise, spanning from 97.19% to 99.87%.
Almanca tuhfe / Deutsches Geschenk (1916) oder: Wie schreibt man deutsch mit arabischen Buchstaben?
(2022)
Versified dictionaries are bilingual/multilingual glossaries written in verse form to teach essential words in any foreign language. In Islamic culture, versified dictionaries were produced to teach the Arabic language to the young generations of Muslim communities not native in Arabic. In the course of time, many bilingual/multilingual versified dictionaries were written in different languages throughout the Islamic world. The focus of this study is on the Turkish-German versified dictionary titled Almanca Tuhfe / Deutsches Geschenk [German Gift], published by Dr. Sherefeddin Pasha in Istanbul in 1916. This dictionary is the only dictionary in verse ever written combining these two languages. Moreover the dictionary is one of the few texts containing German words written in Arabic letters (applying Ottoman spelling conventions). The study concentrates on the way German words are spelled and tries to find out, whether Sherefeddin Pasha applied something like fixed rules to write the German lexemes.
This article aims to show the influence of doctrines in the medical lexicographers choices, with the Capuron-Nysten-Littré lineage as a case study. Indeed, the Dictionnaire de médecine has been crossed by several schools of thought such as spiritualism and positivism. While lexical continuity may seem self-evident due to the nature of the work, thus reducing the reprint to a simple lexical increase, this process introduces neologisms and deletions, all can be considered in their effects by using text statistics and factorial analysis.
This paper examines a certain subset of the vocabulary of Modern Icelandic, namely those words that are labelled as ‘ancient’ in the Dictionary of Contemporary Icelandic (DCI). The words were analysed and grouped into two main categories, 1) Words with only ‘ancient’ sense(s) and 2) words that have modern as well as an obsolete older sense. Several subgroups were identified as well as some lexical characteristics. The words in question were then analysed in two other sources, the Dictionary of Old Norse Prose (ONP) and the Icelandic Gigaword Corpus (IGC). The results show that the words belong to several semantic domains that reflect the types of texts that have survived until modern times. Most of the words are robustly attested in Old Norse sources, although there are a few exceptions. Large majority of the words can be found in Modern Icelandic texts, but to a varying degree. Limits of the corpus material makes it difficult to analyse some of the words. The result indicate that the words labelled ‘ancient’ can be divided into three main groups: a) words that are poorly attested and should perhaps not be included in the lexicographic description of Modern Icelandic; b) words that are likely to occur sometimes in Modern Icelandic; c) words that function as other inherited Old Norse words and perhaps do not require a special label or should have an additional sense in the DCI.
This paper presents a multilingual dictionary project of discourse markers. During its first stage, consisting of collecting the list of headwords, we used a parallel corpus to automatically extract units from texts written in Spanish, Catalan, English, French and German. We also applied a method to create a taxonomy structure for automatically organising the markers in clusters. As a result, we obtain an extensive, corpus-driven list of headwords. We present a prototype of the microstructure of the dictionary in the form of a standard XML database and describe the procedure to automatically fill in most of its fields (e.g., the type of DM, the equivalents in other languages, etc.), before human intervention.
This paper describes a method for extracting collocation data from text corpora based on a formal definition of syntactic structures, which takes into account not only the POS-tagging level of annotation but also syntactic parsing (syntactic treebank model) and introduces the possibility of controlling the canonical form of extracted collocations based on statistical data on forms with different properties in the corpus. Specifically, we describe the results of extraction from the syntactically tagged Gigafida 2.1 corpus. Using the new method, 4,002,918 collocation candidates in 81 syntactic structures were extracted. We evaluate the extracted data sample in more detail, mainly in relation to properties that affect the extraction of canonical forms: definiteness in adjectival collocations, grammatical number in noun collocations, comparison in adjectival and adverbial collocations, and letter case (uppercase and lowercase) in canonical forms. The conclusion highlights the potential of the methodology used for the grammatical description of collocation and phrasal syntax and the possibilities for improving the model in the process of compilation of a digital dictionary database for Slovene.
This paper looks at whether, after two decades of corpus building for the Bantu languages, the time is ripe to begin using monitor corpora. As a proof-of-concept, the usefulness of a Lusoga monitor corpus for lexicographic purposes, in casu for the detection of neologisms, both in terms of new words and new meanings, is investigated and found useful.
Phonesthemes (Firth 1930) are sublexical constructions that have an effect on the lexico-grammatical continuum: they are recurring form-meaning associations that occur more often than by chance but not systematically (Abramova/Fernandez/Sangati 2013). Phonesthemes have been shown (Bergen 2004) to affect psycholinguistic language processing; they organise the mental lexicon. Phonesthemes appear over time to emerge as driven by language use as indexical rather than purely iconic constructions in the lexicon (Smith 2016; Bergen 2004; Flaksman 2020). Phonesthemes are acknowledged in construction morphology (Audring/Booij/Jackendoff 2017) as motivational schemas. Some phonesthemes also tend to have lexicographic acknowledgment, as shown by etymologist Liberman (2010), although this relevance and cohesion appears to be highly variable as we will show in this paper.
eThis paper first attempts a state-of-the art overview of what is known about women in the history of lexicography up to the early twentieth century. It then focusses more closely on the German and German-English lexicographical traditions to 1900, examining them from three different perspectives (following Russell’s 2018 study of women in English lexicography): women as users and dedicatees of dictionaries; women as contributors to and compilers of lexicographical works; and (in a very preliminary way) women and female sexuality as represented in German/English bilingual dictionaries of the eighteenth and early nineteenth centuries. Russell (2018) was able to identify some 24 dictionaries invoking women as patrons, dedicatees or potential users before 1700, and some 150 works in English lexicography by women between 1500 and 1900, besides the contribution of hundreds of women as supporters and helpers, not least as unpaid readers and sub-editors for the Oxford English Dictionary. Equivalent research in other languages is lacking, but this paper presents some of the known examples of women as lexicographers. The evidence tends to support Russell’s finding for English, that women were more likely to find a place in lexicography outside the mainstream: sometimes in a more private sphere (like Hester Piozzi); often in bilingual lexicography (such as Margrethe Thiele, working on a Danish-French dictionary), including missionary and or colonizing activity (such as Cinie Louw in Africa, Daisy Bates in Australia); and in dialect description (Coronedi Berti in Italy, Luisa Lacal and María Moliner in Spain). Within the German-speaking context, women who participated in lexicographical work themselves are hard to identify before the late nineteenth century, though those few women who did have access to education were often engaged in language learning, including translation activity, and they were likely users of bilingual and multilingual dictionaries. Christian Ludwig’s (1706) English-German dictionary – the first of its kind – was dedicated to the Electoral Princess Sophia of Hanover. Elizabeth Weir may have been the first named female compiler of a German dictionary, with her bilingual New German Dictionary (1888). Rather better known are the cases of Agathe Lasch and Luise Pusch, who, as pioneering women in the field of German linguistics, ultimately led major lexicographical projects documenting German regional varieties in the first half of the twentieth century (Middle Low German and Hamburgish in the case of Lasch; the Hessisch Nassau dialect dictionary in the case of Berthold). In the light of existing research on gender and sexuality in the history of English lexicography (e. g. Iamartino 2010; Turton 2019), I conclude with a preliminary exploration how woman and sexuality have been represented in dictionaries of German and English, taking the words Hure and woman in bilingual German-English dictionaries of the eighteenth and nineteenth centuries as my case studies.
In a multilingual and multicultural society, dictionaries play an important role to enhance interlingual communication. A diversity of languages and different levels of dictionary culture demand innovative lexicographic approaches to establish a dictionary landscape that responds to the needs of the various speech communities. Focusing on the South African situation this paper discusses some aspects of a few dictionaries that contributed to an improvement of the local dictionary landscape. Using the metaphors of bridges, dykes and sluice gates it is shown how lexicographers need a balanced approach in their lemma selection and treatment. Whilst a too strong prescriptive approach can be to the detriment of the macrostructural selection, a lack of regulatory criteria could easily lead to a data overload. The lexicographer should strive to give a reflection of the actual language use and enable the users to retrieve the information that can satisfy their specific communication and cognitive needs. Such lexicographic products will enrich and improve the dictionary landscape.
This paper presents the Lehnwortportal Deutsch, a new, freely accessible publication platform for resources on German lexical borrowings in other languages, to be launched in the second half of 2022. The system will host digital-native sources as well as existing, digitized paper dictionaries on loanwords, initially for some 15 recipient languages. All resources remain accessible as individual standalone dictionaries; in addition, data on words (etyma, loanwords etc.) together with their senses and relations to each other is represented as a cross-resource network in a graph database, with careful distinction between information present in the original sources and the curated portal network data resulting from matching and merging information on, e. g., lexical units appearing in multiple dictionaries. Special tooling is available for manually creating graphs from dictionary entries during digitization and for editing and augmenting the graph database. The user interface allows users to browse individual dictionaries, navigate through the underlying graph and ‘click together’ complex queries on borrowing constellations in the graph in an intuitive way. The web application will be available as open source.
The public as linguistic authority: Why users turn to internet forums to differentiate between words
(2022)
This paper addresses the question of why we face unsatisfactory German dictionary entries when looking up and comparing two similar lexical terms that are loan words, new words, (near) synonyms, or confusables. It explains how users are aware of existing reference works but still search or post on language forums, often after consulting a dictionary and experiencing a range of dictionary based problems. Firstly, these dictionary based difficulties will be scrutinised in more detail with respect to content, function, presentation, and the language of definitions. Entries documenting loan words and commonly confused pairs from different lexical reference resources serve as examples to show the short comings. Secondly, I will explain why learning about your target group involves studying discussion forums. Forums are a valuable source for detailed user studies, enabling the examination of different communicative needs, concrete linguistic questions, speakers’ intuitions, and people’s reactions to posts and comments. Thirdly, with the help of two examples I will describe how the study of chats and forums had a major impact on the development of a recently compiled German dictionary of confusables. Finally, that same problem solving approach is applied to the idea of a future dictionary of neologisms and their synonyms.
Dictionaries are often a reflection of their time; their respective (socio-)historical context influences how the meaning of certain lexical units is described. This also applies to descriptions of personal terms such as man or woman. Lexicographers have a special responsibility to comprehensively investigate current language use before describing it in the dictionary. Accordingly, contemporary academic dictionaries are usually corpus-based. However, it is important to acknowledge that language is always embedded in cultural contexts. Our case study investigates differences in the linguistic contexts of the use of man and woman, drawing from a range of language collections (in our case fiction books, popular magazines and newspapers). We explain how potential differences in corpus construction would therefore influence the “reality” depicted in the dictionary. In doing so, we address the far-reaching consequences that the choice of corpus-linguistic basis for an empirical dictionary has on semantic descriptions in dictionary entries.Furthermore, we situate the case study within the context of gender-linguistic issues and discuss how lexicographic teams can engage with how dictionaries might perpetuate traditional role concepts when describing language use.
Words and their usages are in many cases closely related to or embedded in social, cultural, technical and ideological contexts. This does not only apply to individual words and specific senses, but to many vocabulary zones as well. Moreover, the development of words is often related to aspects of socio-cultural evolution in a broad sense. In this paper I will have a look at traditional dictionaries and digital lexical systems focussing on the question how they deal with socio-cultural and discourse-related aspects of word usage. I will also propose a number of suggestions how future digital lexical systems might be enriched in this respect.
Tok Pisin is a pidgin/creole language spoken since the late 19th century in most of the area that nowadays constitutes Papua New Guinea where it emerged under German colonial rule. Unusual for a pidgin/creole, Tok Pisin is characterized by a extensive lexicographic history. The Tok Pisin Dictionary Collection at the Leibniz Institute for the German Language, described in this article, includes about fifty dictionaries. The collection forms the basis for the sketch of the history of Tok Pisin lexicography as part of colonial history presented here. The basic thesis is that in the history of Tok Pisin, lexicographic strat egies, dictionary structures, and publication patterns reflect the interest (and disinterest) of various groups of colonial actors. Among these colonial actors, European scientists, Catholic missionaries, and the Australian and US militaries played important roles.
The aim of this paper is to show how lexicographical choices reflect ideological thinking, singled out by Eagleton (2007) into the strategies of rationalizing, legitimating, action orienting, unifying, naturalizing and universalizing. It will be carried out by examining two twenty first century editions of each of the five English monolingual learner’s dictionaries published by Cambridge, Collins, Longman, Macmillan, and Oxford. The synchronic and diachronic analyses of the dictionaries and their different editions at the macro structural level (the wordlists) and at the micro structural level (the definitional styles) will show how the reduction and change of data, derived from heterogeneous social and cultural contexts of language use, to abstract essential forms, involves decisions about the central and peripheral aspects of the lexicon and the meaning of words.
Applying terminological methods to lexicography helps lexicographers deal with the terms occurring in general language dictionaries, especially when it comes to writing the definitions of concepts belonging to special fields. In the context of the lexicographic work of the Dicionário da Língua Portuguesa, an updated digital version of the last Academia das Ciências de Lisboa’ dictionary published in 2001, we have assumed that terminology – in its dual dimension, both linguistic and conceptual – and lexicography are complementary in their methodological approaches. Both disciplines deal with lexical items, which can be lexical units or terms. In this paper, we apply terminological methods to improve the treatment of terms in general language dictionaries and to write definitions as a form of achieving more precision and accuracy, and also to specify the domains to which they belong. Additionally, we highlight the consistent modelling of lexicographic components, namely the hierarchy of domain labels, as they are term identification markers instead of a flat list of domains. The need to create and make available structured, organised and interoperable lexicographic resources has led us to follow a path in which the application of standards and best practices of treating and representing specialised lexicographic content are fundamental requirements.
Mensch-Maschine-Interaktion im lexikographischen Prozess zu lexikalischen Informationssystemen
(2022)
Dictionaries of today and tomorrow are rather digital products than print dictionaries. From the user’s perspective, electronic dictionary applications and in particular „lexical information systems“, also referred to as „digital word information systems“ are coming to the fore alongside Google searches. Given the rapid developments in the area of the automated provision of lexicographic information, more precisely the automatic creation of online dictionaries, the new role of the lexicographer in the modern lexicographic process is questionable. This article addresses this issue.
While there was arguably a need for multi authored, multi volume, metalexicographic handbooks three decades ago – when the field of metalexicography was still ‘young’ – it is a bit puzzling to make sense of the current output flurry in this field. Is it simply a matter of ‘every publisher trying to fill its shelves’? or is there really a need in the scientific community for more and (continuously) updated reference works? And once available, are such works also consulted? Which parts? By whom? How often? For what purposes? In this paper we look at an ongoing, real world metalexicographic handbook project to answer these questions.
This paper focuses on the treatment of culture bound lexical items in a novel type of online learner’s dictionary model, the Phrase Based Active Dictionary (PAD). A PAD has a strong phraseological orientation: each meaning of a word is exclusively defined in a typical phraseological context. After introducing the relevant theory of realia in translation studies, we develop a broader notion of culture specific lexical items which is more apt to serve the purposes of learner’s lexicography and thus to satisfy the needs of a larger and often undefined target group. We discuss the treatment of such words and expressions in common English learner’s dictionaries and then present various excerpts from PAD entries in English, German, and Italian which display different strategies for coping with cultural contents in the lexicon. Our aim is to demonstrate that the phraseological approach at the core of the PAD model turns out to be extremely important to convey cultural knowledge in a suitable way for users to fully grasp cultural implications in language.
In foreign language teaching the use of dictionaries, especially bilingual, has always been related to the hypotheses concerning the relationship between the native language (L1) and second language acquisition method. If the bilingual dictionary was an obvious tool in the grammar-translation method, it was banned from the classroom in the direct, audiolingual and audiovisual methods. Also in the communicative method, foreign language learners are discouraged from using a dictionary. Its use should not obstruct the goals of communicatively oriented foreign language learning – a view still held by many foreign language teachers. Nevertheless, the reality has been different: Foreign language learners have always used dictionaries, even if they no longer possess a print dictionary and mainly use online resources and applications. Dictionaries and online resources will continue to play an important role in the future. In the Council of Europe’s language policy, with its emphasis on multilingualism and lifelong learning, the adequate use of reference tools as a strategic skill is highlighted. In several European countries, educational guidelines refer to the use of dictionaries in the context of media literacy, both in mother tongue and foreign language teaching. Not only is their adequate use important, but so too is the comparison, assessment and evaluation of the information presented, in order to develop Language Awareness and Language Learning Awareness. This is good news. However, does this mean that dictionaries are actually used in class? What role do dictionaries play in foreign language teaching in schools and universities? Are foreign language learners in the digital era really competent users? And how competent are their teachers? Are they familiar with the current (online) dictionary landscape? Can they support their students? After a more in-depth study of the status quo of dictionary use by foreign language learners and teachers and the gap between their needs and the reality, this contribution discusses the challenges facing lexicographers and meta-lexicographers and what educational policy measures are necessary to make their efforts worthwhile in turning foreign language learners – and their teachers – into competent users in a multilingual and digital world.
Wortgeschichte digital (Digital Word History) is an emerging historical dictionary of the German language that focuses on describing semantic shifts from about 1600 through today. This article provides deeper insight into the dictionary’s “cross-reference clusters,” one of its software tools that performs visualization of its reference network. Hence, the clusters are a part of the project’s macrostructure. They serve as both a means for users to find entries of interest and a tool to elucidate relations among dictionary entries. Rather than delve into technical aspects, this article focuses on the applied logics of the software and discusses the approach in light of the dictionary’s microstructure. The article concludes with some considerations about the clusters’ advantages and limitations.
Looking up for an unknown word is the most frequent use of a dictionary. For languages both agglutinative and inflectional, such as Georgian, this can be quite challenging because an inflected form can be very far from the lemmas used by the target dictionary. In addition, there is no consensus among Georgian lexicographers on which lemmas represent a verb in dictionaries. It further complicates dictionaries access. Kartu-Verbs is a base of inflected forms of Georgian verbs accessible by a logical information system. It currently contains more than 5 million inflected forms related to more than 16,000 verbs for 11 tenses; each form can have 11 properties; there are more than 80 million links in the base. This demonstration shows how, from any inflected form, we can find the relevant lemma to access any dictionary. Kartu-Verbs can thus be used as a front-end to any Georgian dictionary.
This paper reports on the restructuring of a bilingual (Greek Sign Language, GSL – Modern Greek) lexicographic database with the use of the WordNet semantic and lexical database. The relevant research was carried out by the Institute for Language and Speech Processing (ILSP) / Athena R.C. team within the framework of the European project Easier. The project will produce a framework for intelligent machine translation to bring down language barriers among several spoken/written and sign languages. This paper describes the experience of the ILSP team to contribute to a multilingual repository of signs and their corresponding translations and to organize and enhance a bilingual dictionary (GSL – Modern Greek) as a result of this mapping; this will be the main focus of this paper. The methodology followed relies on the use of WordNet and, more specifically, the Open Multilingual WordNet (OMW) tool to map content in GSL to WordNet synsets.
The paper presents the process of developing the AirFrame database, a specialized lexical resource in which aviation terminology is defined in the form of semantic frames, following the methodology of the Berkeley FrameNet (FN). First, the structure of the database is presented, and then the methodology applied in developing and populating the database is described. The link between specialized aviation frames and general language semantic frames, of which frames defining entities, processes, attributes and events are particularly relevant, is discussed on the example of the semantic frame of Flight and its related frames. The paper ends with discussing possibilities of using AirFrame as a model for further developing resources in which general and specialized knowledge are linked.
Many European languages have undergone considerable changes in orthography over the last 150 years. This hampers the application of modern computer-based analysers to older text, and hence computer-based annotation and studies of text collections spanning a long period. As a step towards a functional analyser for Norwegian texts (Nynorsk standard) from the 19th century, funding was granted in 2020 for creating a full form generator for all inflected forms of headwords found in Ivar Aasen’s dictionary published in 1873 (Aasen 1873) and his grammar from 1864 (Aasen 1864). Creating this word bank led to new insight in Aasen (1873), its structure, internal organisation, and ambition level as well as its link to Aasen (1864). As a test, the full form list generated from this new word bank was used to analyse the word inventory of texts by Aa. O. Vinje, written in the period 1850–1870. The Vinje texts were also analysed using a full form list of modern standard Norwegian, to study the differences in applicability and see how Vinje’s language relates to the written standard of modern Norwegian.
In this paper, we present LexMeta, a metadata model for the description of human-readable and computational lexical resources in catalogues. Our initial motivation is the extension of the LexBib knowledge graph with the addition of metadata for dictionaries, making it a catalogue of and about lexicographical works. The scope of the proposed model, however, is broader, aiming at the exchange of metadata with catalogues of Language Resources and Technologies and addressing a wider community of researchers besides lexicographers. For the definition of the LexMeta core classes and properties, we deploy widely used RDF vocabularies, mainly Meta-Share, a metadata model for Language Resources and Technologies, and FRBR, a model for bibliographic records.
In the course of the last years, digital lexicography has opened up a variety of avenues fostering the conceptualisation, application and use of constructicons, a type of lexicographical reference work which has revealed itself highly promising in terms of connectivity and flexibility, at the same time, however, also challenging as to its technical implementation. The present paper takes up the ambitious aim to propose some reflections as well as a first draft for a possible model of a multilingual ‘periphrasticon’ as a subtype of a bigger constructicon focusing on a specific typology-related structural feature, i. e. periphrasticity. Taking periphrastic verbal constructions in French, Italian and Spanish as a starting point, it tries to sketch out a unified constructional network including not only equivalent (or corresponding) constructions within Romance, but also establishing (formal and functional) cross-linguistic connections to German and English. Comprising the major languages available to most language learners in (at least) German-speaking environments, the model is also supposed to pave the way for multilingual constructicography which, on the one hand, is able to account for intra- and cross-linguistic relations and, on the other hand, can also prove a valuable tool for language learning and use.
We describe the status of work intending at including sign language lexical data within the OntoLex-Lemon framework. Our general goal is to provide for a multimodal extension to this framework, which was originally conceived for covering only the written and phonetic representation of lexical data. Our aim is to achieve in the longer term the same type of semantic interoperability between sign language lexical data as this is achieved for their spoken or written counterparts. We want also to achieve this goal across modalities: between sign language lexical data and spoken/written lexical data.
The long road to a historical dictionary of Lower Sorbian. Towards a lexical information system
(2022)
The Sorbian Institute has been taking preparatory steps for a historical-documentary vocabulary information system for Lower Sorbian for about 10 years. To this end, the entire extant written material (16th–21st centuries) of this strongly endangered European minority language is to be systematically evaluated. An attempt made a few years ago to organise and finance the project as a long-term scientific project was not successful in the end. Therefore, it can only be advanced step by step and via some detours. The article informs about the interim status of the project, especially with respect to the creation of a reliable database.
The paper presents the results of a survey on lexicographic practices and lexicographers’ needs across Europe that was conducted in the context of the Horizon 2020 project European Lexicographic Infrastructure (ELEXIS) among the observer institutions of the project. The survey is a revised and upgraded version of the survey which was originally conducted among ELEXIS lexicographic partner institutions in 2018 (Kallas et al. 2019a). The main goal of this new survey was to complement the data from the ELEXIS lexicographic partner institutions in order to get a more complete picture of lexicographic practices both for born-digital and retro-digitised resources in Europe. The results offer a detailed insight into many aspects of the lexicographic process at European institutions, such as funding, training, staff, lexicographic expertise, software and tools. In addition, the survey reflects on current trends in lexicography and reveals what institutions see as the most important emerging trends that will affect lexicography in the short-term and long-term future. Overall, the results provide valuable input informing the development of tools, resources, guidelines and training materials within ELEXIS.
This paper aims at verifying if the most important online Brazilian Portuguese dictionaries include some of the neologisms identified in texts published in the 1990s to 2000s, formed with the elements ciber-, e-, bio-, eco- and narco, which we refer to as fractomorphemes / fracto-morphèmes. Three online dictionaries were analyzed (Aulete, Houaiss and Michaelis), as well as Vocabulário Ortográfico da Língua Portuguesa (VOLP). We were able to conclude that all three dictionaries and VOLP include neologisms with these elements; Michaelis and VOLP do not include separate entries for bound morphemes, whereas Houaiss includes entries for all of them and Aulete includes entries for bio-, eco- and narco-. Aulete also describes the neological meaning of eco- and narco-, whereas Houaiss does not.
Word Families in Diachrony. An epoch-spanning structure for the word families of older German
(2022)
The ‘Word Families in Diachrony’ project (WoDia), for which a funding application to the DFG is in preparation, aims to provide a database driven online research environment that will enable processes of change in the entire historical vocabulary of German to be investigated by focusing on the changes in word families and the individual means of word formation. WoDia will embed the vocabularies of Old High German (OHG), Middle High German (MHG), Old Saxon (OS), and Middle Low German (MLG) in a database, resulting in a word-family structure for High and Low German from the beginnings up to the 15th century (for High German) and up to the 17th century (for Low German). The basis of the vocabulary is provided by reference dictionaries of the four historical varieties, whereas the word families’ historical structure is based on the word-family dictionary of OHG by Jochen Splett (1992). Each lemma in the database will be assigned, where appropriate, to a word family. The individual word-formation elements and the word-formation hierarchy will be mapped in a structural formula. The etymologically corresponding lemmas and word families of the different periods/varieties of older German will be linked so that an analysis across the varieties will also be possible. The annotations of word families in the database (e. g., relating to word structure) will be supplemented by linking their lemmas to the online dictionaries and to the reference corpora of Old German (OS and OHG), MHG, and MLG.
The digital environment represents a qualitatively new level of service for research work with linguistic information presented in dictionary form. And first of all, this applies to index systems. By dictionary indexing we mean a set of formalized rules and procedures, on the basis of which it is possible to obtain information about certain linguistic facts recorded in the dictionary. These rules are implemented in the form of user interfaces. However, one should take into account the fact that the effectiveness of automatic construction of index schemes for a digital dictionary is possible only in a sufficiently formalized environment. This article describes the method and technology of indexing the Etymological Dictionary of the Ukrainian Language (EDUL). For the language indexing of the dictionary, a special computer instrumental system (VLL – virtual lexicographic laboratory) was developed, and adapted to the structure of the EDUL and focused on the creation of indexes in automatic mode. The digital implementation of the EDUL made it possible to access the entire corpus of the dictionary text regardless of the time of publication of the corresponding volume and opened up opportunities for various digital interpretations of etymological information.
The paper describes an online German-Russian database for phraseological constructions (PhC), or syntactic idioms. It is a linguistic phenomenon representing a stable multi-word form that usually contains some auxiliary words (“anchors”) and partially opens up empty spaces (“slots”) which are filled directly in spoken language by various lexemes or combinations of lexemes (“fillers”, or “slot fillers”). Linguists from several German institutions are currently working on the database. The PhCs selected for the database have to meet special criteria. The database is a manual that combines scientific descriptions, a thesaurus and a bilingual dictionary. The database is designed as an active aid for text production in the respective foreign language; it is also a manual for language researchers and for translators. Apart from that, it can serve as a basis for extensions for other language pairs. The aim of the project is to record and to describe 300 PhC before the database is published. Our objective is to enable foreign language learners to use the syntactic idioms correctly in the texts they produce rather than create a big-sized database. The paper describes some issues related to the creation of the database, namely objectives and target groups, material and methods, microstructure of the database article and some others.
This paper presents the methodology of a research project on the use of specialised German dictionaries. A mixed-methods research approach will help to answer the following main questions, concerning the lexicographic presentation of the data on the one hand and the data collection on the other hand: How do different systems of data organization and presentation affect the likelihood that users will correctly find and select the data they look up? And does the probability of success increase if users are familiar with the system? Which advantages and disadvantages do lexicographers and specialised languages experts see in using quantitative methods to extract terms? And are these methods accepted and considered reliable by the user community?
The purpose of this paper is to present the lexicographic protocol and to report on the progress of compilation of Mikaela_Lex, which is a Greek, free online monolingual school dictionary for upper elementary students with visual impairments including 4,000 lemmata. The dictionary is equipped with new digital tools, such as the “Braille-system keyboard, a “speech-to-text” tool, a “text-to-speech” tool and also a qwerty accessibility for visually non-impaired students.
This paper describes a method for automatic identification of sentences in the Gigafida corpus containing multi-word expressions (MWEs) from the list of 5,242 phraseological units, which was developed on the basis of several existing open-access lexical resources for Slovene. The method is based on a definition of MWEs, which includes information on two levels of corpus annotation: syntax (dependency parsing) and morphology (POS tagging), together with some additional statistical parameters. The resulting lexicon contains 12,358 sentences containing MWEs extracted from the corpus. The extracted sentences were analysed from the lexicographic point of view with the aim of establishing canonical forms of MWEs and semantic relations between them in terms of variation, synonymy, and antonymy.
Dieser Beitrag skizziert einen paradoxen Wandelprozess, den wir „Denaturierung" nennen: Ursprünglich natürlichsprachige, orale, ersterworbene Varietäten werden durch sprachplanerische Maßnahmen zu literalen, nicht ersterworbenen Systemen. Wir diskutieren zunächst die Grundlagen dieses Prozesses: Die Literalisierung von Sprachsystemen und Gesellschaften bringt orale Non-Standard-Varietäten in funktionale Konkurrenzsituationen mit Standardvarietäten. Der Wunsch nach Bewahrung und (Re-)Vitalisierung dieser Varietäten erzwingt - um ihre funktionale Leistungsfähigkeit auszubauen - Standardisierungsprozesse der betroffenen Varietäten, wodurch in ihren Systemen Elemente auftreten, die nicht durch L1-Erwerb weitergegeben werden (können). Paradoxerweise soll also das Verschwinden natürlicher Sprachen (der muttersprachlich erworbenen Dialekte), die sich definitorisch gerade durch die funktionale Distanz zur Standardsprache auszeichnen, durch Eingriffe unterbunden werden, die ihrem Status als natürliche Sprachen entgegenwirken. Wir postulieren, dass diese Denaturierung eine Konsequenz der Faktoren Attrition und Standardisierung ist. Dazu illustrieren und kontrastieren wir den Verlauf dieses Prozesses anhand von drei germanischen Varietäten: Während das Bairische noch am Anfang einer möglichen Denaturierung steht, kann das sowohl von starker Attrition als auch gezielter Standardisierung betroffene Niederdeutsche in dieser Hinsicht bereits als fortgeschritten angesehen werden. Im modernen Färöischen, wo bei bewahrter hoher mündlicher Variation eine stark historisierende, unifizierende Schriftvarietät installiert wurde, fällt die Denaturierung mangels Attrition dagegen nur schwach aus.