Refine
Document Type
- Part of a Book (122)
- Book (16)
- Conference Proceeding (3)
- Periodical (1)
- Part of Periodical (1)
- Working Paper (1)
Keywords
- COVID-19 (39)
- Deutsch (33)
- Neologismus (28)
- Korpus <Linguistik> (25)
- Wörterbuch (24)
- Sprachgebrauch (23)
- Lexikographie (15)
- Wortschatz (11)
- Worthäufigkeit (10)
- Zweisprachiges Wörterbuch (9)
Publicationstate
- Veröffentlichungsversion (108)
- Zweitveröffentlichung (36)
Reviewstate
Publisher
- IDS-Verlag (144) (remove)
This paper consists of a short analysis of the sources and the treatment of the legal lexicon in the first dictionary published by the Spanish Royal Academy (1726–1739), followed by a longer commentary on the representation and the treatment of the concept of judge, in which the reflection of the extralinguistic factors in the definitions stands in focus. The results highlight the relevance of the legal context of that era for the treatment of the lexicon related to the legal domain, but they also demonstrate the pattern in which the lexicographic data displays peculiarities of legal matters.
‘Can’ and ‘must’-type modal verbs in the direct sanctioning of misconduct across European languages
(2023)
Deontic meanings of obligation and permissibility have mostly been studied in relation to modal verbs, even though researchers are aware that such meanings can be conveyed in other ways (consider, for example, the contributions to Nuyts/van der Auwera (eds.) 2016). This presentation reports on an ongoing project that examines deontic meaning but takes as its starting point not a type of linguistic structure but a particular kind of social moment that presumably attracts deontic talk: The management of potentially ‚unacceptable‘ or untoward actions (taking the last bread roll at breakfast, making a disallowed move during a board game, etc.). Data come from a multi-language parallel video corpus of everyday social interaction in English, German, Italian, and Polish. Here, we focus on moments in which one person sanctions another’s behavior as unacceptable. Using interactional-linguistic methods (Couper-Kuhlen/Selting 2018), we examine similarities and differences across these four languages in the use of modal verbs as part of such sanctioning attempts. First results suggest that modal verbs are not as common in the sanctioning of misconduct as one might expect. Across the four languages, only between 10%–20% of relevant sequences involve a modal verb. Most of the time, in this context, speakers achieve deontic meaning in other ways (e.g., infinitives such as German nicht so schmatzen, ‚no smacking‘). This raises the question what exactly modal verbs, on those relatively rare occasions when they are used, contribute to the accomplishment of deontic meaning. The reported study pursues this question in two ways: 1) By considering similarities across languages in the ways that modal verbs interact with other (verbal) means in the sanctioning of misconduct.; 2) By considering differences across languages in the use of modal verbs. Here, we find that the relevant modal verbs are used similarly in some activity contexts (enforcing rules during board games), but less so in other activity contexts (mundane situations with no codified rules). In sum, the presented study adds to cross-linguistically grounded knowledge about deontic meaning and its relationships to linguistics structures.
Dictionaries have been part and parcel of literate societies for many centuries. They assist in communication, particularly across different languages, to aid in understanding, creating, and translating texts. Communication problems arise whenever a native speaker of one language comes into contact with a speaker of another language. At the same time, English has established itself as a lingua franca of international communication. This marked tendency gives lexicography of English a particular significance, as English dictionaries are used intensively and extensively by huge numbers of people worldwide.
This paper reports on the restructuring of a bilingual (Greek Sign Language, GSL – Modern Greek) lexicographic database with the use of the WordNet semantic and lexical database. The relevant research was carried out by the Institute for Language and Speech Processing (ILSP) / Athena R.C. team within the framework of the European project Easier. The project will produce a framework for intelligent machine translation to bring down language barriers among several spoken/written and sign languages. This paper describes the experience of the ILSP team to contribute to a multilingual repository of signs and their corresponding translations and to organize and enhance a bilingual dictionary (GSL – Modern Greek) as a result of this mapping; this will be the main focus of this paper. The methodology followed relies on the use of WordNet and, more specifically, the Open Multilingual WordNet (OMW) tool to map content in GSL to WordNet synsets.
Redeeinleiter sind sprachliche Ausdrücke unterschiedlicher Wortarten, die relativ zur Redewiedergabe in Voran-, Mittel- oder Nachstellung stehen und eine direkte oder indirekte Redewiedergabe einleiten. Dadurch sind Redeeinleiter sehr vielfältig, womit sie sich als Untersuchungsgegenstand einer Analyse zur lexikalischen Vielfalt von Teilwortschätzen eignen.
Als Datengrundlage der vorliegenden Untersuchung dienen die manuell annotierten direkten und indirekten Redeeinleiter des Redewiedergabe-Korpus. Dieses setzt sich aus fiktionalen und nicht-fiktionalen Textausschnitten, die zwischen 1840–1920 veröffentlicht wurden, zusammen. Ziel der Analyse ist es, zu ermitteln, wie sich der Teilwortschatz der direkten und der der indirekten Redeeinleiter in ihrer lexikalischen Vielfalt voneinander unterscheiden und wie diese Unterschiede zu begründen sind. Dafür wird ein Set an quantitativen Methoden erarbeitet mit dem die lexikalische Vielfalt von Teilwortschätzen bestimmt werden kann und das in zukünftigen Untersuchungen zur lexikalischen Vielfalt als Standardrepertoire herangezogen werden kann.
Inspired by GWLN 3, we take a look at the new words, meanings, and expressions that have been created during or promoted by the COVID-19 pandemic. The pandemic provides a rare opportunity to follow the rise, spread, and integration of words and expressions in a language that may serve as an illustration of how linguistic innovation in general works. Relevant words were selected from various lists, notably monthly and annual lists of prominent words attested in the corpus of The Danish Dictionary. Analysis of these lists gives an insight into the number of words that stand out month by month and what kinds of words are involved, both in terms of morphological type and of semantic category, with special attention given to neologisms. Finally, we discuss the criteria for selecting which words to include in the dictionary. With this study, Danish is added to the list of languages covered in the GWLN series on
COVID-19 neologisms.
The paper presents the results of a survey on lexicographic practices and lexicographers’ needs across Europe that was conducted in the context of the Horizon 2020 project European Lexicographic Infrastructure (ELEXIS) among the observer institutions of the project. The survey is a revised and upgraded version of the survey which was originally conducted among ELEXIS lexicographic partner institutions in 2018 (Kallas et al. 2019a). The main goal of this new survey was to complement the data from the ELEXIS lexicographic partner institutions in order to get a more complete picture of lexicographic practices both for born-digital and retro-digitised resources in Europe. The results offer a detailed insight into many aspects of the lexicographic process at European institutions, such as funding, training, staff, lexicographic expertise, software and tools. In addition, the survey reflects on current trends in lexicography and reveals what institutions see as the most important emerging trends that will affect lexicography in the short-term and long-term future. Overall, the results provide valuable input informing the development of tools, resources, guidelines and training materials within ELEXIS.
In the etymological information for a word in a dictionary, the first question to be answered is whether the word is a borrowing or the result of word formation. Here, we consider this question for internationalisms ending in -ation in German and in -ácia in Slovak. In German, -ation is a suffix that attaches to verbs in -ieren. For these verbs, it is in competition with -ung. In Slovak, -ácia is a suffix that attaches to bases of Latin or Greek origin. The corresponding verbs are often backformations. Most Slovak verbs also have a nominalization in -nie. In order to investigate to what extent the nouns in -ation or -ácia are borrowings or derived from the corresponding verbs in German and Slovak, we took a random sample of English nouns in -ation for which OED gives a corresponding verb. For this sample, we checked whether the cognate noun in -ation or -ácia is attested in standard dictionaries and in corpora. Then we did the same for the corresponding verbs and the nouns in -ung or -nie. Finally, we checked the frequency of these words in DeReKo for German and SNK for Slovak. On this basis, we found evidence that -ation in German has a slightly different status to -ácia in Slovak. This status affects the relationship to the corresponding verbs and to the nouns in -ung or -nie. Such generalizations are important as background information for specifying etymological information in dictionaries, especially for languages where first attestations dates are not readily available.
The paper presents the results of empirical research conducted with students from the Faculty of Translation studies of Ventspils University of Applied Sciences (VUAS) in Latvia. The study investigates the habits and practices concerning the use of dictionaries on the part of translation students, as well as types of dictionaries used, frequency of use, etc. The study also presents an insight into the evaluation of the usefulness of dictionaries by Latvian students. The research describes the advantages and disadvantages of dictionaries used by the respondents, the importance of the preface and the explanation of the terms and abbreviations used in dictionaries. The research conducted, as well as the insights, results and recommendations presented, will be relevant for the lexicographic community, as it reflects the experience of one Latvian University to improve the teaching of dictionary use and lexicographic culture in this country and to complement dictionary use research with the Latvian experience.
The public as linguistic authority: Why users turn to internet forums to differentiate between words
(2022)
This paper addresses the question of why we face unsatisfactory German dictionary entries when looking up and comparing two similar lexical terms that are loan words, new words, (near) synonyms, or confusables. It explains how users are aware of existing reference works but still search or post on language forums, often after consulting a dictionary and experiencing a range of dictionary based problems. Firstly, these dictionary based difficulties will be scrutinised in more detail with respect to content, function, presentation, and the language of definitions. Entries documenting loan words and commonly confused pairs from different lexical reference resources serve as examples to show the short comings. Secondly, I will explain why learning about your target group involves studying discussion forums. Forums are a valuable source for detailed user studies, enabling the examination of different communicative needs, concrete linguistic questions, speakers’ intuitions, and people’s reactions to posts and comments. Thirdly, with the help of two examples I will describe how the study of chats and forums had a major impact on the development of a recently compiled German dictionary of confusables. Finally, that same problem solving approach is applied to the idea of a future dictionary of neologisms and their synonyms.
The public as linguistic authority: Why users turn to internet forums to differentiate between words
(2022)
This paper addresses the question of why we face unsatisfactory German dictionary entries when looking up and comparing two similar lexical terms that are loan words, new words, (near)-synonyms, or confusables. It explains how users are aware of existing reference works but still search or post on language forums, often after consulting a dictionary and experiencing a range of dictionary-based problems. Firstly, these dictionary-based difficulties will be scrutinised in more detail with respect to content, function, presentation, and the language of definitions. Entries documenting loan words and commonly confused pairs from different lexical reference resources serve as examples to show the shortcomings. Secondly, I will explain why learning about your target group involves studying discussion forums. Forums are a valuable source for detailed user studies, enabling the examination of different communicative needs, concrete linguistic questions, speakers’ intuitions, and people’s reactions to posts and comments. Thirdly, with the help of two examples I will describe how the study of chats and forums had a major impact on the development of a recently compiled German dictionary of confusables. Finally, that same problem-solving approach is applied to the idea of a future dictionary of neologisms and their synonyms.
Any bilingual dictionary is contrastive by nature, as it documents linguistic information between language pairs. However, the design and compilation of most bilingual dictionaries is often no more than mere lists of lexical or semantic equivalents. In internet forums, one can observe a huge interest in acquiring relevant knowledge about specific lexical items or pairs that are prone to comparison in a more comprehensive manner as they may pose lexical semantic challenges. In particular, these often concern easily confused pairs (e.g. false friends or paronyms) and new terms increasingly travelling between languages in news and social media (Šetka-Čilić/Ilić Plauc 2021). With regard to English and German, the fundamental comparative principles upon which contrastive guides should be build are either absent, or specialised contrastive dictionaries simply do not exist, e.g. comprehensive descriptive resources for false friends, paronyms, protologisms or neologisms (see Gouws/Prinsloo/de Schryver 2004). As a result, users turn to electronic resources such as Google translate, blogs and language forums for help. For example, it is English words such as muscular which have two German translations options.
These are two confusables muskulär and muskulös both of which exhibit a different semantic profile. German sensitiv/sensibel and their English formal counterparts sensitive/sensible are false friends. However, these terms are highly polysemous in both languages and have semantic features in common. Their full meaning spectrum is hardly captured in bilingual dictionaries to allow for a full comparison. Translating protologisms such as German Doppelwumms as well as more established new words is one of the most challenging problems. Currently, German neologisms such as Klimakleber are translated as climate glue (instead of climate activist glueing him-/herself onto objects) by online tools, simply causing mistakes and contextual distortion. Most challenges users face today are well-known (e.g. Rets 2016). New terms are often unregistered in dictionaries and it is often impossible to make appropriate choices between two or more (commonly misused) words between two languages (e.g. Benzehra 2007). These are all relevant problems to translators and language learners alike (e.g González Ribao 2019).
This paper calls for the implication of insights from contrastive lexicology into modern bilingual lexicography. To turn dictionaries into valuable resources and in order to create productive strategies in a learning environment, the practice of writing dictionaries requires a critical re-assessment. Furthermore, the full potential of electronic contrastive resources needs to be recognised and put into practice. After all, monolingual German lexicography has started to reflect on how users’ needs can be accounted for in specific comparative linguistic situations. Some of these ideas can be comfortably extended to bilingual reference guides. On the one hand, this paper will deliver a critical account of some English-German/German-English dictionaries and touch on the shortcomings of contemporary bilingual lexicography. On the other hand, with the help of fictitious resources I will demonstrate contrastive structures as focal points of consultations which answer some of the more frequent language questions more reliably. Among others, I will explain how we need to build user-friendly dictionaries to allow for translating false friends or easily confusable words from the source language into its target language efficiently. With regard to neologisms, I will show how discursive descriptions and definitions that are more elaborate can support language learners to learn about necessary extra-linguistic knowledge. Overall, this could improve the role of specialised dictionaries in the teaching or translating process (cf. Miliç/Sadri/Glušac 2019).
Basnage’s revision (1701) of Furetiere’s Dictionnaire universel is profoundly different from Furetiere’s work in several regards. One of the most noticeable features of the dictionary lies in his in- creased use of usage labels. Although Furetiere already made use of usage labels (see Rey 1990), Basnage gives them a prominent role. As he states in the preface to his edition, a dictionary that aspires to the title of “universal” should teach how to speak in a polite way (“poliment”), right (“juste”) and making use of specific terminology for each art. He specifies, lemma by lemma, the diaphasic dimension by indicating the word’s register and context of use, the diastratic one by noting the differences in the use of the language within the social strata, the diachronic evolution by indicating both archaisms and neologisms, the diame- sic aspect by highlighting the gaps between oral and written language, the diatopic one by specifying either foreign borrowings or regionalisms.
After extracting the entries containing formulas such as “ce mot est...”, “ce terme est...” and similar ones, we compare the number of entries and the type of information provided by the two lexicographers1. In this paper, we will focus on Basnage’s innovative contribution. Furthermore, we will try to identify the lexi- cographer’s sources, i. e. we will try to establish on which grammars, collections of linguistic remarks or contemporary dictionaries Basnage relies his judgements.
Phonesthemes (Firth 1930) are sublexical constructions that have an effect on the lexico-grammatical continuum: they are recurring form-meaning associations that occur more often than by chance but not systematically (Abramova/Fernandez/Sangati 2013). Phonesthemes have been shown (Bergen 2004) to affect psycholinguistic language processing; they organise the mental lexicon. Phonesthemes appear over time to emerge as driven by language use as indexical rather than purely iconic constructions in the lexicon (Smith 2016; Bergen 2004; Flaksman 2020). Phonesthemes are acknowledged in construction morphology (Audring/Booij/Jackendoff 2017) as motivational schemas. Some phonesthemes also tend to have lexicographic acknowledgment, as shown by etymologist Liberman (2010), although this relevance and cohesion appears to be highly variable as we will show in this paper.
Given the relevance of interoperability, born-digital lexicographic resources as well as legacy retro-digitised dictionaries have been using structured formats to encode their data, following guidelines such as the Text Encoding Initiative or the newest TEI Lex-0. While this new standard is being defined in a stricter approach than the original TEI dictionary schema, its reuse of element names for several types of annotation as well as the highly detailed structure makes it difficult for lexicographers to efficiently edit resources and focus on the real content. In this paper, we present the approach designed within LeXmart to facilitate the editing of TEI Lex-0 encoded resources, guaranteeing consistency through all editing processes.
In G, E, I, and H there are constructions with accusative NPs being the external argument of an infinitival, (1) to (4). In P these accusative NPs can only co-occur with an adjectival participle, (5), a construction also occurring in E, (6). The talk compares the syntactic and semantic structure of these constructions focussing on the syntactic category of the nonfinite clause, the status of the accusative NP, the status of the infinitive, restructuring effects, and embedding predicates (including aspect).
i. As to G, E, I, and H, the infinitival clause is regarded as a TP, i.e., a small clause. Its accusative NP and infinitival predicate form a unit – [4], [12], [8]. The AcI denotes, according to [4], an eventuality, which prevents it from being negated. Its subject is case marked by the matrix predicate, either by ECM or subject-to-object raising – [9] and [10]. AcI-constructions can show clause union effects, (7). H additionally allows Dative subjects in infinitive clauses, the latter only being licensed by impersonal predicates and co-occurring with an agreeing infinitive, (8a), – [3]. In case there is no agreeing infinitive, the Dative NP is the experiencer of the matrix clause, (8b). As for Italian, it allows Nominative subject NPs in the infinitive clause, (9a, b).
ii. As to P, small clause constructions differ structurally from E, G, I and H ones – [6], [7]. P small clauses are realizable by copula constructions with verbal być ‘be’ pronominal to ‘it’, (10), or “dual” copula elements, (cooccurrence of a pronominal and a verbal element, [1]), varying with respect to selectional restrictions (part of speech or case within complement phrases, extraction possibilities, [1]). The P counterpart to the AcI-constructions is the secondary predication over an accusative object via an adjectival present participle, (5), (11) and (12). The adjectival participle construction is systematically paraphrasable via clauses introduced by jak ‘how’ (11’) and (12’). In Polish, adjectival phrases like recytującego wiersz ‘reciting’, (11), and wracającego z podróży ‘returning’, (12), clearly function as adjuncts of the accusative object go ‘him’. In our talk, we will compare this P view to languages with typical AcI-constructions, where the AcI-clause is standardly analyzed as a complement of a matrix verb.
The EMLex Dictionary of Lexicography (= EMLexDictoL) is a plurilingual subject field dictionary (in German, English, Afrikaans, Galician, Italian, Polish and Spanish) that contains the basic subject field terminology of lexicography and dictionary research, in which the dictionary article texts are presented in a sophisticated but comprehensible form. The articles are supplemented by a complex crossreferencing system and the current subject field literature of the respective national languages. Following the lemma position, the dictionary articles contain items regarding morphology, synonymy, the position of the definiens, additional explanations, the cross-reference position, the position for literature, the equivalent terms in the other six languages of the dictionary as well as the names of the authors.
To effectively design online tools and develop sophisticated programs, for the teaching of Ancient Greek language, there is a clear need for lexical resources that provide semantic links with Modern Greek. This paper proposes a microstructure for an online Ancient Greek to Modern Greek thesaurus (AMGthes) that serves educational purposes. The terms of this bilingual thesaurus have been selected from reference Ancient Greek texts, taught and studied during lower and upper secondary education in Greece. The main objective here is to build a semantic map that helps students find relevant and semanti- cally related terms (synonyms and antonyms) in Ancient Greek, and then provide a rich set of suitable translations and definitions in Modern Greek. Designed to be an online resource, the thesaurus is being developed using web technologies, and thus will be available to every school and university student that pursues a degree in digital humanities.
Applying terminological methods to lexicography helps lexicographers deal with the terms occurring in general language dictionaries, especially when it comes to writing the definitions of concepts belonging to special fields. In the context of the lexicographic work of the Dicionário da Língua Portuguesa, an updated digital version of the last Academia das Ciências de Lisboa’ dictionary published in 2001, we have assumed that terminology – in its dual dimension, both linguistic and conceptual – and lexicography are complementary in their methodological approaches. Both disciplines deal with lexical items, which can be lexical units or terms. In this paper, we apply terminological methods to improve the treatment of terms in general language dictionaries and to write definitions as a form of achieving more precision and accuracy, and also to specify the domains to which they belong. Additionally, we highlight the consistent modelling of lexicographic components, namely the hierarchy of domain labels, as they are term identification markers instead of a flat list of domains. The need to create and make available structured, organised and interoperable lexicographic resources has led us to follow a path in which the application of standards and best practices of treating and representing specialised lexicographic content are fundamental requirements.
Our everyday lives in any social community are shaped by rules (e.g., Roughley 2019; Schmidt/Rakoczy 2019). Rules (in a broad sense) are interactionally negotiated, monitored, enforced, and serve as an ‘orientation value‘ in social life. If someone‘s behavior is treated as norm-violating or problematic in certain way, it may be therefore confronted. Confronting interlocutors can immediately stop, modify, or retrospectively reprimand the misconduct of others in a moralizing manner. Such confrontations of a problem behavior occur commonly in informal interactions. On the basis of our corpus, specifically in informal interactions at the table, I observed that, for example, in Polish, German and British English, direct confrontations occur on average at least once every three minutes. Participants design these actions in a variety of ways, but like everything in interaction, the design is not arbitrary (Sacks 1984; Enfield/Sidnell 2019). A recurrent feature of such turns is connecting misconduct to some more general concepts. It is evident from the data that e.g. speakers of German and Polish use ‘generally valid statements’ in problematic moments (cf. Küttner/Vatanen/Zinken 2022) to reach the closure of the problem sequence, also specifically dealing there with distribution of deontic and epistemic rights (Rogowska in prep.). I ask, when and for what purpose generality, that is, abstracting from a concrete behaviour, is used as a tool while confronting others. The focus is on sequential and linguistic features of abstracting in confronting moments in language comparison. What are the methods to achieve abstraction: i) defocusing the confronted, specific agent (cf. Zinken et al. 2021; Siewierska 2008), e.g. nur derjenige der dran ist der darf die bedingungen für den handel stellen (only the one whose turn it is may set the conditions for the trade); using ii) extreme case formulations (Pomerantz 1986), e.g. na siostrę zawsze można liczyć (you can always count on a sister); iii) referring to stable character traits, e.g. Matylda bardzo chetne by podala. (.) Ona jest taka skora do pomocy (Matylda would be very happy to pass (it to you). (.) She is so eager to help); or iv) broader categorizing of the given referent, e.g. do not build (.) do do not build do not build swastikas (when a) German guy is filming us? Sometimes, even several locus of abstraction are combined in the same turn. Can we identify language-specific and cross-linguistic patterns? What are the interactional consequences: enforcing a compliant behavior in the future, eliciting an apology or cognitively simplifying complex problems? From a comparative perspective, I ask whether going beyond the here-and-now while confronting others is a practice that unites speakers across languages and is thus a human cognitive strategy to display normativity. This ongoing study is based on new comparable data from four European languages from informal interaction during activities around the table (Kornfeld/Küttner/Zinken 2023; Küttner et al. in prep.). The phenomenon was coded systematically in each of the four languages as part of a larger, quantitatively oriented study with different questions (Küttner et al. submitted). In the talk, I will show exemplarily Polish and German evidence. I use the methods of Conversation Analysis (Sidnell/Stivers (eds.) 2012) and Interactional Linguistics (Imo/Lanwer 2019).
This paper presents a multilingual dictionary project of discourse markers. During its first stage, consisting of collecting the list of headwords, we used a parallel corpus to automatically extract units from texts written in Spanish, Catalan, English, French and German. We also applied a method to create a taxonomy structure for automatically organising the markers in clusters. As a result, we obtain an extensive, corpus-driven list of headwords. We present a prototype of the microstructure of the dictionary in the form of a standard XML database and describe the procedure to automatically fill in most of its fields (e.g., the type of DM, the equivalents in other languages, etc.), before human intervention.
Vitaminhaltig ist gut, vitaminreich noch besser. Eine arbeitsfreie Zeit mag entspannen, eine arbeitslose kaum. Wirken solche Aussagen sinnvoll oder doch eher sinnarm?
Die Wortbildungsproduktivität von komplexen possessiven und privativen Adjektiven erscheint praktisch grenzenlos – in der Theorie werden ihr dagegen sehr wohl Grenzen gesetzt, jedoch ohne Berücksichtigung gebrauchsbasierter, empirischer Analysen. Diesem Desiderat widmet sich dieser Band, in welchem anhand konkreter Sprachdaten Forschungslücken und Widersprüche aufgedeckt und offene Fragestellungen beantwortet werden. Zudem zeigen sich neue Bedeutungsaspekte, die den Wortbildungsprodukten bislang nicht zugeschrieben wurden. In Gänze erbringen die Analysen den nötigen Beweis, dass die korpuslinguistischen Untersuchungen bisherige morphologische Beschreibungen sowohl erweitern als auch korrigieren können und sich darüber hinaus zum Entwickeln neuer Modelle mit neuen Kategorien eignen. Die eigens für diese Zwecke korpusgestützt generierte Stichwortliste findet sich samt Anzahl an Belegtreffern im Anhang wieder.
Recent years have seen a growing interest in linguistic phenomena that challenge the received division of labour between lexicon and grammar, and hence often fall through the cracks of traditional dictionaries and grammars. Such phenomena call for novel, pattern based types of linguistic reference works (see various papers in Herbst 2019). The present paper introduces one such resource: MAP (“Musterbank argumentmarkierender Präpositionen”), a web based corpus linguistic patternbank of prepositional argument structure constructions in German. The paper gives an overview of the design and functionality of the MAP prototype currently developed at the Leibniz Institute for the German Language in Mannheim. We give a brief account of the data and our analytic workflow, illustrate the descriptions that make up the resource and sketch available options for querying it for specific lexical, semantic and structural properties of the data.
Recent years have seen a growing interest in linguistic phenomena that challenge the received division of labour between lexicon and grammar, and hence often fall through the cracks of traditional dictionaries and grammars. Such phenomena call for novel, pattern-based types of linguistic reference works (see various papers in Herbst 2019). The present paper introduces one such resource: MAP (“Musterbank argumentmarkierender Präpositionen”), a web-based corpus-linguistic patternbank of prepositional argument structure constructions in German. The paper gives an overview of the design and functionality of the MAP-prototype currently developed at the Leibniz-Institute for the German Language in Mannheim. We give a brief account of the data and our analytic workflow, illustrate the descriptions that make up the resource and sketch available options for querying it for specific lexical, semantic and structural properties of the data.
This paper presents the decisions behind the design of a maths dictionary for primary school children. We are aware that there has been a considerable problem regarding Mexican children’s performance in maths dragging on for a long time, and far from getting better, it is getting worse. One of the probable causes seems to be the lack of coordination between maths textbooks and teaching methods. Most maths textbooks used in primary schools include lots of activities and problem-solving techniques, but hardly any conceptual information in the form of definitions or explanations. Consequently, many children learn to do things, but have difficulty understanding mathematical concepts and applying them in different contexts. To help solve this problem, at least partially, the project of the dictionary was launched aiming at helping children to grasp and understand maths concepts learned during those first six years of their formal education. The dictionary is a corpus-based terminographical product whose macrostructure, microstructure, typography, and additional information were specifically designed to help children understand mathematical concepts.
Word Families in Diachrony. An epoch-spanning structure for the word families of older German
(2022)
The ‘Word Families in Diachrony’ project (WoDia), for which a funding application to the DFG is in preparation, aims to provide a database driven online research environment that will enable processes of change in the entire historical vocabulary of German to be investigated by focusing on the changes in word families and the individual means of word formation. WoDia will embed the vocabularies of Old High German (OHG), Middle High German (MHG), Old Saxon (OS), and Middle Low German (MLG) in a database, resulting in a word-family structure for High and Low German from the beginnings up to the 15th century (for High German) and up to the 17th century (for Low German). The basis of the vocabulary is provided by reference dictionaries of the four historical varieties, whereas the word families’ historical structure is based on the word-family dictionary of OHG by Jochen Splett (1992). Each lemma in the database will be assigned, where appropriate, to a word family. The individual word-formation elements and the word-formation hierarchy will be mapped in a structural formula. The etymologically corresponding lemmas and word families of the different periods/varieties of older German will be linked so that an analysis across the varieties will also be possible. The annotations of word families in the database (e. g., relating to word structure) will be supplemented by linking their lemmas to the online dictionaries and to the reference corpora of Old German (OS and OHG), MHG, and MLG.
The aim of this paper is to show how lexicographical choices reflect ideological thinking, singled out by Eagleton (2007) into the strategies of rationalizing, legitimating, action orienting, unifying, naturalizing and universalizing. It will be carried out by examining two twenty first century editions of each of the five English monolingual learner’s dictionaries published by Cambridge, Collins, Longman, Macmillan, and Oxford. The synchronic and diachronic analyses of the dictionaries and their different editions at the macro structural level (the wordlists) and at the micro structural level (the definitional styles) will show how the reduction and change of data, derived from heterogeneous social and cultural contexts of language use, to abstract essential forms, involves decisions about the central and peripheral aspects of the lexicon and the meaning of words.
Contrastive analysis of climate-related neologisms registered in GermanN and French Wikipedia
(2023)
Neologisms represent new social norms, tendencies, controversies and attitudes. They denote new or changed concepts which are constantly being negotiated between different members of the discourse community (Wodak 2022 and Catalano/Waugh (eds.) 2020). Neologisms help to identify new communicative patterns and narratives which illustrate different strings of discourse in everyday life. In recent years, many neologisms relating to the subject of the environment and climate have been emerging around the world mainly due to dominant discussions on climate change and the movement “Fridays for Future”. In German, for example, neologisms such as Klimakleber, klimaresilient and globaler Streik and in French neologisms such as éco-anxiété, justice climatique and écocitoyen could be observed. These neologisms occur in many domains of life, for example in politics, media and also in advertising, which means that “l’importance croissante des enjeux environnementaux dans les discours politiques, médiatiques et publicitaires” (Balnat/Gérard 2022, p. 22) can be identified. However, it is not only the occurrence of environment- or climate-related topics that is increasing, but also the rising polarisation of the public debate. The polarisation within public discourse is based on the fact that there are opposing positions which are represented by new or recently relevant terms such as activistes du climat (or Klimaaktivisten) and climatosceptiques (or Klimaskeptiker) (Balnat/Gérard 2022, p. 22). Due to different identifications with one or the other side, one can also speak of an “affrontement idéologique” (Balnat/Gérard 2022, p. 23). 1 The explosive nature and the high complexity of the debate on climate and the environmental issues mean that many words are naturally unfamiliar to people. This is especially true with regard to neologisms. In addition, it is often not only the new word itself but also the signified concept that is initially unknown. When people then look up words, they often do so on the Internet. Wikipedia as a “free encyclopedia” (Wikipedia 2023) is particularly well suited as an object of study with regard to neologisms, since factual knowledge is given special attention there. Furthermore, this reference guide is perceived as a regular source of agreed and common knowledge on all sorts of subjects. Hence, the descriptions found here represent social agreement on controversial terms and discussions to some degree. In this paper, German and French neologisms from the subject area of climate and environment will be examined primarily in Wikipedia, but also in the neighbouring resource Wiktionary,2 which is “a collaborative project to produce a free-content multilingual dictionary” (Wiktionary 2023). Since Wikipedia and Wiktionary are available in French and in German, 21010. International Contrastive Linguistics Conference (ICLC) both are equally suitable for the contrastive analysis. Thus, Wikipedia articles which are accessible in both languages (e.g. Klimanotstand and État d›urgence climatique) or Wikipedia articles about similar events and phenomena (e.g. Letzte Generation and Dernière Rénovation) will be compared. For example, we will have a closer look at other new terms specifying different thematic aspects of the discourse of climate and environment. We will mainly refer to those lexical items which can be found in the respective articles in both languages. Special emphasis will be on overlaps and differences, thematic foci, speaker’s positions and evaluative terms.
The paper presents the process of developing the AirFrame database, a specialized lexical resource in which aviation terminology is defined in the form of semantic frames, following the methodology of the Berkeley FrameNet (FN). First, the structure of the database is presented, and then the methodology applied in developing and populating the database is described. The link between specialized aviation frames and general language semantic frames, of which frames defining entities, processes, attributes and events are particularly relevant, is discussed on the example of the semantic frame of Flight and its related frames. The paper ends with discussing possibilities of using AirFrame as a model for further developing resources in which general and specialized knowledge are linked.
Many European languages have undergone considerable changes in orthography over the last 150 years. This hampers the application of modern computer-based analysers to older text, and hence computer-based annotation and studies of text collections spanning a long period. As a step towards a functional analyser for Norwegian texts (Nynorsk standard) from the 19th century, funding was granted in 2020 for creating a full form generator for all inflected forms of headwords found in Ivar Aasen’s dictionary published in 1873 (Aasen 1873) and his grammar from 1864 (Aasen 1864). Creating this word bank led to new insight in Aasen (1873), its structure, internal organisation, and ambition level as well as its link to Aasen (1864). As a test, the full form list generated from this new word bank was used to analyse the word inventory of texts by Aa. O. Vinje, written in the period 1850–1870. The Vinje texts were also analysed using a full form list of modern standard Norwegian, to study the differences in applicability and see how Vinje’s language relates to the written standard of modern Norwegian.
This presentation deals with collaborative turn-sequences (Lerner 2004), a syntactically coherent unit of talk that is jointly formulated by at least two speakers, in Czech and German everyday conversations. Based on conversation analysis (e.g., Schegloff 2007) and a multimodal approach to social interaction (e.g., Deppermann/Streeck 2018), we aim at comparing recurrent patterns and action types within co-constructional sequences in both languages. The practice of co-constructing turns-at-talk has been described for typologically different languages, especially for English (e.g., Lerner 1996, 2004), but also for languages such as Japanese (Hayashi 2003) or Finnish (Helasvuo 2004). For German, various forms and functions of co-constructions have already been investigated (e.g., Brenning 2015); for Czech, a detailed, interactionally based description is still pending (but see some initial observations in, e.g., Hoffmannová/Homoláč/Mrázková (eds.) 2019). Although the existence of co-constructions in different languages points to a cross-linguistic conversational practice, few explicitly comparative studies exist (see, e.g., Lerner/Takagi 1999, for English and Japanese). The language pair Czech-German has mainly been studied with respect to language contact and without specifically considering spoken language or complex conversational sequences (e.g., Nekula/Šichová/Valdrová 2013). Therefore, our second aim is to sketch out a first comparison of co-constructional sequences in German and Czech, thereby contributing to the growing field of comparative and cross-linguistic studies within conversation analysis (e.g., Betz et al. (eds.) 2021; Dingemanse/Enfield 2015; Sidnell (ed.) 2009). More specifically, we will present three main sequential patterns of co-constructional sequences, focusing on the type of action a second speaker carries out by completing a first speaker’s possibly incomplete turn-at-talk, and on how the initial speaker then responds to
this suggested completion (Lerner 2004). Excerpts from video recordings of Czech and German ordinary conversations will illustrate these recurrent co-constructional sequence types, i.e., offering help during word searches (see example 1 above), displaying understanding, or claiming independent knowledge. The third objective of this paper is to underline the participants’ orientation to similar interactional problems, solved by specific syntactic and/or lexical formats in Czech and German. Considering the more recent focus on the embodied dimension of co-constructional practices (e.g., Dressel 2020), we will also investigate the multimodal formatting of a started utterance as more or less “permeable” (Lerner 1996) for co-participant completion, the participants’ mutual embodied orientation, and possible embodied responses to others’ turn-completions (such as head nods or eyebrow flashes, cf. De Stefani 2021). More generally, this contribution reflects on the possibilities and challenges of a cross-linguistic comparison of complex multimodal sequences.
In foreign language teaching the use of dictionaries, especially bilingual, has always been related to the hypotheses concerning the relationship between the native language (L1) and second language acquisition method. If the bilingual dictionary was an obvious tool in the grammar-translation method, it was banned from the classroom in the direct, audiolingual and audiovisual methods. Also in the communicative method, foreign language learners are discouraged from using a dictionary. Its use should not obstruct the goals of communicatively oriented foreign language learning – a view still held by many foreign language teachers. Nevertheless, the reality has been different: Foreign language learners have always used dictionaries, even if they no longer possess a print dictionary and mainly use online resources and applications. Dictionaries and online resources will continue to play an important role in the future. In the Council of Europe’s language policy, with its emphasis on multilingualism and lifelong learning, the adequate use of reference tools as a strategic skill is highlighted. In several European countries, educational guidelines refer to the use of dictionaries in the context of media literacy, both in mother tongue and foreign language teaching. Not only is their adequate use important, but so too is the comparison, assessment and evaluation of the information presented, in order to develop Language Awareness and Language Learning Awareness. This is good news. However, does this mean that dictionaries are actually used in class? What role do dictionaries play in foreign language teaching in schools and universities? Are foreign language learners in the digital era really competent users? And how competent are their teachers? Are they familiar with the current (online) dictionary landscape? Can they support their students? After a more in-depth study of the status quo of dictionary use by foreign language learners and teachers and the gap between their needs and the reality, this contribution discusses the challenges facing lexicographers and meta-lexicographers and what educational policy measures are necessary to make their efforts worthwhile in turning foreign language learners – and their teachers – into competent users in a multilingual and digital world.
Dictionaries are often a reflection of their time; their respective (socio-)historical context influences how the meaning of certain lexical units is described. This also applies to descriptions of personal terms such as man or woman. Lexicographers have a special responsibility to comprehensively investigate current language use before describing it in the dictionary. Accordingly, contemporary academic dictionaries are usually corpus-based. However, it is important to acknowledge that language is always embedded in cultural contexts. Our case study investigates differences in the linguistic contexts of the use of man and woman, drawing from a range of language collections (in our case fiction books, popular magazines and newspapers). We explain how potential differences in corpus construction would therefore influence the “reality”1 depicted in the dictionary. In doing so, we address the far-reaching consequences that the choice of corpus-linguistic basis for an empirical dictionary has on semantic descriptions in dictionary entries.
Furthermore, we situate the case study within the context of gender-linguistic issues and discuss how lexicographic teams can engage with how dictionaries might perpetuate traditional role concepts when describing language use.
Dictionaries are often a reflection of their time; their respective (socio-)historical context influences how the meaning of certain lexical units is described. This also applies to descriptions of personal terms such as man or woman. Lexicographers have a special responsibility to comprehensively investigate current language use before describing it in the dictionary. Accordingly, contemporary academic dictionaries are usually corpus-based. However, it is important to acknowledge that language is always embedded in cultural contexts. Our case study investigates differences in the linguistic contexts of the use of man and woman, drawing from a range of language collections (in our case fiction books, popular magazines and newspapers). We explain how potential differences in corpus construction would therefore influence the “reality” depicted in the dictionary. In doing so, we address the far-reaching consequences that the choice of corpus-linguistic basis for an empirical dictionary has on semantic descriptions in dictionary entries.Furthermore, we situate the case study within the context of gender-linguistic issues and discuss how lexicographic teams can engage with how dictionaries might perpetuate traditional role concepts when describing language use.
Lexicographers working with minority languages face many challenges. When the language in question is also a sign language, circumstances specific to the visual-spatial modality have to be taken into consideration as well. In this paper, we aim to show and discuss which challenges we encounter while compiling the Digitales Wörterbuch der Deutschen Gebärdensprache (DW-DGS), the first corpus-based dictionary of German Sign Language (DGS). Some parallel the challenges minority language lexicographers of spoken languages encounter, e. g. few resources, no written tradition, and having to create one dictionary for all potential user groups, while others are specific to sign languages, e. g. representation of visual-spatial language and creating access structures for the dictionary.
„systemrelevant“ – eine sprachwissenschaftliche Betrachtung des Begriffs aus aktuellem Anlass
(2021)
Zwischen den Jahren oder Eine Zeit zwischen den Zeiten. Sprachliche Betrachtungen zur „Normalität“
(2021)
This paper deals with the lexicographic treatment of the evidently plenty and pervasive scatological vocabulary, that is vocabulary concerning the process and products of bodily excretion (especially feces), in the synchronic Early New High German Dictionary (FWB = Frühneuhochdeutsches Wörterbuch) from a dictionary user’s view. Initially, different cultural concepts of scatology by Norbert Elias, Michail Bachtin and Mary Douglas among others and the term taboo are reflected. Subsequently, selected lexical items such as words with a primary scatological meaning (e. g. drek, kot, scheisse), concealing expressions (euphemisms, periphrases, metaphors, e. g. sitzen, seine notdurft tun, bauernveiel), and certain aspects within the polysemy of the verb scheissen are discussed, the latter on the one hand referring to a physical process with uncontrollable aspects and on the other hand denoting a deliberate action and functionalized as a fighting word during the reformation. Focussing on different positions of lexicographical information within the microstructure of the FWB, the surveillance shows that in a synchronic perspective Early New High German scatological vocabulary is a heterogeneous and complex phenomenon due to speaker, context and respectively semantic and pragmatic purposes
This paper focusss on the first Slavonic-Romanian lexicons, compiled in the second half of the 17th century and their use(rs), proposing a method of investigating the manner in which lexical information available in the above corpus relates, if at all, to the vocabulary of texts from the same period. We chose to investigate their relation to an anonymous Old Testament translation made from Church Slavonic, also from the second half of the 17th century, which was supposed to be produced in the same geographical area, in the same Church Slavonic school or even by the same author as the lexicons. After applying a lemmatizer on both the Biblical text (Books of Genesis and Daniel) and the Romanian material from the lexicons, we analyse the results and double the statistical analysis with a series of case studies, focusing on some common lexemes that might be an indicator of the relatedness of the texts. Even if the analysis points out that the lexicons might not have been compiled as a tool for the translation of religious texts, it proves to be a useful method that reveals interesting data and provides the basis for more extensive approaches.
This paper presents the Lehnwortportal Deutsch, a new, freely accessible publication platform for resources on German lexical borrowings in other languages, to be launched in the second half of 2022. The system will host digital-native sources as well as existing, digitized paper dictionaries on loanwords, initially for some 15 recipient languages. All resources remain accessible as individual standalone dictionaries; in addition, data on words (etyma, loanwords etc.) together with their senses and relations to each other is represented as a cross-resource network in a graph database, with careful distinction between information present in the original sources and the curated portal network data resulting from matching and merging information on, e. g., lexical units appearing in multiple dictionaries. Special tooling is available for manually creating graphs from dictionary entries during digitization and for editing and augmenting the graph database. The user interface allows users to browse individual dictionaries, navigate through the underlying graph and ‘click together’ complex queries on borrowing constellations in the graph in an intuitive way. The web application will be available as open source.
This paper presents the Lehnwortportal Deutsch, a new, freely accessible publication platform for resources on German lexical borrowings in other languages, to be launched in the second half of 2022. The system will host digital-native sources as well as existing, digitized paper dictionaries on loanwords, initially for some 15 recipient languages. All resources remain accessible as individual standalone dictionaries; in addition, data on words (etyma, loanwords etc.) together with their senses and relations to each other is represented as a cross-resource network in a graph database, with careful distinction between information present in the original sources and the curated portal network data resulting from matching and merging information on, e. g., lexical units appearing in multiple dictionaries. Special tooling is available for manually creating graphs from dictionary entries during digitization and for editing and augmenting the graph database. The user interface allows users to browse individual dictionaries, navigate through the underlying graph and ‘click together’ complex queries on borrowing constellations in the graph in an intuitive way. The web application will be available as open source.
Heranwachsen in einem noch fremden Land: Die Studie beruht auf einer mehr als 20 Jahre umfassenden Langzeiterhebung in russlanddeutschen Familien mit insgesamt 16 Kindern. Schwerpunkt der Beobachtungen und Interviews war die jeweilige Situation der Kinder innerhalb und außerhalb der Familie. Wie veränderte sie sich aus der Sicht der Kinder und ihrer Angehörigen über die Jahre ab der Ankunft in Deutschland bis zum Übergang ins Berufsleben? Welche Bilanz ziehen die nunmehr jungen Erwachsenen ein Vierteljahrhundert nach ihrer Ankunft?
Die Autorinnen ordnen die individuellen Bilanzen in die internationale Migrations- und Integrationsforschung ein. Die deutsch-russische Zweisprachigkeit als Kern der Mehrsprachigkeit der StudienteilnehmerInnen wird in ihrer Beschaffenheit durch Diskursanalysen und deutschsprachige C-Tests beschrieben und zu den Deutschqualifikationen junger Erwachsener ohne Migrationshintergrund ins Verhältnis gesetzt. Die sprachlichen Qualifikationen erfahren so die ihnen gebührende Aufmerksamkeit. Sie sind Bedingung und Folge gesellschaftlicher Zugehörigkeit.
Vergleichbare Korpora für multilinguale kontrastive Studien. Herausforderungen und Desiderata
(2022)
This contribution aims to show the necessity of working in the development of multilingual corpora and appropriate tools for multilingual contrastive studies. We take the corpus of the lexicographical project COMBIDIGILEX as example to show, how difficultit is to build a suitable data basis to study and compare linguistic phenomena in German, Spanish and Portuguese. Despite the availability of big reference corpora for the three languages (at least for written language), it is not able to obtain a comparable data basis from, because the mentioned corpora are created according to different requirements and they are also powered by disparate information systems and analyse tools. To break the status quo, we plead for increasing research infrastructures by means of compatible language technology and sharing data.
eThis paper first attempts a state-of-the art overview of what is known about women in the history of lexicography up to the early twentieth century. It then focusses more closely on the German and German-English lexicographical traditions to 1900, examining them from three different perspectives (following Russell’s 2018 study of women in English lexicography): women as users and dedicatees of dictionaries; women as contributors to and compilers of lexicographical works; and (in a very preliminary way) women and female sexuality as represented in German/English bilingual dictionaries of the eighteenth and early nineteenth centuries. Russell (2018) was able to identify some 24 dictionaries invoking women as patrons, dedicatees or potential users before 1700, and some 150 works in English lexicography by women between 1500 and 1900, besides the contribution of hundreds of women as supporters and helpers, not least as unpaid readers and sub-editors for the Oxford English Dictionary. Equivalent research in other languages is lacking, but this paper presents some of the known examples of women as lexicographers. The evidence tends to support Russell’s finding for English, that women were more likely to find a place in lexicography outside the mainstream: sometimes in a more private sphere (like Hester Piozzi); often in bilingual lexicography (such as Margrethe Thiele, working on a Danish-French dictionary), including missionary and or colonizing activity (such as Cinie Louw in Africa, Daisy Bates in Australia); and in dialect description (Coronedi Berti in Italy, Luisa Lacal and María Moliner in Spain). Within the German-speaking context, women who participated in lexicographical work themselves are hard to identify before the late nineteenth century, though those few women who did have access to education were often engaged in language learning, including translation activity, and they were likely users of bilingual and multilingual dictionaries. Christian Ludwig’s (1706) English-German dictionary – the first of its kind – was dedicated to the Electoral Princess Sophia of Hanover. Elizabeth Weir may have been the first named female compiler of a German dictionary, with her bilingual New German Dictionary (1888). Rather better known are the cases of Agathe Lasch and Luise Pusch, who, as pioneering women in the field of German linguistics, ultimately led major lexicographical projects documenting German regional varieties in the first half of the twentieth century (Middle Low German and Hamburgish in the case of Lasch; the Hessisch Nassau dialect dictionary in the case of Berthold). In the light of existing research on gender and sexuality in the history of English lexicography (e. g. Iamartino 2010; Turton 2019), I conclude with a preliminary exploration how woman and sexuality have been represented in dictionaries of German and English, taking the words Hure and woman in bilingual German-English dictionaries of the eighteenth and nineteenth centuries as my case studies.
In this paper, we propose a controlled language for authoring technical documents and report the status of its development, while maintaining a specific focus on the Japanese automotive domain. To reduce writing variations, our controlled language not only defines approved and unapproved lexical elements but also prescribes their preferred location in a sentence. It consists of components of a) case frames, b) case elements, c) adverbial modifiers, d) sentence-ending functions, and e) connectives, which have been developed based on the thorough analyses of a large-scale text corpus of automobile repair manuals. We also present our prototype of a writing assistant tool that implements word substitution and reordering functions, incorporating the constructed controlled language.
It is a ubiquitous phenomenon of everyday interaction that participants confront their co-participants for behaviour that they assess as undesirable or in some other way untoward. In a set of video data of informal interaction from the PECII corpus (Parallel European Corpus of Informal Interaction), cases of such sanctions have been collected in English, German, Italian and Polish data. This study presents work in progress and focuses on interrogatively formatted sanctions, in particular on non-polar interrogatives. It has already been shown that interrogatives can do much more than ask questions (Huddleston 1994). They can also function as directives (Lindström et al. 2017) or, more specifically, as requests (Curl/Drew 2008), as invitations (Margutti/Galatolo 2018) or reproaches (Klattenberg 2021), among others. What makes them interesting for cross-linguistic comparison is that the four languages that are considered provide different morphological and (morpho-)syntactical ressources for the realization of interrogative phrases. For example, German provides the option of building in the modal particle denn that reveals a previous lack of clarity and obliges the co-participant(s) to deliver the missing information (Deppermann 2009). Of course, the other three languages have modal particles, too (e.g. allora in Italian or though in English), but they do not seem to convey the same semantic and interactional qualities as denn. From an interactional point of view, one could think that interrogatives are a typical and effective way of solliciting accounts, since formally they open up a conditionally relevant space for an answer or a
reaction. But as the data shows, this does not guarantee that they are actually responded to. Another relevant aspect in the context of sanctions is that the interrogative format seems to carry a certain ‚openness‘ that might be seen as a mitigating effect and thus provides an interesting point of comparison with other mitigating devices. This study uses the methods of conversation analysis and interactional linguistics. It is based on a collection of 148 interrogative sanctions (out of which 84 are non-polar interrogatives) covering the four languages. I draw on coded data from roughly 1000 cases to get a first overall idea of how the interrogative format might differ from other formats, and how it might interrelate with specific features – for example, if subsequently an account is delivered. Going more into depth, the interrogative sanctions will then be analyzed with respect to their formal design (e.g. polar questions vs. content questions vs. tag questions, Rossano 2010; Hayano 2013) and to their pragmatic implications. I also analyze reactions to such sanctions – both formally (cf. Enfield et al. 2019, 279) and, again, from an interactional perspective (e.g. acceptance/compliance vs. challenging/defiance; Kent 2012; Cekaite 2020). A more detailed zooming in on the sequential unfolding of some particularly interesting
instances of sanctioning interrogatives will make the picture complete.
Lexical data API
(2022)
This API provides data from various dictionary resources of K Dictionaries across 50 languages. It is used by language service providers, app developers, and researchers, and returns data as JSON documents. A basic search result consists of an object containing partial lexical information on entries that match the search criteria, but further in-depth information is also available. Basic search parameters include the source resource, source language, and text (lemma), and the entries are returned as objects within the results array. It is possible to look for words with specific syntactic criteria, specifying the part of speech, grammatical number, gender and subcategorization, monosemous or polysemous entries. When searching by parameters, each entry result contains a unique entry ID, and each sense has its own unique sense ID. Using these IDs, it is possible to obtain more data – such as syntactic and semantic information, multiword expressions, examples of usage, translations, etc. – of a single entry or sense. The software demonstration includes a brief overview of the API with practical examples of its operation.
In this paper, we present LexMeta, a metadata model for the description of human-readable and computational lexical resources in catalogues. Our initial motivation is the extension of the LexBib knowledge graph with the addition of metadata for dictionaries, making it a catalogue of and about lexicographical works. The scope of the proposed model, however, is broader, aiming at the exchange of metadata with catalogues of Language Resources and Technologies and addressing a wider community of researchers besides lexicographers. For the definition of the LexMeta core classes and properties, we deploy widely used RDF vocabularies, mainly Meta-Share, a metadata model for Language Resources and Technologies, and FRBR, a model for bibliographic records.
Interactants who encounter co-participant conduct which they find to be socio-normatively problematic or troublesome are faced with a range of choices. First and foremost, this includes the issue of whether to directly address it, or to simply ‘let it pass’ (at least for now) (Emerson/Messinger 1977). In the case of the former, the issue then becomes how to address it. Across the various ways in which participants can pragmatically engage with what they perceive to be transgressive or untoward behavior (e.g., Pomerantz 1978; Schegloff 1988b; Dersley/Wootton 2000; Günthner 2000; Bolden/Robinson 2011; Potter/Hepburn 2020; see also Rodriguez 2022), they sometimes meta-pragmatically formulate the co-participant’s doings in terms of specific actions. Such action descriptions are necessarily selective (Sacks 1963; Schegloff 1972, 1988a; Sidnell/Barnes 2013): They foreground certain aspects of the co-participant’s conduct, while backgrounding others, and thus contribute to publically construeing the formulated conduct in particular ways (Jayyusi 1993), viz. as socio-normatively problematic, transgressive or untoward, and interactionally accountable (Robinson 2016; Sidnell 2017).
This manual introduces a conversation analytically informed coding scheme for episodes involving the direct social sanctioning of problem behavior in informal social interaction which was developed in the project Norms, Rules, and Morality across Languages (NoRM-aL) at the Leibniz-Institute for the German Language. It outlines the background for its development, delimits the phenomena to which the coding scheme can be applied and provides instructions for its use.
The scheme asks for basic information about the recording and the participants involved in the episode, before taking stock of different features of the sanctioning episode as a whole. This is followed by sets of specific coding questions about the sanctioning move itself (such as its timing and composition) and the reaction it engenders. The coding enables researchers to get a bird’s eye view on recurrent features of such episodes in larger quantities of data and allows for comparisons across different languages and informal settings.
Learning from students. On the design and usability of an e-dictionary of mathematical graph theory
(2022)
We created a prototype of an electronic dictionary for the mathematical domain of graph theory. We evaluate our prototype and compare its effectiveness in task-based tests with that of Wikipedia. Our dictionary is based on a corpus; the terms and their definitions were automatically extracted and annotated by experts (cf. Kruse/Heid 2020). The dictionary is bilingual, covering German and English; it gives equivalents, definitions and semantically related terms. For the implementation of the dictionary, we used LexO (Bellandi et al. 2017). The target group of the dictionary are students of mathematics who attend lectures in German and work with English resources. We carried out tests to understand which items the students search for when they work on graph-theoretical tasks. We ran the same test twice, with comparable student groups, either allowing Wikipedia as an information source or our dictionary. The dictionary seems to be especially helpful for students who already have a vague idea of a term because they can use the resource to check if their idea is right.
This paper discusses an investigation of how senses are ordered across eight dictionaries. A dataset of 75 words was used for this purpose, and two senses were examined for each word. The words are divided into three groups of 25 words each according to the relationship between the senses: Homonymy, Metaphor, and Systematic Polysemy. The primary finding is that WordNet differs from the other dictionaries in terms of Metaphor. The order of the senses was more often figurative/literal, and it had the highest percentage of figurative senses that were not found. We discuss leveraging another dictionary, COBUILD, to re-order the senses according to frequency.
This paper describes a method for extracting collocation data from text corpora based on a formal definition of syntactic structures, which takes into account not only the POS-tagging level of annotation but also syntactic parsing (syntactic treebank model) and introduces the possibility of controlling the canonical form of extracted collocations based on statistical data on forms with different properties in the corpus. Specifically, we describe the results of extraction from the syntactically tagged Gigafida 2.1 corpus. Using the new method, 4,002,918 collocation candidates in 81 syntactic structures were extracted. We evaluate the extracted data sample in more detail, mainly in relation to properties that affect the extraction of canonical forms: definiteness in adjectival collocations, grammatical number in noun collocations, comparison in adjectival and adverbial collocations, and letter case (uppercase and lowercase) in canonical forms. The conclusion highlights the potential of the methodology used for the grammatical description of collocation and phrasal syntax and the possibilities for improving the model in the process of compilation of a digital dictionary database for Slovene.
Mensch-Maschine-Interaktion im lexikographischen Prozess zu lexikalischen Informationssystemen
(2022)
Dictionaries of today and tomorrow are rather digital products than print dictionaries. From the user’s perspective, electronic dictionary applications and in particular „lexical information systems“, also referred to as „digital word information systems“ are coming to the fore alongside Google searches. Given the rapid developments in the area of the automated provision of lexicographic information, more precisely the automatic creation of online dictionaries, the new role of the lexicographer in the modern lexicographic process is questionable. This article addresses this issue.
In this paper we present Trendi, a monitor corpus of written Slovene, which has been compiled recently as part of the SLED (Monitor corpus and related resources) project. The methodology and the contents of the corpus are presented, as well as the findings of the survey that aimed to identify the needs of potential users related to topical language use. The Trendi corpus currently contains news articles and other web content from 110 different sources, with the texts being collected and linguistically annotated on a daily basis. The corpus complements Gigafida 2.0, a 1.13-billion-word reference corpus of standard written Slovene. Also discussed are the ways in which the corpus will be integrated into various lexicographic projects, helping not only in the identification of neologisms but also in monitoring changes in already identified language phenomena.
In this presentation I show first results from an ongoing study about syntactic complexity of sanctioning turns in spoken language. This study is part of a larger project on sanctioning of misconduct in social interaction in different European languages (English, German, Italian and Polish). For the study I use video recordings of different everyday settings (family breakfasts, board game interactions and car rides) with three or four participants. These data come from the Parallel European Corpus of Informal Interaction (Kornfeld/Küttner/Zinken 2023; Küttner et al. submitted). I focus on sanctioning turns with more than one turn-constructional unit (see among others for TCUs: Sacks/Schegloff/Jefferson 1974; Clayman 2013). The study asks how often TCUs are linked to each other in the different languages, for what function, and how language diversity enters into this. Note that complex sanctioning turns do not always come as complex sentences.
A central goal of linguistics is to understand the diverse ways in which human language can be organized (Gibson et al. 2019; Lupyan/Dale 2016). In our contribution, we present results of a large scale cross-linguistic analysis of the statistical structure of written language (Koplenig/Wolfer/Meyer 2023) we approach this question from an information-theoretic perspective. To this end, we conduct a large scale quantitative cross-linguistic analysis of written language by training a language model on more than 6,500 different documents as represented in 41 multilingual text collections, so-called corpora, consisting of ~3.5 billion words or ~9.0 billion characters and covering 2,069 different languages that are spoken as a native language by more than 90% of the world population. We statistically infer the entropy of each language model as an index of un. To this end, we have trained a language model on more than 6,500 different documents as represented in 41 parallel/multilingual corpora consisting of ~3.5 billion words or ~9.0 billion characters and covering 2,069 different languages that are spoken as a native language by more than 90% of the world population or ~46% of all languages that have a standardized written representation. Figure 1 shows that our database covers a large variety of different text types, e.g. religious texts, legalese texts, subtitles for various movies and talks, newspaper texts, web crawls, Wikipedia articles, or translated example sentences from a free collaborative online database. Furthermore, we use word frequency information from the Crúbadán project that aims at creating text corpora for a large number of (especially under-resourced) languages (Scannell 2007). We statistically infer the entropy rate of each language model as an information-theoretic index of (un)predictability/complexity (Schürmann/Grassberger 1996; Takahira/Tanaka-Ishii/Dębowski 2016). Equipped with this database and information-theoretic estimation framework, we first evaluate the so-called ‘equi-complexity hypothesis’, the idea that all languages are equally complex (Sampson 2009). We compare complexity rankings across corpora and show that a language that tends to be more complex than another language in one corpus also tends to be more complex in another corpus. This constitutes evidence against the equi-complexity hypothesis from an information-theoretic perspective. We then present, discuss and evaluate evidence for a complexity-efficiency trade-off that unexpectedly emerged when we analysed our database: high-entropy languages tend to need fewer symbols to encode messages and vice versa. Given that, from an information theoretic point of view, the message length quantifies efficiency – the shorter the encoded message the higher the efficiency (Gibson et al. 2019) – this indicates that human languages trade off efficiency against complexity. More explicitly, a higher average amount of choice/uncertainty per produced/received symbol is compensated by a shorter average message length. Finally, we present results that could point toward the idea that the absolute amount of information in parallel texts is invariant across different languages.
Technische Innovationen, historische Ereignisse, sich wandelnde gesellschaftliche Gegebenheiten
oder politische Neuerungen – für eine funktionierende Verständigung muss sich
der Wortschatz ständig anpassen. Da kann es schnell passieren, dass man ein Wort hört oder
liest, das man noch nicht kennt oder bei dem man sich unsicher ist, wie man es schreibt oder
spricht. Und beim Nachschlagen in einem Wörterbuch, das neue Wörter verzeichnet, stellen
sich weitere Fragen: Welche Quellen werden für ein solches Neologismenwörterbuch ausgewertet,
wie kommt ein Wort dort hinein, und ab wann gilt es als gut integriert? Welche
Typen von Neologismen gibt es eigentlich?
Vorwort
(2021)
Maske oder Mundschutz?
(2021)
Von Gummistiefelmomenten
(2021)
Corona- und andere Partys
(2021)
Shutdown, Lockdown und Exit
(2021)
Not only professional lexicographers, but also people without a professional background in lexicography, have reacted to the increased need for information on new words or medical and epidemiological terms being used in the context of the COVID-19 pandemic. In this study, corona-related glossaries published on German news websites are presented, as well as different kinds of responses from professional lexicography. They are compared in terms of the amount of encyclopaedic information given and the methods of definition used. In this context, answers to corona-related words from a German questionanswer platform are also presented and analyzed. Overall, these different reactions to a unique challenge shed light on the importance of lexicography for society and vice versa.
Not only professional lexicographers, but also people without a professional background in lexicography, have reacted to the increased need for information on new words or medical and epidemiological terms being used in the context of the COVID-19 pandemic. In this study, corona-related glossaries published on German news websites are presented, as well as different kinds of responses from professional lexicography. They are compared in terms of the amount of encyclopaedic information given and the methods of definition used. In this context, answers to corona-related words from a German questionanswer platform are also presented and analyzed. Overall, these different reactions to a unique challenge shed light on the importance of lexicography for society and vice versa.
Im Beitrag werden die Ergebnisse einer im Jahr 2015 durchgeführten Online-Umfrage vorgestellt, in der die Angemessenheit von Aussprachevarianten des Deutschen in formellen Sprechsituationen bewertet werden sollte. Zu diesem Zweck wurden den 1.964 Teilnehmer/-innen Aussprachevarianten von insgesamt 207 Lexemen vorgelegt, vor allem aus den Bereichen Wortakzent, Vokalquantität und Fremdwortrealisierung. Die Umfrageergebnisse werden tabellarisch aufgeführt und damit weiterer Forschung zur Verfügung gestellt.
Politische Grenzen haben nachweislich sowohl auf den Sprachgebrauch als auch auf die Sprachwahrnehmung einen großen Einfluss. Die vorliegende Arbeit analysiert für den die Länder Deutschland, Österreich und Italien übergreifenden bairischen Sprachraum, wie Sprecher/Hörer diesen räumlich (horizontal-areal) sowie hinsichtlich seines Verhaltensspektrums (vertikal-sozial) gliedern. Dabei werden die Wahrnehmungen sprachlicher und außersprachlicher Merkmale und die Einstellungen dazu genauer betrachtet.
Mithilfe eines pluridimensionalen Erhebungssettings, bestehend aus Tiefeninterview, Online-Fragebogen, Mental-Map-Erhebung und Hörerurteilstest, kann gezeigt werden, dass extralinguistische Barrieren, wie etwa politische Grenzen, stark mit attitudinal-perzeptiven Grenzen korrelieren. Damit stellt im Bewusstsein der Befragten auch die Staatsgrenze zwischen Deutschland und Österreich eine Sprachgrenze dar.
This paper examines a certain subset of the vocabulary of Modern Icelandic, namely those words that are labelled as ‘ancient’ in the Dictionary of Contemporary Icelandic (DCI). The words were analysed and grouped into two main categories, 1) Words with only ‘ancient’ sense(s) and 2) words that have modern as well as an obsolete older sense. Several subgroups were identified as well as some lexical characteristics. The words in question were then analysed in two other sources, the Dictionary of Old Norse Prose (ONP) and the Icelandic Gigaword Corpus (IGC). The results show that the words belong to several semantic domains that reflect the types of texts that have survived until modern times. Most of the words are robustly attested in Old Norse sources, although there are a few exceptions. Large majority of the words can be found in Modern Icelandic texts, but to a varying degree. Limits of the corpus material makes it difficult to analyse some of the words. The result indicate that the words labelled ‘ancient’ can be divided into three main groups: a) words that are poorly attested and should perhaps not be included in the lexicographic description of Modern Icelandic; b) words that are likely to occur sometimes in Modern Icelandic; c) words that function as other inherited Old Norse words and perhaps do not require a special label or should have an additional sense in the DCI.
In the currently ongoing process of retro-digitization of Serbian dialectal dictionaries, the biggest obstacle is the lack of machine readable versions of paper editions. Therefore, one essential step is needed before venturing into the dictionary-making process in the digital environment – OCRing the pages with the highest possible accuracy. Successful retro-digitization of Serbian dialectal dictionaries, currently in progress, has shown a dire need for one basic yet necessary step, lacking until now – OCRing the pages with the highest possible accuracy. OCR processing is not a new technology, as many opensource and commercial software solutions can reliably convert scanned images of paper documents into digital documents. Available software solutions are usually efficient enough to process scanned contracts, invoices, financial statements, newspapers, and books. In cases where it is necessary to process documents that contain accented text and precisely extract each character with diacritics, such software solutions are not efficient enough. This paper presents the OCR software called “SCyDia”, developed to overcome this issue. We demonstrate the organizational structure of the OCR software “SCyDia” and the first results. The “SCyDia” is a web-based software solution that relies on the open-source software “Tesseract” in the background. “SCyDia” also contains a module for semi-automatic text correction. We have already processed over 15,000 pages, 13 dialectal dictionaries, and five dialectal monographs. At this point in our project, we have analyzed the accuracy of the “SCyDia” by processing 13 dialectal dictionaries. The results were analyzed manually by an expert who examined a number of randomly selected pages from each dictionary. The preliminary results show great promise, spanning from 97.19% to 99.87%.
This paper gives an insight into a cross-media publishing process on different stages: from a printed bilingual syntagmatic dictionary for GFL to an online learner’s dictionary of German collocations to a German learner’s dictionary portal. On the basis of an sql database specially developed for a corpus-guided dictionary of German collocations, the bilingual syntagmatic learner’s dictionary KolleX was published in 2014. The first part of the article describes this lexicographic process, focusing the most relevant aspects of the dictionary concept, e. g. dictionary type, subject matter, corpus guided data selection and microstructure. The second part introduces the first online version of KolleX from 2016 and the profound changes in the editing system – from a desktop version (2005) to a web-based editing system (2016) –, which resulted successively in a prototype of a German learner’s dictionary portal, called E-KolleX DaF (2018–). Focusing on the aspects of dynamism and integration of different resources from a learner’s perspective the paper shows the innovative features of this new online reference work. The contribution presents the solutions for the integration of new datatypes in the database of KolleX and the linking to different data in German monolingual dictionary platforms. The paper outlines the web design, functioning and technical improvements of E-KolleX DaF. The conclusions provide an outlook to the forthcoming challenges.
Das KOLLokationsLEXikon Deutsch als Fremdsprache (KOLLEX DAF) ist ein
- korpusgestütztes Kollokationswörterbuch, da es typische Wortverbindungen, sog. Kollokationen und häufige Wortkombinationen nach bestimmten Kategorien mit ihren ungarischen Äquivalenten auflistet (Stichwort mit SUBSTANTIVEN, ADJEKTIVEN, VERBEN und ADVERBIEN bzw. in KOMBINATIONEN),
- syntagmatisches Lernerwörterbuch, da es außer Kollokationen auch die Valenz der Stichwörter und die der Kollokationen und Wortkombinationen angibt, ergänzt mit pragmatischen und morphosyntaktischen Verwendungsbeschränkungen sowie ggf. mit einem Symbol für mögliche Fehlerquellen,
- benutzerfreundliches Produktionswörterbuch, da es alle deutschen Wortverbindungen in blauer Farbe und in klar strukturierten Wörterbuchartikeln mit einem Übersichtsblock zu den Bedeutungen des Stichwortes auflistet, aber auch die Sprachrezeption mit einem umfangreichen Register unterstützt.
This think-aloud study charts the use of online resources by five final-year MA students in Nordic and Literacy Studies based on the analysis of screen and audio recordings of an error-correction task. The article briefly presents some linguistic features of Norwegian Nynorsk that are not common in the context of other European languages, that is, norm optionality with regards to inflection and spelling. While performing the task, the participants were allowed to use all digital aids. This article examines their resource consultation behavior, and it makes use of Laporte/Gilquin’s (2018) annotation protocol. The following research questions are posed: What online resources are used by the students? What characterizes the use? Are online resources helpful? This study provides new insights into an as yet little explored topic within the Norwegian context. The findings demonstrate that the participants relied heavily on the official monolingual dictionary Nynorskordboka. Indeed, the dictionary was helpful in the vast majority of the searches, either resulting in error improvement or the validation of a word; that is, many of the searches considered correct words. The findings suggest severe norm insecurity and emphasize the need to improve norm knowledge and metalinguistic knowledge as prerequisites for better utilization of aids. It is also suggested to include necessary information on norm optionality and other commonly queried issues in the dictionary architecture.