420 English
Refine
Year of publication
Document Type
- Part of a Book (72)
- Article (22)
- Conference Proceeding (7)
- Book (4)
Keywords
- English (41)
- German (38)
- Corpus <Linguistics> (16)
- Mass media (13)
- Lexicography (11)
- Dictionary (11)
- Media language (9)
- Neologism (8)
- Online dictionary (7)
- Syntax (7)
Publication state
- Published version (53)
- Postprint (8)
- Secondary publication (4)
Review state
- Peer review (51)
- (Publisher's) editorial review (11)
- Publisher's editorial review (1)
Publisher
- IDS-Verlag (38)
- Institut für Deutsche Sprache (18)
- Benjamins (5)
- Ids-Verlag (3)
- Schwann (3)
- De Gruyter (2)
- Elsevier (2)
- John Benjamins (2)
- Novus Press (2)
- Springer (2)
This article is concerned with making recent theoretical work on the lexicon usable for lexicographic applications. In particular, I outline some results of recent valency research and relate them to common lexicographic practice in presenting valency information in monolingual learner's dictionaries. In doing so, I refer above all to some of the research results produced over the last ten years in the Wuppertal research project “Valenz im Lexikon” (Valency in the Lexicon), part of the Collaborative Research Centre (Sonderforschungsbereich) 282 “Theorie des Lexikons” (Theory of the Lexicon). To this end, the following section presents some assumptions of multidimensional valency theory. Section 3 deals with typical learner errors in the individual valency dimensions, Section 4 with non-necessity and the interpretation of implicit arguments, and Section 5 with semantic conditions for valency alternations.
This paper focuses on the first Slavonic-Romanian lexicons, compiled in the second half of the 17th century, and their use(rs). It proposes a method for investigating whether, and how, the lexical information available in this corpus relates to the vocabulary of texts from the same period. We chose to investigate their relation to an anonymous Old Testament translation made from Church Slavonic, also from the second half of the 17th century, which is assumed to have been produced in the same geographical area, in the same Church Slavonic school, or even by the same author as the lexicons. After applying a lemmatizer to both the Biblical text (the Books of Genesis and Daniel) and the Romanian material from the lexicons, we analyse the results and complement the statistical analysis with a series of case studies focusing on common lexemes that might indicate a relationship between the texts. Although the analysis suggests that the lexicons may not have been compiled as a tool for translating religious texts, the method proves useful: it reveals interesting data and provides a basis for more extensive approaches.
In this paper I explore the theoretical significance of phonologically conditioned gaps in word formation. The data support the original approach to gaps in Optimality Theory proposed by Prince & Smolensky (1993), which crucially involves MPARSE as a ranked and violable constraint. The alternative CONTROL model proposed by Orgun & Sprouse (1999) is found to be inadequate because of lost generalisations and technical flaws. It is shown that a careful distinction between various morphophonological effects (e.g. paradigm uniformity effects, phonological repair and ‘stem selection’) is necessary to shed light on the morphology–phonology interface. The data investigated here support affix-specific constraint rankings, but argue against any stratal organisation of morphology.
The paper presents the results of a survey on lexicographic practices and lexicographers’ needs across Europe that was conducted in the context of the Horizon 2020 project European Lexicographic Infrastructure (ELEXIS) among the observer institutions of the project. The survey is a revised and upgraded version of the survey which was originally conducted among ELEXIS lexicographic partner institutions in 2018 (Kallas et al. 2019a). The main goal of this new survey was to complement the data from the ELEXIS lexicographic partner institutions in order to get a more complete picture of lexicographic practices both for born-digital and retro-digitised resources in Europe. The results offer a detailed insight into many aspects of the lexicographic process at European institutions, such as funding, training, staff, lexicographic expertise, software and tools. In addition, the survey reflects on current trends in lexicography and reveals what institutions see as the most important emerging trends that will affect lexicography in the short-term and long-term future. Overall, the results provide valuable input informing the development of tools, resources, guidelines and training materials within ELEXIS.
This paper investigates how senses are ordered across eight dictionaries. A dataset of 75 words was used for this purpose, and two senses were examined for each word. The words are divided into three groups of 25 words each according to the relationship between the two senses: Homonymy, Metaphor, and Systematic Polysemy. The primary finding is that WordNet differs from the other dictionaries with respect to the Metaphor group: in WordNet the figurative sense more often preceded the literal one, and WordNet had the highest percentage of figurative senses that were missing altogether. We discuss leveraging another dictionary, COBUILD, to re-order the senses according to frequency.
Phonesthemes (Firth 1930) are sublexical constructions that have an effect on the lexico-grammatical continuum: they are recurring form-meaning associations that occur more often than by chance but not systematically (Abramova/Fernandez/Sangati 2013). Phonesthemes have been shown (Bergen 2004) to affect psycholinguistic language processing; they organise the mental lexicon. Over time, phonesthemes appear to emerge, driven by language use, as indexical rather than purely iconic constructions in the lexicon (Smith 2016; Bergen 2004; Flaksman 2020). Phonesthemes are acknowledged in construction morphology (Audring/Booij/Jackendoff 2017) as motivational schemas. Some phonesthemes are also acknowledged lexicographically, as shown by the etymologist Liberman (2010), although, as we will show in this paper, their lexicographic relevance and cohesion appear to be highly variable.
Germany’s eventful history in the 20th century raises the question of how social upheavals were constituted in and through political discourse. By analysing basic concepts, the research network “The 20th century in basic concepts” (based at the Leibniz institutes IDS, ZfL and ZZF) aims to identify continuities and discontinuities in political and social discourse. In this way, the network seeks to uncover the historical sediments of the present and to identify those challenges that emerged in the course of the 20th century and continue to shape political discourse to this day.
This paper deals with different types of verbal complementation of the German verb verdienen. It focuses on constructions that have been undergoing a grammaticalization process and thus express deontic modality, as in Sie verdient geliebt zu werden (ʽShe deserves to be lovedʼ) and Sie verdient zu leben (ʽShe deserves to liveʼ) (Diewald, Dekalo & Czicza 2021). These constructions are connected to parallel complementation types with passive and active infinitives containing a correlate es, as in Sie verdient es, geliebt zu werden and Sie verdient es, zu leben, as well as to finite clauses with the subordinator dass with and without correlative es, as in Sie verdient, dass sie geliebt wird and Sie verdient es, dass sie geliebt wird. This paper presents a close comparative investigation of these six construction types based on their relevant semantic and syntactic properties in terms of clause linkage (Lehmann 1988). We analyze the relevant data retrieved from the DWDS corpus of the 20th century and present an expanded grammaticalization path for verdienen-constructions. Finite complementation with dass is regarded as an example of a separate structural option called “elaboration”. Concerning the use of correlative es, it is shown that it has no substantial effect on the grammaticalization of modal verdienen-constructions.
This paper investigates evidence for linguistic coherence in new urban dialects that evolved in multiethnic and multilingual urban neighbourhoods. We propose a view of coherence as an interpretation of empirical observations rather than something that would be “out there in the data”, and argue that this interpretation should be based on evidence of systematic links between linguistic phenomena, as established by patterns of covariation between phenomena that can be shown to be related at linguistic levels. In a case study, we present results from qualitative and quantitative analyses for a set of phenomena that have been described for Kiezdeutsch, a new dialect from multilingual urban Germany. Qualitative analyses point to linguistic relationships between different phenomena and between pragmatic and linguistic levels. Quantitative analyses, based on corpus data from KiDKo (www.kiezdeutschkorpus.de), point to systematic advantages for the Kiezdeutsch data from a multiethnic and multilingual context provided by the main corpus (KiDKo/Mu), compared to complementary corpus data from a mostly monoethnic and monolingual (German) context (KiDKo/Mo). Taken together, this indicates patterns of covariation that support an interpretation of coherence for this new dialect: our findings point to an interconnected linguistic system, rather than to a mere accumulation of individual features. In addition to this internal coherence, the data also point to external coherence: Kiezdeutsch is not disconnected on the outside either, but fully integrated within the general domain of German, an integration that defies a distinction between “autochthonous” and “allochthonous” German, not only at the level of speakers, but also at the level of linguistic systems.
In English and French, relational adjectives occurring in construction with deverbal nominalizations can be thematically associated with subject as well as object arguments. By contrast, in German, object-related readings of relational adjectives seem to be inadmissible. The greater flexibility of English and French in terms of the thematic interpretability of relational adjectives also shows up with respect to "circumstantial" thematic roles like directionals, locatives and instrumentals. This is arguably due to the common Latin heritage of English and French, since in Latin relational adjectives representing subject or object arguments of nominalizations are widely attested. However, even in English and French, object-related readings are confined to result nominalizations, a restriction we propose to account for in terms of the more "noun-like" character of result nominalizations in contrast to process nominalizations. Moreover, since argument-related interpretations of relational adjectives can always be overridden by appropriate agentive/patientive phrases, relational adjectives cannot be analyzed as occupying an argument position, but rather as modifying the semantic role associated with it.
This think-aloud study charts the use of online resources by five final-year MA students in Nordic and Literacy Studies, based on the analysis of screen and audio recordings of an error-correction task. The article briefly presents some linguistic features of Norwegian Nynorsk that are uncommon among other European languages, namely norm optionality with regard to inflection and spelling. While performing the task, the participants were allowed to use all digital aids. This article examines their resource consultation behavior, using Laporte/Gilquin’s (2018) annotation protocol. The following research questions are posed: What online resources do the students use? What characterizes this use? Are online resources helpful? This study provides new insights into a topic that has so far been little explored in the Norwegian context. The findings demonstrate that the participants relied heavily on the official monolingual dictionary Nynorskordboka. Indeed, the dictionary was helpful in the vast majority of the searches, resulting either in an error correction or in the validation of a word; that is, many of the searches concerned words that were already correct. The findings suggest severe norm insecurity and emphasize the need to improve norm knowledge and metalinguistic knowledge as prerequisites for better use of aids. It is also suggested that necessary information on norm optionality and other commonly queried issues be included in the dictionary architecture.
One major issue in the accomplishment of contrasts in conversation is the lexical choice of items which carry the semantic load of the two states of affairs that are represented as being opposed to one another. These items or expressions are co-selected to be understood as contrastively related to each other. In this paper, it is argued that the activity of contrasting itself provides them with a specific local opposite meaning which they would not obtain in other contexts. Practices of contrasting are thus seen as an example of conversational activities which creatively and systematically affect situated meanings. Based on data from various genres, such as meetings, mediation sessions and conversations, the paper discusses two practices of contrasting, their sequential construction and their interpretative effects. It is concluded that the interpretative effects of conversational contrasting rest on the sequential deployment of linguistic resources and on the cognitive procedures of frame-based interpretation and of constructing a maximally contrastive interpretation for the co-selected expressions.
The annual microcensus provides Germany’s most important official statistics. Unlike a census, it does not cover the whole population, but a representative 1% sample of it. In 2017, the German microcensus asked a question on the language of the population: ‘Which language is mainly spoken in your household?’ Unfortunately, the question, its design and its position within the microcensus questionnaire as a whole have several shortcomings. The main shortcoming is that the question cannot capture multilingual repertoires. Our central recommendation for improving the microcensus language question is therefore that the question (i.e. its wording, design, and answer options) should first and foremost make it possible to count multilingual repertoires.
Lexicographers working with minority languages face many challenges. When the language in question is also a sign language, circumstances specific to the visual-spatial modality have to be taken into consideration as well. In this paper, we show and discuss the challenges we encounter while compiling the Digitales Wörterbuch der Deutschen Gebärdensprache (DW-DGS), the first corpus-based dictionary of German Sign Language (DGS). Some of these parallel the challenges that lexicographers of spoken minority languages encounter, e.g. few resources, no written tradition, and having to create one dictionary for all potential user groups, while others are specific to sign languages, e.g. the representation of a visual-spatial language and the creation of access structures for the dictionary.
This paper describes a method for automatic identification of sentences in the Gigafida corpus containing multi-word expressions (MWEs) from the list of 5,242 phraseological units, which was developed on the basis of several existing open-access lexical resources for Slovene. The method is based on a definition of MWEs, which includes information on two levels of corpus annotation: syntax (dependency parsing) and morphology (POS tagging), together with some additional statistical parameters. The resulting lexicon contains 12,358 sentences containing MWEs extracted from the corpus. The extracted sentences were analysed from the lexicographic point of view with the aim of establishing canonical forms of MWEs and semantic relations between them in terms of variation, synonymy, and antonymy.
German vocabulary on the Internet: the elexiko information system and its module project on neologisms
(2007)
In this article, the language of advertising is considered as a set of persuasive strategies and corresponding communicative means which persuaders employ to communicate with the target audience and to promote the product, service, candidate, idea, etc. The arrangement of these strategies and means is determined by pragmatic and communicative goals of the advertising campaign.
We present evidence for the analysis of the vowels in English <say> and <so> as biphonemic diphthongs /ɛi/ and /əu/, based on neutralization patterns, regular alternations, and foot structure. /ɛi/ and /əu/ are hence structurally on a par with the so-called “true diphthongs” /ɑi/, /ɐu/, /ɔi/, but also share prosodic organization with the monophthongs /i/ and /u/. The phonological evidence is supported by dynamic measurements based on the American English TIMIT database.
Calculations of F2-slopes proved to be especially suited to distinguish the relevant groups in accordance with their phonologically motivated prosodic organizations.
Older adults are often exposed to elderspeak, a specialized speech register linked with negative outcomes. However, previous research has mainly been conducted in nursing homes without considering multiple contextual conditions. Based on a novel contextually-driven framework, we examined elderspeak in an acute general versus geriatric German hospital setting. Individual-level information such as cognitive impairment (CI) and audio-recorded data from care interactions between 105 older patients (M = 83.2 years; 49% with severe CI) and 34 registered nurses (M = 38.9 years) were assessed. Psycholinguistic analyses were based on manual coding (κ = .85 to κ = .97) and computer-assisted procedures. First, diminutives (61%), collective pronouns (70%), and tag questions (97%) were detected. Second, patients’ functional impairment emerged as an important factor for elderspeak. Our study suggests that functional impairment may be a more salient trigger of stereotype activation than CI and that elderspeak deserves more attention in acute hospital settings.
The digital environment offers a qualitatively new level of service for research on linguistic information presented in dictionary form. This applies above all to index systems. By dictionary indexing we mean a set of formalized rules and procedures on the basis of which information about particular linguistic facts recorded in the dictionary can be obtained. These rules are implemented in the form of user interfaces. However, automatic construction of index schemes for a digital dictionary can only be effective in a sufficiently formalized environment. This article describes the method and technology used to index the Etymological Dictionary of the Ukrainian Language (EDUL). For the linguistic indexing of the dictionary, a special computer instrumental system (VLL, virtual lexicographic laboratory) was developed, adapted to the structure of the EDUL and designed to create indexes automatically. The digital implementation of the EDUL made it possible to access the entire corpus of the dictionary text regardless of when the corresponding volume was published, and it opened up opportunities for various digital interpretations of etymological information.
This paper presents two toolsets for transcribing and annotating spoken language: the EXMARaLDA system, developed at the University of Hamburg, and the FOLK tools, developed at the Institute for the German Language in Mannheim. Both systems are targeted at users interested in the analysis of spontaneous, multi-party discourse. Their main user community is situated in conversation analysis, pragmatics, sociolinguistics and related fields. The paper gives an overview of the individual tools of the two systems – the Partitur-Editor, a tool for multi-level annotation of audio or video recordings, the Corpus Manager, a tool for creating and administering corpus metadata, EXAKT, a query and analysis tool for spoken language corpora, FOLKER, a transcription editor optimized for speed and efficiency of transcription, and OrthoNormal, a tool for orthographical normalization of transcription data. It concludes with some thoughts about the integration of these tools into the larger tool landscape.
This paper describes a method for extracting collocation data from text corpora based on a formal definition of syntactic structures, which takes into account not only the POS-tagging level of annotation but also syntactic parsing (syntactic treebank model) and introduces the possibility of controlling the canonical form of extracted collocations based on statistical data on forms with different properties in the corpus. Specifically, we describe the results of extraction from the syntactically tagged Gigafida 2.1 corpus. Using the new method, 4,002,918 collocation candidates in 81 syntactic structures were extracted. We evaluate the extracted data sample in more detail, mainly in relation to properties that affect the extraction of canonical forms: definiteness in adjectival collocations, grammatical number in noun collocations, comparison in adjectival and adverbial collocations, and letter case (uppercase and lowercase) in canonical forms. The conclusion highlights the potential of the methodology used for the grammatical description of collocation and phrasal syntax and the possibilities for improving the model in the process of compilation of a digital dictionary database for Slovene.
This article examines the contrasts and commonalities between languages for specific purposes (LSP) and their popularizations on the one hand and the frequency patterns of LSP register features in English and German on the other. For this purpose, corpora of expert-expert and expert-lay communication are annotated for part-of-speech and phrase structure information. On this basis, the frequencies of pre- and post-modifications in complex noun phrases are statistically investigated and compared for English and German. Moreover, using parallel and comparable corpora, it is tested whether English-German translations obey the register norms of the target language or whether the LSP frequency patterns of the source language “shine through”. The results provide an empirical insight into language contact phenomena involving specialized communication.
Freezing in it-clefts
(2013)
This paper aims to verify whether the most important online Brazilian Portuguese dictionaries include some of the neologisms identified in texts published from the 1990s to the 2000s that are formed with the elements ciber-, e-, bio-, eco- and narco-, which we refer to as fractomorphemes / fracto-morphèmes. Three online dictionaries were analyzed (Aulete, Houaiss and Michaelis), as well as the Vocabulário Ortográfico da Língua Portuguesa (VOLP). We conclude that all three dictionaries and VOLP include neologisms with these elements; Michaelis and VOLP do not include separate entries for bound morphemes, whereas Houaiss includes entries for all of them and Aulete includes entries for bio-, eco- and narco-. Aulete also describes the neological meaning of eco- and narco-, whereas Houaiss does not.
In the present article I focus on one of the most "traditional" yet still fast-developing and ever-changing types of advertising: advertising in the press. The more my colleagues, students, and I try to analyse, scrutinise and describe particular aspects of advertising, the more obvious it becomes that, to make this analysis authentic and reliable from a theoretical point of view and relevant from a practical one, a universal approach to the study is needed.
This paper investigates synchronic variation in the lexical and grammatical environments of the German lexical verb verdienen ‘earn’, ‘deserve’. In its lexical uses, verdienen co-occurs with an object noun phrase whose head is either concrete (e.g. Geld ‘money’) or, more commonly, abstract (e.g. Beachtung ‘attention’). When it is used more grammatically with deontic modal meaning, verdienen is followed by a passive or active infinitive. This paper uses collostructional analyses to contrast lexical and grammatical uses in terms of the most strongly attracted lexical items, which are grouped into semantic classes. The results reflect different degrees of host-class expansion (cf. Himmelmann 2004), whereby the collexemes of verdienen expand from concrete to abstract and their morpho-syntactic contexts from nominal to infinitival complement and subsequently from passive to active. Synchronic distribution can thus serve as a window on diachronic development (Kuteva 2001), in this case the rise of a deontic modality marker.
Indefinite pronouns in the broad sense are a catch-all class for all pronouns that are not oriented towards specific, uniquely identifiable objects in the world, i.e. interrogatives (wer 'who', was 'what'), indefinites in the narrow sense (jemand 'someone', etwas 'something', niemand 'nobody', nichts 'nothing') and quantifiers (alle 'all', jeder 'every', einige 'some'). The cross-linguistic comparison reveals commonalities across these classes, such as a conceptual sorting into "person" and "non-personal", the representation of the individuative/continuative distinction, and the encoding of partitivity and distributivity.
Adnominal possessives, such as sein (Fahrrad) 'his (bicycle)' as opposed to independent seines 'his (one)', stand in paradigmatic opposition to attributive noun phrases such as Evas (Fahrrad) 'Eva's (bicycle)' or (das Fahrrad) der kleinen Schwester '(the bicycle) of the little sister'. In German, these are usually (a particular subtype of) genitive phrases; in other European languages they are often prepositional phrases. From a functional perspective, language typology speaks of 'possessor phrases', based on a broad notion of 'belonging' or 'referential anchoring'. Compared with the possessives of the contrast languages English, French, Polish and Hungarian, the following holds for German: unlike the affixal possessives of Hungarian, the possessives of German, like those of the other contrast languages, are free forms; this corresponds to the 'dependent-marking' strategy of these languages as opposed to 'head-marking' Hungarian. Unlike in Polish, reflexivity is not encoded in the German possessives. For third-person possessives, German has the comparatively most complex system: in their stem they agree with the gender and number of the antecedent, i.e. the possessor expression (sein- versus ihr-), and in their inflectional ending with the case, gender and number of the possessum expression, which in adnominal use is the nominal head. In the contrast languages, these possessives are oriented either only towards the antecedent (English; Polish non-reflexive possessives; Hungarian) or primarily towards the head noun, as in French or with the reflexive possessive of Polish.
This article advocates an understanding of ‘positioning’ as a key to the analysis of identities in interaction within the methodological framework of conversation analysis. Building on research by Bamberg, Georgakopoulou and others, a performative, interaction-based approach to positioning is outlined and compared to membership categorization analysis. An interactional episode in which mock stories are used to reveal and reproach a co-participant's inadequate identity claim is analysed in terms of both membership categorization and positioning practices. It is concluded that membership categorization is a core element of positioning. Still, positioning goes beyond membership categorization by (a) revealing biographical dimensions accomplished by narration and (b) uncovering implicit performative claims of identity, which are not established by categorization or description.
In this paper, I argue that the main question that arises in the process of making a dictionary of political metaphors, namely how to identify live conceptual metaphors in a corpus of texts, may be solved on the basis of a pragmatic approach that takes into account how a text reflects cognitive processes in the minds of its author and its reader. Certainly, this goal cannot be attained without a further fine-grained semantic analysis of presumably metaphoric expressions in their linguistic and cultural context.
The aim of this paper is to show how lexicographical choices reflect ideological thinking, which Eagleton (2007) divides into the strategies of rationalizing, legitimating, action orienting, unifying, naturalizing and universalizing. This is done by examining two twenty-first-century editions of each of the five English monolingual learner’s dictionaries published by Cambridge, Collins, Longman, Macmillan, and Oxford. The synchronic and diachronic analyses of the dictionaries and their different editions at the macrostructural level (the wordlists) and at the microstructural level (the definitional styles) will show how reducing and changing data, derived from heterogeneous social and cultural contexts of language use, to abstract essential forms involves decisions about the central and peripheral aspects of the lexicon and the meaning of words.
We describe the current status of work aimed at including sign language lexical data within the OntoLex-Lemon framework. Our general goal is to provide a multimodal extension to this framework, which was originally conceived to cover only the written and phonetic representation of lexical data. In the longer term, we aim to achieve the same type of semantic interoperability between sign language lexical data as has been achieved for their spoken or written counterparts. We also want to achieve this goal across modalities: between sign language lexical data and spoken/written lexical data.
Theories of aspectual composition assume that accomplishments arise when a transitive verb has an incremental theme argument which is realized as a quantized NP (first and foremost, an NP which is not a mass noun or a bare plural) in direct object position. A problem confronting this assumption is the large number of intransitive, unergative verbs in German and English that occur in accomplishment expressions. The paper argues that this problem can be solved within a standard theory of aspectual composition if additional, independently motivated lexical assumptions about argument structure, the representation of implicit arguments and lexical presuppositions are made. In particular, a distinction between lexically determined definiteness and non-definiteness of implicit arguments turns out to play a crucial role, as does a distinction between implicitly reflexive and non-reflexive arguments: implicitly definite and implicitly reflexive arguments allow for accomplishment expressions. This is explained by the semantics of definiteness and reflexivity, respectively. Apart from these verbs, there is another large group of unergatives which show that, contrary to a common assumption in aspectual composition theory, verbs themselves, and not only VPs, can be quantized. This leads to a lexical distinction between "mass" and "count" verbs.
In this paper, we present LexMeta, a metadata model for the description of human-readable and computational lexical resources in catalogues. Our initial motivation is the extension of the LexBib knowledge graph with the addition of metadata for dictionaries, making it a catalogue of and about lexicographical works. The scope of the proposed model, however, is broader, aiming at the exchange of metadata with catalogues of Language Resources and Technologies and addressing a wider community of researchers besides lexicographers. For the definition of the LexMeta core classes and properties, we deploy widely used RDF vocabularies, mainly Meta-Share, a metadata model for Language Resources and Technologies, and FRBR, a model for bibliographic records.
Not only professional lexicographers but also people without a professional background in lexicography have reacted to the increased need for information on new words and on medical and epidemiological terms used in the context of the COVID-19 pandemic. In this study, corona-related glossaries published on German news websites are presented, as well as different kinds of responses from professional lexicography. They are compared in terms of the amount of encyclopaedic information given and the methods of definition used. In this context, answers to questions about corona-related words from a German question-answer platform are also presented and analyzed. Overall, these different reactions to a unique challenge shed light on the importance of lexicography for society and vice versa.
Tok Pisin is a pidgin/creole language that has been spoken since the late 19th century in most of the area that now constitutes Papua New Guinea, where it emerged under German colonial rule. Unusually for a pidgin/creole, Tok Pisin has an extensive lexicographic history. The Tok Pisin Dictionary Collection at the Leibniz Institute for the German Language, described in this article, includes about fifty dictionaries. The collection forms the basis for the sketch of the history of Tok Pisin lexicography as part of colonial history presented here. The basic thesis is that in the history of Tok Pisin, lexicographic strategies, dictionary structures, and publication patterns reflect the interest (and disinterest) of various groups of colonial actors. Among these colonial actors, European scientists, Catholic missionaries, and the Australian and US militaries played important roles.
Quality journalism offers its educated readers unsimplified language use, which comprises standard collocations, phrases and utterances on the one hand, and occasional word combinations, deformed idioms and quotations on the other. The former belong to the language system and are recorded in a variety of monolingual dictionaries, whereas the latter are confined to speech and have little chance of being registered by lexicographers.
Recent years have seen a growing interest in linguistic phenomena that challenge the received division of labour between lexicon and grammar, and hence often fall through the cracks of traditional dictionaries and grammars. Such phenomena call for novel, pattern-based types of linguistic reference works (see various papers in Herbst 2019). The present paper introduces one such resource: MAP (“Musterbank argumentmarkierender Präpositionen”), a web-based corpus-linguistic patternbank of prepositional argument structure constructions in German. The paper gives an overview of the design and functionality of the MAP prototype currently developed at the Leibniz Institute for the German Language in Mannheim. We give a brief account of the data and our analytic workflow, illustrate the descriptions that make up the resource and sketch available options for querying it for specific lexical, semantic and structural properties of the data.
This paper analyses one specific conversational practice of formulation called ‘notionalization’. It consists in the transformation of a description by a prior speaker into a categorization by the next speaker. Sequences of this kind are a “natural laboratory” for studying the differences between descriptions and categorizations regarding their semantic, interactional, and rhetorical properties:
- Descriptive/narrative versions are often vague and tentative multi-unit turns, which are temporalized and episodic, offering a lot of contingent, situational, and indexical detail.
- Notionalizations turn them into condensed, abstract, timeless, and often agentless categorizations expressed by a noun (phrase) within one turn constructional unit (TCU).
Drawing on audio- and video-taped German data from various types of interaction, the paper focuses on one particular practice of notionalization: the formulation of purportedly common ground by TCUs prefaced with the connective also. The paper discusses their turn-constructional and morphological properties, pointing out affinities of notionalization with language for special purposes. Notionalizations are used for reducing detail and for topical closure. They provide grounds for emergent keywords, which can be reused to re-contextualize topical issues and interactional histories efficiently. Notionalizations are powerful means for accomplishing intersubjectivity while at the same time pursuing (sometimes one-sided) practical relevancies. Their inevitably perspectival design may thus lead to the reopening of the issue they were meant to settle. The paper closes with an outlook on other practices of notionalization, pointing to dimensions of interactionally relevant variation and commonalities.
This paper examines a certain subset of the vocabulary of Modern Icelandic, namely the words that are labelled as ‘ancient’ in the Dictionary of Contemporary Icelandic (DCI). The words were analysed and grouped into two main categories: 1) words with only ‘ancient’ sense(s) and 2) words that have a modern as well as an obsolete older sense. Several subgroups were identified, as well as some lexical characteristics. The words in question were then analysed in two other sources, the Dictionary of Old Norse Prose (ONP) and the Icelandic Gigaword Corpus (IGC). The results show that the words belong to several semantic domains that reflect the types of texts that have survived until modern times. Most of the words are robustly attested in Old Norse sources, although there are a few exceptions. The large majority of the words can be found in Modern Icelandic texts, albeit to varying degrees. Limitations of the corpus material make it difficult to analyse some of the words. The results indicate that the words labelled ‘ancient’ can be divided into three main groups: a) words that are poorly attested and should perhaps not be included in the lexicographic description of Modern Icelandic; b) words that are likely to occur occasionally in Modern Icelandic; c) words that function like other inherited Old Norse words and perhaps do not require a special label, or should have an additional sense in the DCI.
This study examines a list of 3,413 neologisms containing one or more borrowed items, compiled using the databases built by the Korean Neologism Investigation Project. Etymological and morphological aspects are taken into consideration to show that, besides the overwhelming prevalence of English-based neologisms, particular loans from particular languages play a significant role in the prolific formation of Korean neologisms. Aspects of the lexicographic inclusion of loan-based neologisms demonstrate the need for Korean neologism and lexicography research to broaden its scope in terms of methodology and attitudes, while also providing a glimpse of changes.
This paper focuses on the origin of the V2 property in the history of Germanic. Considering data from Gothic and Old English (OE), it is suggested that the historical core of the V2 phenomenon reduces to V-to-C movement that is triggered in operator contexts. Therefore, the historical system shares basic properties with limited V2 in Modern English. It is shown that apparent deviations from this pattern that can be observed in Gothic can be attributed to the influence of Greek word order. Concerning the apparently more elaborate V2 properties of OE, it is claimed that a large part of them in fact do not involve a Spec-head relation, but rather result from linear adjacency between the clause-initial element and a finite verb located in T⁰. Special attention is paid to the placement of pronominal subjects in OE, which are claimed to occupy SpecTP. This contrasts with a lower position of full subjects due to the absence of an EPP in OE. Finally, the loss of superficial V2 orders in the Middle English period is attributed to the development of an EPP feature in T.
The article starts by outlining the theoretical and conceptual foundations in the field of multimodal interaction analysis, which, based on its spatial-linguistic orientation, deals with the meaning of space for the constitution of social meaning. Conceptually, we refer to the ideas of architecture-for-interaction and social topography. Empirically, we look at the entire range of visually perceptible physical expressions of the Communion participants. We also focus on the spatial prerequisites and the space-related knowledge of the visitors, which becomes evident in their situational behaviour. From our point of view, Communion is not only a ritual in worship but also a task of coordination and positioning. We analyse video excerpts of two Communions in Lutheran-Protestant worship. The central question is: How do the people who hand out the sacrament to the participants take part in the procedure themselves (self-supply)? The video excerpts are from Germany (Rimbach and Zotzenbach, South Hesse). We see self-supply as a situational reproduction of institutional structures and relevancies. Methodologically, we first analyse an example in detail, in which we elaborate constitutive aspects of self-supply and the associated implications in the sense of an arising communitisation of the faithful. The subsequent analysis is carried out from a comparative perspective with reference to the results already obtained. The analyses lead to two basic models. Firstly, we identified a two-phase model in which first the churchgoers and then, separately, the institution’s representatives celebrate Communion. Structurally linked to this model is the diverging presence of those who have already completed the ritual, a divergence resulting in two ensembles, each with its own interaction space. The churchgoers watch the pastor and his assistants perform the ritual themselves.
Secondly, we were able to formulate an integrative model in which the pastor celebrates Communion as one of the community. This preserves cohesion among all churchgoers, and there is no ritual display of the institution’s representatives as in the two-phase model. As for model-shaping factors, two aspects become particularly clear. The first is the set of opportunities which the architecture-for-interaction, i.e. the concrete space for the Communion, provides. The second is the number of participants who perform the ceremony under these spatial conditions. Both aspects have a direct impact on the organisation of Communion and the movement within the church space and, indirectly, on the structure and implications of self-supply.
Word-formation rules differ from syntactic rules in that, apart from obeying morphological and semantic constraints, they can also be, and often are, restricted phonologically. The present article includes an overview of the relevant phenomena in English and discusses the consequences for the representation of words in the mental lexicon and for grammar.
Dictionaries have been part and parcel of literate societies for many centuries. They assist in communication, particularly across different languages, to aid in understanding, creating, and translating texts. Communication problems arise whenever a native speaker of one language comes into contact with a speaker of another language. At the same time, English has established itself as a lingua franca of international communication. This marked tendency gives lexicography of English a particular significance, as English dictionaries are used intensively and extensively by huge numbers of people worldwide.
Inspired by GWLN 3, we take a look at the new words, meanings, and expressions that have been created during or promoted by the COVID-19 pandemic. The pandemic provides a rare opportunity to follow the rise, spread, and integration of words and expressions in a language that may serve as an illustration of how linguistic innovation in general works. Relevant words were selected from various lists, notably monthly and annual lists of prominent words attested in the corpus of The Danish Dictionary. Analysis of these lists gives an insight into the number of words that stand out month by month and what kinds of words are involved, both in terms of morphological type and of semantic category, with special attention given to neologisms. Finally, we discuss the criteria for selecting which words to include in the dictionary. With this study, Danish is added to the list of languages covered in the GWLN series on COVID-19 neologisms.
In the ongoing process of retro-digitizing Serbian dialectal dictionaries, the biggest obstacle is the lack of machine-readable versions of the paper editions. Therefore, one essential step is needed before venturing into the dictionary-making process in the digital environment: OCRing the pages with the highest possible accuracy. OCR processing is not a new technology, as many open-source and commercial software solutions can reliably convert scanned images of paper documents into digital documents. Available software solutions are usually efficient enough to process scanned contracts, invoices, financial statements, newspapers, and books. However, when documents containing accented text must be processed and each character with its diacritics extracted precisely, such solutions are not efficient enough. This paper presents the OCR software “SCyDia”, developed to overcome this issue, and demonstrates its organizational structure and first results. “SCyDia” is a web-based software solution that relies on the open-source software “Tesseract” in the background. “SCyDia” also contains a module for semi-automatic text correction. We have already processed over 15,000 pages, 13 dialectal dictionaries, and five dialectal monographs. At this point in our project, we have analyzed the accuracy of “SCyDia” by processing 13 dialectal dictionaries. The results were analyzed manually by an expert who examined a number of randomly selected pages from each dictionary. The preliminary results are very promising, with accuracy ranging from 97.19% to 99.87%.
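The abstract does not state how the accuracy figures were computed. One common measure for OCR output is character accuracy: one minus the edit distance between the OCR output and a manually corrected reference, divided by the length of the reference. The sketch below assumes that metric; the function names are hypothetical and are not part of SCyDia or Tesseract. Because accented dialect text can encode the same letter either as a precomposed character or as a base letter plus a combining diacritic, both strings are NFC-normalized first, so that encoding differences alone are not counted as errors.

```python
import unicodedata


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single code-point insertions, deletions and
    substitutions needed to turn a into b (dynamic programming, two rows)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def character_accuracy(ocr_text: str, reference: str) -> float:
    """Hypothetical metric: character accuracy in percent, counted per
    code point after NFC normalization of both strings."""
    ocr_text = unicodedata.normalize("NFC", ocr_text)
    reference = unicodedata.normalize("NFC", reference)
    if not reference:
        raise ValueError("reference text must not be empty")
    errors = levenshtein(ocr_text, reference)
    return max(0.0, 100.0 * (1 - errors / len(reference)))


# A dropped combining accent (U+030F, double grave) counts as one error.
print(character_accuracy("кућа", "ку\u030fћа"))
```

Counting per code point means that a missing combining accent is penalized like a missing letter, which matches the abstract's concern with extracting every diacritic precisely. A per-grapheme count would be a reasonable alternative design.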
Given the relevance of interoperability, both born-digital lexicographic resources and retro-digitised legacy dictionaries have been encoding their data in structured formats, following guidelines such as those of the Text Encoding Initiative (TEI) or the newest TEI Lex-0. Although this new standard is defined more strictly than the original TEI dictionary schema, its reuse of element names for several types of annotation, together with its highly detailed structure, makes it difficult for lexicographers to edit resources efficiently and focus on the actual content. In this paper, we present the approach designed within LeXmart to facilitate the editing of TEI Lex-0 encoded resources while guaranteeing consistency throughout all editing processes.
Television news discourse
(2013)
In this paper, the author develops a narrative approach to TV news discourse based on the following categories: the narrator, the narrator's “voices”, points of view, the composition of the narrative, and the image of the recipient. A brief review of the basic peculiarities of Russian discourse is given as an illustration.
The "imperfective-paradox" paradox and other problems with the semantics of the progressive aspect
(2000)
This paper is about the meaning of the progressive aspect, of which it has been notoriously difficult to give a satisfying account. A number of intriguing properties of its meaning were first brought out in formal semantic treatments. An event semantics approach to the progressive that integrates concepts of normality and perspective as well as adequate lexical representations seems to be particularly promising. In section 1 I will present several problems connected with the semantics of the progressive that are crucial for shaping its truth conditions. Several solutions to these problems that have been suggested in the literature will be discussed. In section 2 I will sketch a preliminary account of the meaning of the progressive aspect. In section 2.1 the basic components that underlie the truth conditions of the progressive will be described. In section 2.2 I will present underlying lexical assumptions and the truth conditions for the progressive. Finally, in section 2.3, I will evaluate the proposal by revisiting the problems discussed.
The term ‘marketing communications’ is used to denote communications by means of various persuasive messages about products, organizations, candidates and ideas that marketers send to audiences to build up knowledge of the mentioned objects, to evoke positive attitudes towards them, to stimulate the audience to act in a certain way (buy, use, vote, approve) and remain loyal to them. Possibly the most dominant type of marketing communications in our culture is advertising, but there are many other effective forms of marketing persuasion (public relations, sponsorship, point-of-sale communications, sales promotion, event marketing, product placement, etc.). Advertising uses mass media channels (traditional and new media) to contact and interact with the audiences, and thus the language of advertising has become a special form of mass media language.
There is a growing interest in pedagogical lexicography, and more specifically in the study of dictionary users’ abilities and strategies (Prichard 2008; Gavriilidou 2010, 2011; Gavriilidou/Mavrommatidou/Markos 2020; Gavriilidou/Konstantinidou 2021; Chatjipapa et al. 2020). The purpose of this presentation is to investigate dictionary use strategies (DUS) and the effect of an explicit and integrated dictionary awareness intervention program on upper elementary pupils’ dictionary use strategies according to gender and type of school. A total of 150 students from mainstream and intercultural schools, aged 10–12 years, participated in the study. Data were collected before and after the intervention through the Strategy Inventory for Dictionary Use (SIDU) (Gavriilidou 2013). The results showed a significant effect of the intervention program on the dictionary use strategies employed by the experimental group and support the claim that increased dictionary use can be the outcome of explicit strategy instruction. In addition, the effective application of the program suggests that a direct and clear presentation of DUS is likely to be more successful than an implicit one. The present study contributes to the discussion concerning both the ‘teachability’ of dictionary use strategies and skills and the effective forms of intervention programs raising dictionary use awareness and culture.
In the etymological information for a word in a dictionary, the first question to be answered is whether the word is a borrowing or the result of word formation. Here, we consider this question for internationalisms ending in -ation in German and in -ácia in Slovak. In German, -ation is a suffix that attaches to verbs in -ieren. For these verbs, it is in competition with -ung. In Slovak, -ácia is a suffix that attaches to bases of Latin or Greek origin. The corresponding verbs are often backformations. Most Slovak verbs also have a nominalization in -nie. In order to investigate to what extent the nouns in -ation or -ácia are borrowings or derived from the corresponding verbs in German and Slovak, we took a random sample of English nouns in -ation for which the OED gives a corresponding verb. For this sample, we checked whether the cognate noun in -ation or -ácia is attested in standard dictionaries and in corpora. Then we did the same for the corresponding verbs and the nouns in -ung or -nie. Finally, we checked the frequency of these words in DeReKo for German and SNK for Slovak. On this basis, we found evidence that -ation in German has a slightly different status from -ácia in Slovak. This status affects the relationship to the corresponding verbs and to the nouns in -ung or -nie. Such generalizations are important as background information for specifying etymological information in dictionaries, especially for languages where first-attestation dates are not readily available.
This paper presents the project “The first Romanian bilingual dictionaries (17th century). Digitally annotated and aligned corpus” (eRomLex), which deals with the editing of the first bilingual Romanian dictionaries. The aim of the project is to compile an electronic corpus comprising six 17th-century Slavonic-Romanian lexicons, which are related and follow a common model, in order to highlight the characteristics of this lexicographical network (the affiliations between the lexicons, the way they relate to the source, their innovations with respect to it, their potential uses) and to facilitate access to their content. A digital edition allows exhaustive data extraction and comparison, as well as linking with other digitized resources for Old Romanian or Church Slavonic, including dictionaries. After presenting the corpus, we point to the necessary stages in achieving this project, the techniques used to access the material, and the challenges and obstacles we encountered along the way. We describe how the corpus was created, stored, and indexed and how it can be searched; we also present and discuss some statistical analyses highlighting relations between the Romanian lexicons and their Slavonic-Ruthenian source.
Dictionaries are often a reflection of their time; their respective (socio-)historical context influences how the meaning of certain lexical units is described. This also applies to descriptions of personal terms such as man or woman. Lexicographers have a special responsibility to comprehensively investigate current language use before describing it in the dictionary. Accordingly, contemporary academic dictionaries are usually corpus-based. However, it is important to acknowledge that language is always embedded in cultural contexts. Our case study investigates differences in the linguistic contexts of the use of man and woman, drawing on a range of language collections (in our case fiction books, popular magazines and newspapers). We explain how differences in corpus construction can therefore influence the “reality” depicted in the dictionary. In doing so, we address the far-reaching consequences that the choice of corpus-linguistic basis for an empirical dictionary has on semantic descriptions in dictionary entries.
Furthermore, we situate the case study within the context of gender-linguistic issues and discuss how lexicographic teams can address the question of whether dictionaries perpetuate traditional role concepts when describing language use.
The purpose of this paper is to present the lexicographic protocol and to report on the progress of compilation of Mikaela_Lex, a free online Greek monolingual school dictionary with 4,000 lemmata for upper elementary students with visual impairments. The dictionary is equipped with new digital tools, such as a “Braille-system keyboard”, a “speech-to-text” tool, a “text-to-speech” tool, and also QWERTY keyboard access for students without visual impairments.
This paper presents the concept of the "participant perspective" as an approach to the study of spoken language. It discusses three aspects of this concept and shows that they can offer helpful tools in spoken language research. Employing the participant perspective provides us with an alternative to many of the approaches currently in use in the study of spoken language in that it favours small-scale, qualitative research that aims to uncover categories relevant for the participants. Its results can usefully complement large-scale studies of phenomena on all linguistic dimensions of talk.
The public as linguistic authority: Why users turn to internet forums to differentiate between words
(2022)
This paper addresses the question of why we face unsatisfactory German dictionary entries when looking up and comparing two similar lexical terms that are loan words, new words, (near)-synonyms, or confusables. It explains how users are aware of existing reference works but still search or post on language forums, often after consulting a dictionary and experiencing a range of dictionary-based problems. Firstly, these dictionary-based difficulties will be scrutinised in more detail with respect to content, function, presentation, and the language of definitions. Entries documenting loan words and commonly confused pairs from different lexical reference resources serve as examples to show the shortcomings. Secondly, I will explain why learning about your target group involves studying discussion forums. Forums are a valuable source for detailed user studies, enabling the examination of different communicative needs, concrete linguistic questions, speakers’ intuitions, and people’s reactions to posts and comments. Thirdly, with the help of two examples I will describe how the study of chats and forums had a major impact on the development of a recently compiled German dictionary of confusables. Finally, that same problem-solving approach is applied to the idea of a future dictionary of neologisms and their synonyms.
Considering which kinds of dictionaries journalists need, and why, leads to the conclusion that pronunciation dictionaries are of primary interest to them. Radio and television journalists need pronouncing dictionaries. In this regard, there are such modern dictionaries as “The Dictionary of Russian Pronunciation Difficulties” (Kalenchuk/Kasatkina 2006), “The Dictionary of Emphasis for Radio and TV announcers” (Vvedenskaja 2004) and “The Dictionary of Perfect Russian Emphasis” (Shtudiner 2007). Dictionary reference books that help to avoid spelling mistakes are necessary in newspaper practice. This type of publication includes “The Abridged Dictionary of Russian Language Difficulties for the Workers of the Press” (1968), which contains about 400 words, and reference books such as “Word Usage Difficulties in TV and Broadcasting” (Gajmakova/Menkevich 1998) and “Russian Language Difficulties” by Rakhmanova (ed.) (1994).
This paper is about the meaning of the progressive aspect, which has been notoriously difficult to give a satisfying account of. A number of intriguing properties of its meaning were first brought out in formal semantic treatments. An event semantics approach to the progressive which integrates concepts of normality and perspective as well as adequate lexical representations seems to be particularly promising. In section 2 I will present several problems connected with the semantics of the progressive that are crucial for shaping its truth conditions. Several solutions to these problems that have been suggested in the literature will be discussed. In section 3 I will sketch a preliminary account of the meaning of the progressive aspect. In section 3.1 the basic components that underlie the truth conditions of the progressive will be described. In section 3.2 I will present underlying lexical assumptions and the truth conditions for the progressive. Finally, in section 4, I will evaluate the proposal by revisiting the problems discussed.
As an introduction to the Special Issue on “Formulation, generalization, and abstraction in interaction”, this paper discusses key problems of a conversation analytic (CA) approach to semantics in interaction. Prior research in CA and Interactional Linguistics has only rarely dealt with issues of linguistic meaning in interaction. It is argued that this is a consequence of the limitations of sequential analysis in capturing meaning in interaction. While sequential analysis remains the encompassing methodological framework, it is suggested that it needs to be complemented by analyzing semantic relationships between choices of formulation in the interaction, by ethnography, and by structural techniques of comparing selected options with possible alternatives. The paper describes the methodological approach to interactional semantics taken by the papers in the Special Issue, which analyse practices of generalization and abstraction in interaction as they are accomplished by formulations of prior versions of reference and description.
This paper describes the TEI-based ISO standard 24624:2016 ‘Transcription of spoken language’ and other formats used within CLARIN for spoken language resources. It assesses the current state of support for the standard and the interoperability between these formats and with relevant tools and services. The main idea behind the paper is that a digital infrastructure providing language resources and services to researchers should also allow the combined use of resources and/or services from different contexts. This requires syntactic and semantic interoperability. We propose a solution based on the ISO/TEI format and describe the necessary steps for this format to work as an exchange format with basic semantic interoperability for spoken language resources across the CLARIN infrastructure and beyond.
When comparing different tools in the field of natural language processing (NLP), the quality of their results usually has first priority. This is also true for tokenization. In the context of large and diverse corpora for linguistic research purposes, however, other criteria also play a role, not least sufficient speed to process the data in an acceptable amount of time. In this paper we evaluate several state-of-the-art tokenization tools for German, including our own, with regard to these criteria. We conclude that while not all tools are applicable in this setting, no compromises regarding quality need to be made.
This paper looks at whether, after two decades of corpus building for the Bantu languages, the time is ripe to begin using monitor corpora. As a proof of concept, the usefulness of a Lusoga monitor corpus for lexicographic purposes, in this case for the detection of neologisms in terms of both new words and new meanings, is investigated and found to be useful.
This paper presents a multilingual dictionary project of discourse markers (DMs). During its first stage, which consisted of collecting the list of headwords, we used a parallel corpus to automatically extract units from texts written in Spanish, Catalan, English, French and German. We also applied a method to create a taxonomy structure for automatically organising the markers in clusters. As a result, we obtain an extensive, corpus-driven list of headwords. We present a prototype of the microstructure of the dictionary in the form of a standard XML database and describe the procedure for automatically filling in most of its fields (e.g., the type of DM, the equivalents in other languages, etc.) before human intervention.
In this paper we present Trendi, a monitor corpus of written Slovene, which has been compiled recently as part of the SLED (Monitor corpus and related resources) project. The methodology and the contents of the corpus are presented, as well as the findings of the survey that aimed to identify the needs of potential users related to topical language use. The Trendi corpus currently contains news articles and other web content from 110 different sources, with the texts being collected and linguistically annotated on a daily basis. The corpus complements Gigafida 2.0, a 1.13-billion-word reference corpus of standard written Slovene. Also discussed are the ways in which the corpus will be integrated into various lexicographic projects, helping not only in the identification of neologisms but also in monitoring changes in already identified language phenomena.
Based on data of spoken German from various activity types, the range of multimodal resources used to construct turn beginnings is reviewed. It is argued that participants in talk-in-interaction need to deal with four tasks in order to construct a turn that precisely fits the interactional moment of its production:
1. Achieve joint orientation: The accomplishment of the socio-spatial prerequisites necessary for producing a turn which is to become part of the participants’ common ground.
2. Display uptake: Next speaker needs to display his/her understanding of the interaction so far as the backdrop on which the production of the upcoming turn is based.
3. Deal with projections from prior talk: The speaker has to deal with projections which have been established by (the) previous turn(s) with respect to the upcoming turn.
4. Project properties of turn-in-progress: The speaker needs to orient the recipient to properties of the turn s/he is about to produce.
Turn design can thus be seen as informed by tasks related to the multimodal, embodied and interactive contingencies of the online construction of turns. The four tasks are ordered so that each earlier task provides the prerequisite for accomplishing a later one.
Basnage’s revision (1701) of Furetière’s Dictionnaire universel differs profoundly from Furetière’s work in several respects. One of the most noticeable features of the dictionary is its increased use of usage labels. Although Furetière already made use of usage labels (see Rey 1990), Basnage gives them a prominent role. As he states in the preface to his edition, a dictionary that aspires to the title of “universal” should teach how to speak politely (“poliment”) and correctly (“juste”), using the specific terminology of each art. He specifies, lemma by lemma, the diaphasic dimension by indicating the word’s register and context of use; the diastratic dimension by noting differences in language use across social strata; the diachronic evolution by indicating both archaisms and neologisms; the diamesic aspect by highlighting the gaps between oral and written language; and the diatopic dimension by identifying either foreign borrowings or regionalisms.
After extracting the entries containing formulas such as “ce mot est...”, “ce terme est...” and similar ones, we compare the number of entries and the type of information provided by the two lexicographers. In this paper, we focus on Basnage’s innovative contribution. Furthermore, we try to identify the lexicographer’s sources, i. e. the grammars, collections of linguistic remarks or contemporary dictionaries on which Basnage bases his judgements.
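The extraction step can be approximated with a regular expression over the text of each entry. The pattern below covers only the two formulas quoted above; the "similar ones" would have to be added to it. Entry texts are assumed to be available as a headword-to-text mapping, and the helper names `labelled_entries` and `newly_labelled` are hypothetical.

```python
import re
from typing import Dict, List

# Only "ce mot est ..." and "ce terme est ..."; further formulas would be
# added to the alternation. The label is everything up to the next "." or ";".
LABEL_RE = re.compile(r"\bce\s+(?:mot|terme)\s+est\s+([^.;]+)", re.IGNORECASE)


def labelled_entries(entries: Dict[str, str]) -> Dict[str, List[str]]:
    """Map each headword whose text contains a labelling formula to its label phrases."""
    hits: Dict[str, List[str]] = {}
    for headword, text in entries.items():
        labels = [label.strip() for label in LABEL_RE.findall(text)]
        if labels:
            hits[headword] = labels
    return hits


def newly_labelled(older: Dict[str, str], newer: Dict[str, str]) -> List[str]:
    """Headwords labelled in the newer dictionary but not in the older one."""
    return sorted(set(labelled_entries(newer)) - set(labelled_entries(older)))
```

With Furetière's entries as `older` and Basnage's as `newer`, `len(labelled_entries(...))` gives the counts to compare, `newly_labelled` lists the headwords where Basnage adds a label, and the captured phrases show the type of information given.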
In our paper, we present a case study on the quality of concept relations in the manually developed terminological resource of grammis, an information system on German grammar. We assess a SKOS representation of the resource using the tool qSKOS, create a typology of the issues identified by the tool, and conduct a qualitative analysis of selected cases. We identify and discuss aspects that can motivate quality issues and uncover that ill-formed relations are frequently indicative of deeper issues in the data model. Finally, we outline how these findings can inform improvements in our resource’s data model, discussing implications for the machine readability of terminological data.
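Two kinds of relation issues comparable to those qSKOS reports, relations stated in one direction only and cycles in the concept hierarchy, can be sketched over plain (subject, predicate, object) triples. This is an illustration, not qSKOS's implementation: it avoids an RDF library by assuming the triples have already been extracted with prefixed names, and the concept IDs used to exercise it are invented.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object), prefixed names

# skos:broader and skos:narrower are inverses; skos:related is symmetric.
INVERSE = {
    "skos:broader": "skos:narrower",
    "skos:narrower": "skos:broader",
    "skos:related": "skos:related",
}


def unidirectional(triples: Set[Triple]) -> List[Triple]:
    """Relations whose inverse (or symmetric) counterpart is not stated explicitly."""
    return sorted(t for t in triples
                  if t[1] in INVERSE and (t[2], INVERSE[t[1]], t[0]) not in triples)


def in_hierarchy_cycles(triples: Set[Triple]) -> Set[str]:
    """Concepts that are (transitively) broader than themselves."""
    broader: Dict[str, Set[str]] = {}
    for s, p, o in triples:
        if p == "skos:broader":
            broader.setdefault(s, set()).add(o)
        elif p == "skos:narrower":
            broader.setdefault(o, set()).add(s)
    cyclic: Set[str] = set()
    for start in broader:
        # Depth-first search: can `start` reach itself via broader links?
        stack, seen = list(broader[start]), set()
        while stack:
            node = stack.pop()
            if node == start:
                cyclic.add(start)
                break
            if node not in seen:
                seen.add(node)
                stack.extend(broader.get(node, ()))
    return cyclic
```

Since both checks only look at the relation graph, a flagged triple says nothing about why it is ill-formed; that is where a qualitative analysis of the data model comes in.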
Validating the Performativity Hypothesis to Neg-Raising using corpus data: Evidence from Polish
(2021)
Semantic theories based on predicate-argument structures have always acknowledged that lexical information associated with verbs is the basic source for the rudimentary semantic structure of sentences. Since the lexical turn in linguistics, the central role of verbs in sentence structure has also become a major insight of modern syntactic theories. As a result of this development, there has been increasing interest in theories of the lexical representation of verbs. This paper will briefly review prevailing theories of verb semantics (section 1), showing that they can capture only a part of the wide range of syntactic and semantic phenomena that depend on verb meaning. For several of these phenomena (section 2), it will turn out that a theory based on highly structured events is more suitable for representing verb meaning. This theory is based on the idea that verbs refer to events that consist of several subevents which are temporally related and classified according to their duration, and whose event participants are connected by semantic relations to some, but not necessarily all, subevents (section 3).
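One way to picture such a representation is as a data structure in which subevents carry a duration class, temporal relations hold between subevents, and each participant is linked only to the subevents it takes part in. The class names, duration labels and the causative example are hypothetical illustrations, not the paper's formalism.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class Subevent:
    kind: str                                   # duration class, e.g. "process" or "state"
    participants: Set[str] = field(default_factory=set)


@dataclass
class VerbEvent:
    verb: str
    subevents: Dict[str, Subevent]
    temporal: List[Tuple[str, str, str]]        # (subevent, relation, subevent)

    def subevents_of(self, participant: str) -> List[str]:
        """Names of the subevents a participant is semantically linked to."""
        return sorted(name for name, sub in self.subevents.items()
                      if participant in sub.participants)

    def participants(self) -> Set[str]:
        """All participants of the event, whichever subevent they belong to."""
        return set().union(*(sub.participants for sub in self.subevents.values()))


if __name__ == "__main__":
    # Illustrative causative "open": the causer takes part only in the causing
    # process, the theme in both the process and the resulting state.
    open_event = VerbEvent(
        verb="open",
        subevents={"e1": Subevent("process", {"causer", "theme"}),
                   "e2": Subevent("state", {"theme"})},
        temporal=[("e1", "precedes", "e2")],
    )
    print(open_event.subevents_of("causer"), open_event.subevents_of("theme"))
```

The point of the structure is that a participant such as the causer is reachable from the event as a whole but is not related to every subevent.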
In this paper, I argue against the analyses of the there-construction by Moro (1997) and Hoekstra & Mulder (1990) and in favour of an analysis within the framework of Williams (1994) and Hazout (2004), from two angles. First, Moro and Hoekstra & Mulder do not correctly predict the behaviour of the there-construction under wh-movement; second, from a semantic point of view, the predicate in the small clause structure is the postverbal DP, not “there”. Instead, I follow the proposal by Williams (1994), in which “there” is the subject of predication, and I point out a direction for analysing the problematic wh-movement data within this framework.
Many European languages have undergone considerable changes in orthography over the last 150 years. This hampers the application of modern computer-based analysers to older texts, and hence computer-based annotation and studies of text collections spanning a long period. As a step towards a functional analyser for 19th-century Norwegian texts (Nynorsk standard), funding was granted in 2020 for creating a full-form generator for all inflected forms of the headwords found in Ivar Aasen’s dictionary published in 1873 (Aasen 1873) and his grammar from 1864 (Aasen 1864). Creating this word bank led to new insights into Aasen (1873), its structure, internal organisation and level of ambition, as well as its link to Aasen (1864). As a test, the full-form list generated from this new word bank was used to analyse the word inventory of texts by Aa. O. Vinje, written in the period 1850–1870. The Vinje texts were also analysed using a full-form list of modern standard Norwegian, in order to study the differences in applicability and to see how Vinje’s language relates to the modern written standard of Norwegian.
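A full-form generator of this kind expands each headword by the endings of its inflection class, and the resulting list can then be used to measure how much of a text it covers, once with a 19th-century list and once with a modern one. The sketch assumes a purely suffixal toy paradigm loosely modelled on a modern Nynorsk masculine noun (hest, hesten, hestar, hestane); Aasen's actual inflection classes, stem alternations and the project's data format are not reproduced, and all names are hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical inflection classes: suffixes appended to the headword.
# Real classes would also need stem changes, irregular forms, etc.
PARADIGMS: Dict[str, List[str]] = {
    "noun_m_toy": ["", "en", "ar", "ane"],
}


def full_forms(lexicon: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Map every generated form to the headword(s) it can belong to."""
    forms: Dict[str, Set[str]] = {}
    for headword, inflection_class in lexicon:
        for suffix in PARADIGMS[inflection_class]:
            forms.setdefault(headword + suffix, set()).add(headword)
    return forms


def coverage(tokens: Iterable[str], forms: Dict[str, Set[str]]) -> float:
    """Share of alphabetic tokens (lower-cased) found in the full-form list."""
    words = [t.lower() for t in tokens if t.isalpha()]
    if not words:
        return 0.0
    return sum(w in forms for w in words) / len(words)


if __name__ == "__main__":
    forms = full_forms([("hest", "noun_m_toy")])
    print(sorted(forms))  # ['hest', 'hestane', 'hestar', 'hesten']
    print(coverage(["Hestane", "og", "hesten"], forms))
```

Mapping each form to a set of headwords keeps homographs that arise from different headwords, which matters when the list is used for analysis rather than only for lookup.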
Word Families in Diachrony. An epoch-spanning structure for the word families of older German
(2022)
The ‘Word Families in Diachrony’ project (WoDia), for which a funding application to the DFG is in preparation, aims to provide a database-driven online research environment that will enable processes of change in the entire historical vocabulary of German to be investigated by focusing on the changes in word families and the individual means of word formation. WoDia will embed the vocabularies of Old High German (OHG), Middle High German (MHG), Old Saxon (OS), and Middle Low German (MLG) in a database, resulting in a word-family structure for High and Low German from the beginnings up to the 15th century (for High German) and up to the 17th century (for Low German). The basis of the vocabulary is provided by reference dictionaries of the four historical varieties, whereas the word families’ historical structure is based on the word-family dictionary of OHG by Jochen Splett (1992). Each lemma in the database will be assigned, where appropriate, to a word family. The individual word-formation elements and the word-formation hierarchy will be mapped in a structural formula. The etymologically corresponding lemmas and word families of the different periods/varieties of older German will be linked so that an analysis across the varieties will also be possible. The annotations of word families in the database (e. g., relating to word structure) will be supplemented by links from their lemmas to the online dictionaries and to the reference corpora of Old German (OS and OHG), MHG, and MLG.
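The planned structure can be pictured as lemma records that carry a variety, an optional word-family ID, the lemma they are formed from, and links to etymologically corresponding lemmas in other varieties. The record layout and function names are hypothetical, since the abstract does not describe WoDia's database schema, and the OHG/MHG examples (faran, fart; varn, vart) used to exercise it are only illustrative, not taken from Splett (1992).

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Lemma:
    lemma_id: str
    form: str
    variety: str                       # "OHG", "MHG", "OS" or "MLG"
    family: Optional[str]              # word-family ID; None if none can be assigned
    base: Optional[str] = None         # lemma_id of the lemma it is formed from
    correspondents: List[str] = field(default_factory=list)  # lemma_ids in other varieties


def hierarchy(lemmas: Dict[str, Lemma], family: str, variety: str) -> List[str]:
    """Indented word-formation hierarchy of one family within one variety."""
    members = [lem for lem in lemmas.values()
               if lem.family == family and lem.variety == variety]
    ids = {lem.lemma_id for lem in members}
    children: Dict[Optional[str], List[Lemma]] = {}
    for lem in members:
        # Lemmas whose base lies outside this family/variety are treated as roots.
        parent = lem.base if lem.base in ids else None
        children.setdefault(parent, []).append(lem)
    lines: List[str] = []

    def walk(parent: Optional[str], depth: int) -> None:
        for lem in sorted(children.get(parent, []), key=lambda x: x.form):
            lines.append("  " * depth + lem.form)
            walk(lem.lemma_id, depth + 1)

    walk(None, 0)
    return lines


def correspondents_in(lemmas: Dict[str, Lemma], lemma_id: str, variety: str) -> List[str]:
    """Forms of the lemmas in `variety` linked to `lemma_id` as etymological correspondents."""
    return sorted(lemmas[c].form for c in lemmas[lemma_id].correspondents
                  if c in lemmas and lemmas[c].variety == variety)
```

Keeping the cross-variety links on the lemma level, rather than only on the family level, is what would allow a query to follow a single formation such as a derivative from one period into the next.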