This paper gives a brief overview of a new project at the "Institute for the German Language" (IDS) in Mannheim. It outlines some basic ideas for a corpus-based dictionary of spoken German, which will be developed and compiled by the new project "The Lexicon of Spoken German" (Lexik des gesprochenen Deutsch, LeGeDe). The work is based on the "Research and Teaching Corpus of Spoken German" (Forschungs- und Lehrkorpus Gesprochenes Deutsch, FOLK), which is implemented in the "Database for Spoken German" (Datenbank für Gesprochenes Deutsch, DGD). Both resources, the database and the corpus, have been developed at the IDS.
So far, there have been few descriptions of how to create structures capable of storing lexicographic data, ISO 24613:2008 being one of the latest. Another one is by Spohr (2012), who designs a multifunctional lexical resource which is able to store data of different types of dictionaries in a user-oriented way. Technically, his design is based on the principle of a hierarchical XML/OWL (eXtensible Markup Language/Web Ontology Language) representation model. This article follows another route in describing a model based on entities and the relations between them; in a relational database system such as MySQL, which is queried with SQL (Structured Query Language), such a model is expressed as tables containing data together with definitions of the relations between them. The model was developed in the context of the project "Scientific eLexicography for Africa", and the lexicographic database to be built from it will be implemented with MySQL. The principles of the ISO model and of Spohr's model are adhered to with one major difference in the implementation strategy: we do not place the lemma in the centre of attention, but the sense description; all other elements, including the lemma, depend on the sense description. This article also describes the contained lexicographic data sets and how they have been collected from different sources. As our aim is to compile several prototypical internet dictionaries (a monolingual Northern Sotho dictionary, a bilingual learners' Xhosa–English dictionary and a bilingual Zulu–English dictionary), we describe the necessary microstructural elements for each of them and which principles we adhere to when designing different ways of accessing them. We plan to make the model and the (empty) database with all graphical user interfaces that have been developed, freely available by mid-2015.
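The sense-centred design described in this abstract can be illustrated with a minimal sketch. It is not the project's actual schema: the table names, column names and sample rows are hypothetical, and SQLite (from the Python standard library) stands in for MySQL so that the example runs without a server. The point it shows is only the direction of the dependency: each lemma row points to a sense, not the other way round.

```python
import sqlite3

# Hypothetical schema: the sense description is the central entity,
# and every lemma depends on (references) exactly one sense.
SCHEMA = """
CREATE TABLE sense (
    sense_id    INTEGER PRIMARY KEY,
    description TEXT NOT NULL
);
CREATE TABLE lemma (
    lemma_id INTEGER PRIMARY KEY,
    sense_id INTEGER NOT NULL REFERENCES sense(sense_id),
    language TEXT NOT NULL,
    form     TEXT NOT NULL
);
"""


def build_demo_db():
    """Create an in-memory database with one illustrative sense and two lemmas."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute(
        "INSERT INTO sense (sense_id, description) VALUES (?, ?)",
        (1, "domesticated canine animal"),
    )
    # Illustrative sample rows only, not data from the project.
    conn.executemany(
        "INSERT INTO lemma (sense_id, language, form) VALUES (?, ?, ?)",
        [(1, "English", "dog"), (1, "Zulu", "inja")],
    )
    return conn


def lemmas_for_sense(conn, sense_id):
    """Return (language, form) pairs of all lemmas attached to one sense."""
    rows = conn.execute(
        "SELECT language, form FROM lemma WHERE sense_id = ? ORDER BY language",
        (sense_id,),
    )
    return rows.fetchall()
```

Starting a query from the sense, as `lemmas_for_sense` does, yields the equivalents in all languages at once, which is one plausible reason for making the sense rather than the lemma the hub of a multi-dictionary database.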
In this paper we present a new approach to lexicographical design for the description of German speech act verbs. This approach is based on an action-theoretical semantic conception. The several conditions for linguistic action provide the basis for the elaboration of the central semantic features. The systematic relationship of these features is reflected in the organization of a lexical database which allows various possibilities of access to different types of lexical information.
In the following paper we shall give an outline of the semantic framework for describing speech act verbs, i. e. verbs of communication, with the practical goal of a semantical database for a (dictionary of) synonymy of German speech act verbs which enables the user not only to find a list of synonymous verbs but also enables him to gain an insight into the semantic relations between the words.
The semantic framework is based on
(i) a set of conditions for performing speech acts as the relevant domain of reference
(ii) the introduction of a notion of situation, or better type of situation
The performative as well as the descriptive use of the verbs can be reduced to their fundamental dependency on the situations in which they are used: on the one hand with regard to the possibility of the action itself, and on the other hand with regard to the possibility of their designation. For both ways of use the relevant aspects of the situation constitute the necessary conditions.
The paper reports on a dictionary of German loanwords in the languages of the South Pacific that is being compiled at the Institut für Deutsche Sprache in Mannheim. The loanwords described in this dictionary mainly result from language contact between 1884 and 1914, when the German empire was in possession of large areas of the South Pacific where overall more than 700 indigenous languages were spoken. The dictionary is designed as an electronic XML-based resource from which an internet dictionary and a printed dictionary can be derived. Its printed version is intended as an ‘inverted loanword dictionary’, that is, a dictionary that – in contrast to the usual practice in loanword lexicography – lemmatizes the words of a source language that have been borrowed by other languages. Each of the loanwords will be described with respect to its form and meaning and the contact situation in which it was borrowed. Among the outer texts of the dictionary are (i) a list of all sources with bibliographic and archival information, (ii) a commentary on each source, (iii) a short history of the language contact with German for each target language, and perhaps (iv) facsimiles of source texts. The dictionary is supposed to (i) help to reconstruct the history of language contact of the source language, (ii) provide evidence for the cultural contact between the populations speaking the source and the target languages, (iii) enable linguistic theories about the systematic changes of the semantic, morphosyntactic, or phonological lexical properties of the source language when its words are borrowed into genetically and typologically different languages, and (iv) establish a thoroughly described case for testing typological theories of borrowing.
So far, Sepedi negations have been considered more from the point of view of lexicographical treatment. Theoretical works on Sepedi have been used for this purpose, setting as an objective a neat description of these negations in a (paper) dictionary. This paper is from a different perspective: instead of theoretical works, corpus linguistic methods are used: (1) a Sepedi corpus is examined on the basis of existing descriptions of the occurrences of a relevant verb, looking at its negated forms from a purely prescriptive point of view; (2) a "corpus-driven" strategy is employed, looking only for sequences of negation particles (or morphemes) in order to list occurring constructions, without taking into account the verbs occurring in them, apart from their endings. The approach in (2) is only intended to show a possible methodology to extend existing theories on occurring negations. We would also like to try to help lexicographers to establish a frequency-based order of entries of possible negation forms in their dictionaries by showing them the number of respective occurrences. As with all corpus linguistic work, however, we must regard corpus evidence not as representative, but as tendencies of language use that can be detected and described. This is especially true for Sepedi, for which only few and small corpora exist. This paper also describes the resources and tools used to create the necessary corpus and also how it was annotated with part of speech and lemmas. Exploring the quality of available Sepedi part-of-speech taggers concerning verbs, negation morphemes and subject concords may be a positive side result.
This paper reports on an ongoing international project of compiling a freely accessible online Dictionary of German Loans in Polish Dialects. The dictionary will be the first comprehensive lexicographic compendium of its kind, serving as a complement to existing resources on German lexical loans in the literary or standard language. The empirical results obtained in the project will shed new light on the distribution of German loanwords among different dialects, also in comparison to the well-documented situation in written Polish. The dictionary will have a strong focus on the dialectal distribution of Polish dialectal variants for a given German etymon, accessible through interactive cartographic representations and corresponding search options. The editorial process is realized with dedicated collaborative web tools. The new resource will be published as an integrated part of an online information system for German lexical borrowings in other languages, the Lehnwortportal Deutsch, and is therefore highly cross-linked with other loanword dictionaries on Polish as well as Slavic and further European languages.
This paper discusses changes in lexicographic traditions with respect to contrastive dictionary entries and dynamic, on-demand e-lexicographic descriptions. The new German online dictionary Paronyme - Dynamisch im Kontrast is concerned with easily confused words (paronyms), such as effektiv/effizient and sensibel/sensitiv. New approaches to the empirical analysis and lexicographic presentation of words such as these are required, and this dictionary is committed to overcoming the discrepancy between traditional practice and insights from language use. As a corpus-guided reference work, it strives to adequately reflect not only authentic use in situations of actual communication, but also cognitive ideas such as conceptual structure, categorization and knowledge. Looking up easily confused lexical items requires contrastive entries where users can instantly compare meaning, contexts and reference. Adaptable access to lexicographic details and variable search options offer different foci and perspectives on linguistic information, and authentic examples reflect prototypical structures. These are essential in order to meet all the different interests of users. This paper will illustrate the contrastive structure of the new e-dictionary and demonstrate which information can be compared. It also focusses on various dynamic modes of dictionary consultation, which enable users to shift perspectives on paronyms accordingly.
Are borrowed neologisms accepted more slowly into the German language than German words resulting from the application of word formation rules? This study addresses this question by focusing on two possible indicators for the acceptance of neologisms: a) frequency development of 239 German neologisms from the 1990s (loanwords as well as new words resulting from the application of word formation rules) in the German reference corpus DeReKo and b) frequency development in the use of pragmatic markers (‘flags’, namely quotation marks and phrases such as sogenannt ‘so-called’) with these words. In the second part of the article, a psycholinguistic approach to evaluating the (psychological) status of different neologisms and non-words in an experimentally controlled study and plans to carry out interviews in a field test to collect speakers’ opinions on the acceptance of the analysed neologisms are outlined. Finally, implications for the lexicographic treatment of both types of neologisms are discussed.
This introduction summarizes general issues combining lexicography and neology in the context of the Globalex Workshop on Lexicography and Neology series. We present each of the six papers composing this Special Issue, featuring two Slavic languages (Czech and Slovak) and two Romance ones (Brazilian Portuguese and Spanish in its European and Latin American varieties) and their diverse lexicographic research and representation, in specialized dictionaries of neologisms or general language ones, in monolingual, bilingual and multilingual lexical resources, and in print and digital dictionaries.
This paper focuses on standardological and lexicographical aspects of Coronavirus-related neologisms in Croatian. The presented results are based on corpus analysis. The initial corpus for this analysis consists of terms collected for the Glossary of Coronavirus. This corpus has been supplemented by terms we collected on the Internet and from the media. The General Croatian corpora: Croatian Web Corpus – hrWaC (cf. Ljubešić/Klubička 2016) and Croatian Language Repository (cf. Brozović Rončević/Ćavar 2008: 173–186) were also used, but since they do not include neologisms that entered the language after 2013, they could be used only to check terms in the language before that time. From October 2021, a specialized Corona corpus compiled by Štrkalj Despot and Ostroški Anić (2021) became publicly available on request. The data from these corpora are analyzed by Sketch Engine (cf. Kilgarriff et al. 2004: 105–116), a corpus query system loaded with the corpora, enabling the display of lexeme context through concordances and (differential) word sketches and the extraction of keywords (terms) and N-grams. The most common collocations are sorted into syntactic categories. For English equivalents, in addition to the sources found on the Internet, enTenTen2020 corpus was consulted. In the second part of the paper, we analyze and compare the presentation of Coronavirus terminology in the descriptive Glossary of Coronavirus and the normative Croatian Web Dictionary – Mrežnik.
Within the scope of the project "Study and dissemination of COVID-19 terminology", the study reported here aims to detect, analyse and discuss the characteristics of COVID-19 terminology, in particular the role of the adjective novo [new] in this terminology, the high recurrence of terms in the plural and the resemantization of some of the terminological units used. The present paper also discusses how these characteristics influenced the choices that have guided the creation of the proposed dictionary. This paper presents, therefore, the results of the analyses of these aspects, starting with a discussion of the relation between terminology and neology and arriving at the characteristic aspects of the macrostructural and microstructural choices about which some considerations were made.
Lexicographers working with minority languages face many challenges. When the language in question is also a sign language, circumstances specific to the visual-spatial modality have to be taken into consideration as well. In this paper, we aim to show and discuss which challenges we encounter while compiling the Digitales Wörterbuch der Deutschen Gebärdensprache (DW-DGS), the first corpus-based dictionary of German Sign Language (DGS). Some parallel the challenges minority language lexicographers of spoken languages encounter, e. g. few resources, no written tradition, and having to create one dictionary for all potential user groups, while others are specific to sign languages, e. g. representation of visual-spatial language and creating access structures for the dictionary.
Except for some recent advances in spoken language lexicography (cf. Verdonik & Sepesy Maučec 2017, Hansen & Hansen 2012, Siepmann 2015), traditional lexicographic work is mainly oriented towards the written language. In this paper, we describe a method we used to identify relevant headword candidates for a lexicographic resource for spoken language that is currently being developed at the Institute for the German Language (IDS, Mannheim). We describe the challenges of the headword selection for a dictionary of spoken language, and having made considerations regarding our headword concept, we present the corpus-based procedures that we used in order to facilitate the headword selection. After presenting the results regarding the selection of one-word lemmas, we discuss the opportunities and limitations of our approach.
This contribution presents the procedure used in the Handbuch deutscher Kommunikationsverben and in its online version Kommunikationsverben in the lexicographical internet portal OWID to divide sets of semantically similar communication verbs into ever smaller sets of ever closer synonyms. Kommunikationsverben describes the meaning of communication verbs on two levels: a lexical level, represented in the dictionary entries and by sets of lexical features, and a conceptual level, represented by different types of situations referred to by specific types of verbs. The procedure starts at the conceptual level of meaning where verbs used to refer to the same specific situation type are grouped together. At the lexical level of meaning, the sets of verbs obtained from the first step are successively divided into smaller sets on the basis of the criteria of (i) identity of lexical meaning, (ii) identity of lexical features, and (iii) identity of contexts of usage. The stepwise procedure applied is shown to result in the creation of a semantic network for communication verbs.
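The stepwise division described in this abstract can be sketched as successive partitioning: a set of verbs is first grouped by situation type, and each resulting group is then split again by every further criterion in turn. The sketch below is an assumption about how such a procedure could be coded, not the handbook's own implementation; the record fields and the way each criterion is encoded as a key function are hypothetical.

```python
from collections import defaultdict


def split_by(groups, key):
    """Split every group into subgroups whose members share the same key value."""
    result = []
    for group in groups:
        buckets = defaultdict(list)
        for item in group:
            buckets[key(item)].append(item)
        result.extend(buckets.values())
    return result


def stepwise_partition(verbs, criteria):
    """Apply the criteria one after another, yielding ever smaller sets."""
    groups = [list(verbs)]
    for key in criteria:
        groups = split_by(groups, key)
    return groups


# Hypothetical verb records; the values are placeholders, not real analyses.
VERBS = [
    {"verb": "verb_a", "situation": "S1", "meaning": "m1", "features": "f1"},
    {"verb": "verb_b", "situation": "S1", "meaning": "m1", "features": "f2"},
    {"verb": "verb_c", "situation": "S1", "meaning": "m2", "features": "f1"},
    {"verb": "verb_d", "situation": "S2", "meaning": "m3", "features": "f1"},
]

# Conceptual level first, then the lexical criteria in the order given above.
CRITERIA = [
    lambda v: v["situation"],
    lambda v: v["meaning"],
    lambda v: v["features"],
]
```

Calling `stepwise_partition(VERBS, CRITERIA[:1])` returns two sets (one per situation type), while applying all three criteria leaves each placeholder verb in a set of its own; verbs that remain together after the last step would be the closest synonyms.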
Electronic dictionaries should support dictionary users by giving them guidance in text production and text reception, alongside a user-definable offer of lexicographic data for cognitive purposes. In this article, we sketch the principles of an interactive and dynamic electronic dictionary aimed at text production and text reception guiding users in innovative ways, especially with respect to difficult, complicated or confusing issues. The lexicographer has to do a very careful analysis of the nature of the possible problems to suggest an optimal solution for a specific problem. We are of the opinion that there are numerous complex situations where users need more detailed support than currently available in e-dictionaries, enabling them to make valid and correct choices. For highly complex situations, we suggest guidance through a decision tree-like device. We assume that the solutions proposed here are not specific to one language only but can, after careful analysis, be applied to e-dictionaries in different languages across the world.
In foreign language teaching the use of dictionaries, especially bilingual, has always been related to the hypotheses concerning the relationship between the native language (L1) and second language acquisition method. If the bilingual dictionary was an obvious tool in the grammar-translation method, it was banned from the classroom in the direct, audiolingual and audiovisual methods. Also in the communicative method, foreign language learners are discouraged from using a dictionary. Its use should not obstruct the goals of communicatively oriented foreign language learning – a view still held by many foreign language teachers. Nevertheless, the reality has been different: Foreign language learners have always used dictionaries, even if they no longer possess a print dictionary and mainly use online resources and applications. Dictionaries and online resources will continue to play an important role in the future. In the Council of Europe’s language policy, with its emphasis on multilingualism and lifelong learning, the adequate use of reference tools as a strategic skill is highlighted. In several European countries, educational guidelines refer to the use of dictionaries in the context of media literacy, both in mother tongue and foreign language teaching. Not only is their adequate use important, but so too is the comparison, assessment and evaluation of the information presented, in order to develop Language Awareness and Language Learning Awareness. This is good news. However, does this mean that dictionaries are actually used in class? What role do dictionaries play in foreign language teaching in schools and universities? Are foreign language learners in the digital era really competent users? And how competent are their teachers? Are they familiar with the current (online) dictionary landscape? Can they support their students? 
After a more in-depth study of the status quo of dictionary use by foreign language learners and teachers and the gap between their needs and the reality, this contribution discusses the challenges facing lexicographers and meta-lexicographers and what educational policy measures are necessary to make their efforts worthwhile in turning foreign language learners – and their teachers – into competent users in a multilingual and digital world.
Dictionary portals
(2013)
We start by trying to answer a question that has already been asked by de Schryver et al. (2006): Do dictionary users (frequently) look up words that are frequent in a corpus? Contrary to theirs, our results, which are based on the analysis of log files from two different online dictionaries, indicate that users indeed look up frequent words frequently. When combining frequency information from the Mannheim German Reference Corpus and information about the number of visits in the Digital Dictionary of the German Language as well as the German language edition of Wiktionary, a clear connection between corpus and look-up frequencies can be observed. In a follow-up study, we show that another important factor for the look-up frequency of a word is its temporal social relevance. To make this effect visible, we propose a de-trending method where we control both frequency effects and overall look-up trends.
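The abstract does not spell out the de-trending method, so the following is only a sketch of one simple way to control for the two effects it names; it should not be read as the authors' procedure. The assumption is that look-ups are available as per-period counts: dividing a word's count by the total count of the same period removes the overall look-up trend, and dividing the resulting share by the word's own average share removes its baseline frequency. Values clearly above 1 then mark periods in which the word was looked up more than usual.

```python
def detrend(word_counts, total_counts):
    """Return look-up shares per period, relative to the word's own average share.

    word_counts:  look-ups of one word per period (e.g. per day)
    total_counts: look-ups of all words in the same periods
    """
    if len(word_counts) != len(total_counts) or not word_counts:
        raise ValueError("both series must be non-empty and of equal length")
    if any(total <= 0 for total in total_counts):
        raise ValueError("total look-ups must be positive in every period")

    # Step 1: control for the overall look-up trend.
    shares = [w / t for w, t in zip(word_counts, total_counts)]

    # Step 2: control for the word's own baseline frequency.
    baseline = sum(shares) / len(shares)
    if baseline == 0:
        raise ValueError("the word was never looked up")
    return [share / baseline for share in shares]
```

For a word whose look-ups merely follow overall traffic, for example `detrend([10, 20, 10], [100, 200, 100])`, every value is approximately 1; a burst of interest that is not matched by overall traffic stands out as a value well above 1.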
ELEXIKO is a relatively new lexicological-lexicographic project based at the Institut für Deutsche Sprache (IDS) in Mannheim. The project compiles a reference work that explains and documents contemporary German; it was specifically designed for online publication (www.elexiko.de). The primary and exclusive basis for lexicographic interpretation is an extensive German corpus. If one refers to elexiko as an Internet dictionary, it is purely for practical reasons: elexiko is (far) more than a dictionary in the traditional sense, although, of course, it contains descriptions of the meaning and use of a lexeme just as any traditional dictionary does. It is both a hypertext dictionary and a lexical data information system.
Most dictionaries containing phraseological information are restricted to a synchronic perspective. Diachronic information on structural, semantic, and pragmatic change over time has to be reconstructed by a time-consuming consultation of various dictionaries providing only isolated insights. In the OLdPhras project, we construct an online dictionary for diachronic phraseology in German from ca. 1650 to the present by combining dictionary exploration with corpus-based methods. This paper highlights some challenges we have met: How to select the interesting phrasemes, i.e., those that underwent some change? How to deal with historical corpora? How to include different kinds of phraseme variation? We present a semi-automatic corpus-based approach for the investigation of phraseme development. We argue for a combination of dictionary exploration and corpus-based methods to provide reliable and extensive information about the diachronic development of German phrasemes.
In many countries of the world, perspectives on gender equality and racism have changed in recent decades. One result has been more attention being devoted to traces of androcentric and racist language in society. This also affects dictionaries. In lexicography there are discussions about whether or to what extent social asymmetries are inscribed in dictionaries and if this is still acceptable. The issue of the nature of description plays an important role in this discussion. If sexist usages are often found in language use, i.e. in the corpus data on which the dictionary is based, does the dictionary also have to show them? How is this, in turn, compatible with the normative power of dictionaries? Do dictionaries contribute to the perpetuation of gender stereotypes by showcasing them under the banner of descriptive principles? And what roles do lexicographers play in this process? The article deals with these questions on the basis of individual lexicographical examples and current discussions in the lexicographic and public community.
Between January 2020 and July 2021, many new words and phrases contributed to the expansion of the German vocabulary to enable communication under the new conditions that evolved during the Covid-19 pandemic. Medical and epidemiological vocabulary was integrated into the general language to a large extent. Suddenly, some lexemes from general language were used with very high frequency, while other words were used less often than before. These processes of language change can be studied in various ways, for example, in corpus linguistics with respect to the frequency or emergence of certain words in certain types of texts (e.g. press releases vs. posts in social media), in critical discourse analysis with respect to certain participants of the discourse (e.g. vocabulary of Covid-19 pandemic deniers), or in conversation analysis (e.g. with respect to new verbal interactions in greetings and farewells). The rapid expansion of vocabulary has notably affected also lexicography as a discipline of applied linguistics.
This article will focus on the ways in which a German neologism dictionary project has chosen to capture and document lexicographic information in a timely manner. Both challenges and advantages arise from lexicographic practice “at the pulse of time”. The Neologismenwörterbuch is presented as an example that lends itself well to such a discussion because its subject (neologisms) is characterized as new, innovative, and constantly changing.
Between January 2020 and summer 2021, many new words and phrases contributed to the expansion of the German vocabulary in order to enable communication under the new conditions during the corona pandemic. This rapid expansion of vocabulary has most notably affected lexicography as a discipline of applied linguistics. General language dictionaries and terminological dictionaries have quickly reflected how the German lexicon evolved during the corona pandemic: new entries were added, others were revised. This paper, however, focuses on the ways in which a German (specialized) neologism dictionary project, the "Neologismenwörterbuch" of the "Leibniz Institute for the German Language, Mannheim" (published online, see https://www.owid.de/docs/neo/start.jsp), has chosen to capture and document lexicographic information in a timely manner. Neologisms are (following the definition applied here) lexical units or senses/meanings which emerge in a language community over a specific period of language development, which diffuse, are generally accepted as language norms, and which the majority of speakers perceive as new for some time. Thus, the "Neologismenwörterbuch" used to record neologisms only retrospectively, that is, after their lexicalization. As a consequence, users of the dictionary were often not able to obtain details on words that were particularly conspicuous at a particular time in a specific discourse and that thus raised questions concerning their meaning, correct spelling, etc. This, however, did not imply that the lexicographers of the project had not already collected these words with some preliminary information in a list of candidates for inclusion in an internal database. Therefore, the project started to publish online an index of monitored words including lexical units that had emerged since 2011, for which only time will tell whether they will diffuse and manifest as language norms.
This list format has been used since April 2020 to also issue a compilation of corona-related neologisms as part of the "Neologismenwörterbuch". In October 2021, this inventory included more than 1,800 corona-related neologisms, and more than 700 candidates in an internal database still awaited lexicographic description and inclusion in the online index (see https://www.owid.de/docs/neo/listen/corona.jsp). In this paper many examples are presented to illustrate how new words, new senses and new uses in the context of the Covid-19 pandemic are reflected in the dictionary.
This article sketches the development of paronym dictionaries in German. These dictionaries document and describe commonly confused words which cause uncertainties because they are similar in sound, spelling and/or meaning (e.g. effektiv/effizient, sportlich/sportiv). First, an overview of existing reference guides is provided, covering different traditions. Numerous lemma lists have been collected for pedagogical purposes and there has always been an interest in the lexicological treatment of paronyms. However, only a handful of dictionaries covering commonly confused pairs and a small number of genuine paronym dictionaries have ever been compiled. I will focus on lexicographic endeavours, including Wustmann (1891), Müller (1973) and Pollmann and Wolk (2001). Secondly, I will shed light on the differences in descriptions in these dictionaries. This includes how prescriptive approaches have been replaced over time by empirical descriptive accounts and how dictionaries have moved away from restricted, static hardback editions towards dynamic e-dictionaries. Finally, an e-dictionary, “Paronyme — Dynamisch im Kontrast”, is presented with contrastive and flexible two-level consultation views. Its three key elements are its corpus-based foundation, the implementation of meta-lexicographic requirements and a consideration of users’ interests. This dictionary has implemented a user-friendly and dynamic interface and it records conventionalized patterns and preferences in authentic communication.
Wiktionary is increasingly gaining influence in a wide variety of linguistic fields such as NLP and lexicography, and has great potential to become a serious competitor for publisher-based and academic dictionaries. However, little is known about the "crowd" that is responsible for the content of Wiktionary. In this article, we want to shed some light on selected questions concerning large-scale cooperative work in online dictionaries. To this end, we use quantitative analyses of the complete edit history files of the English and German Wiktionary language editions. Concerning the distribution of revisions over users, we show that — compared to the overall user base — only very few authors are responsible for the vast majority of revisions in the two Wiktionary editions. In the next step, we compare this distribution to the distribution of revisions over all the articles. The articles are subsequently analysed in terms of rigour and diversity, typical revision patterns through time, and novelty (the time since the last revision). We close with an examination of the relationship between corpus frequencies of headwords in articles, the number of article visits, and the number of revisions made to articles.
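The finding that very few authors account for most revisions can be quantified in several ways; the abstract does not say which measure was used. A minimal sketch of one possibility follows, assuming the edit history has already been reduced to a list with one author name per revision (the function name and input format are hypothetical). It reports the share of all revisions made by the most active fraction of users.

```python
import math
from collections import Counter


def top_share(revision_authors, top_fraction):
    """Share of all revisions made by the most active `top_fraction` of users.

    revision_authors: one author identifier per revision
    top_fraction:     e.g. 0.01 for the most active 1% of users
    """
    if not revision_authors:
        raise ValueError("no revisions given")
    if not 0 < top_fraction <= 1:
        raise ValueError("top_fraction must be in (0, 1]")

    # Revisions per user, most active users first.
    counts = sorted(Counter(revision_authors).values(), reverse=True)

    # Always include at least one user, rounding the group size up.
    k = max(1, math.ceil(len(counts) * top_fraction))
    return sum(counts[:k]) / sum(counts)
```

With ten users of whom one made 91 of 100 revisions, `top_share(authors, 0.1)` gives 0.91; a value close to the fraction itself would instead indicate evenly distributed work.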
This paper presents the main issues connected with the creation of a trilingual Hungarian-Italian-English dictionary of the COVID-19 pandemic using Lexonomy. My aim is not only to create a coronacorpus (in Hungarian, I propose my own corona-neologism or ‘coroneologism’: koronakorpusz) and a dictionary of equivalents, but also to understand how the different waves and phases of the COVID-19 pandemic are changing the Hungarian language, detect the Corona-, COVID-, pandemic-, virus-, mask-, quarantine-, and vaccine-related neologisms, and offer an overview of the most frequent or linguistically interesting Hungarian neologisms and multiword units related to COVID-19.
This paper presents ongoing work on a multilingual (English, French, German) lexical resource of soccer language. The first part describes how lexicographic descriptions based on frame-semantic principles are derived from a partially aligned multilingual corpus of soccer match reports. The remainder of the paper then discusses how different types of ontological knowledge are linked to this resource in order to provide an access structure to the resulting dictionary. It is argued that linking lexical resources and ontologies in this way gives dictionary users novel ways of navigating a domain vocabulary.
In this article, we provide an insight into the development and application of a corpus-lexicographic tool for finding neologisms that are not yet listed in German dictionaries. As a starting point, we used the words listed in a glossary of German neologisms surrounding the COVID-19 pandemic. These words are lemma candidates for a new dictionary on COVID-19 discourse in German. They also provided the database used to develop and test the NeoRate tool. We report on the lexicographic work in our dictionary project, the design and functionalities of NeoRate, and describe the first test results with the tool, in particular with regard to previously unregistered words. Finally, we discuss further development of the tool and its possible applications.
This paper presents the Lehnwortportal Deutsch, a new, freely accessible publication platform for resources on German lexical borrowings in other languages, to be launched in the second half of 2022. The system will host digital-native sources as well as existing, digitized paper dictionaries on loanwords, initially for some 15 recipient languages. All resources remain accessible as individual standalone dictionaries; in addition, data on words (etyma, loanwords etc.) together with their senses and relations to each other is represented as a cross-resource network in a graph database, with careful distinction between information present in the original sources and the curated portal network data resulting from matching and merging information on, e. g., lexical units appearing in multiple dictionaries. Special tooling is available for manually creating graphs from dictionary entries during digitization and for editing and augmenting the graph database. The user interface allows users to browse individual dictionaries, navigate through the underlying graph and ‘click together’ complex queries on borrowing constellations in the graph in an intuitive way. The web application will be available as open source.
Lexical data API
(2022)
This API provides data from various dictionary resources of K Dictionaries across 50 languages. It is used by language service providers, app developers, and researchers, and returns data as JSON documents. A basic search result consists of an object containing partial lexical information on entries that match the search criteria, but further in-depth information is also available. Basic search parameters include the source resource, source language, and text (lemma), and the entries are returned as objects within the results array. It is possible to search for words by specific syntactic criteria, such as part of speech, grammatical number, gender and subcategorization, and to restrict the search to monosemous or polysemous entries. When searching by parameters, each entry result contains a unique entry ID, and each sense has its own unique sense ID. Using these IDs, it is possible to obtain more data – such as syntactic and semantic information, multiword expressions, examples of usage, translations, etc. – for a single entry or sense. The software demonstration includes a brief overview of the API with practical examples of its operation.
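The entry and sense IDs described above can be illustrated with a minimal sketch. It assumes a response shaped as the abstract describes, with entries as objects inside a results array. Every field name other than the results array (`id`, `headword`, `senses`), the sample ID values, and the helper functions `collect_ids` and `is_polysemous` are hypothetical and are not taken from the API's documentation.

```python
import json

# Hypothetical sample of a basic search response. The abstract only states
# that entries are objects in a results array and that entries and senses
# carry unique IDs; all other field names and all values here are invented.
SAMPLE_RESPONSE = """
{
  "results": [
    {
      "id": "EN_0001",
      "language": "en",
      "headword": {"text": "bank", "pos": "noun"},
      "senses": [{"id": "EN_0001_S1"}, {"id": "EN_0001_S2"}]
    },
    {
      "id": "EN_0002",
      "language": "en",
      "headword": {"text": "bank", "pos": "verb"},
      "senses": [{"id": "EN_0002_S1"}]
    }
  ]
}
"""


def collect_ids(response_text):
    """Map each entry ID to the list of its sense IDs."""
    data = json.loads(response_text)
    return {
        entry["id"]: [sense["id"] for sense in entry.get("senses", [])]
        for entry in data.get("results", [])
    }


def is_polysemous(sense_ids):
    """Treat an entry with more than one sense as polysemous."""
    return len(sense_ids) > 1


if __name__ == "__main__":
    for entry_id, sense_ids in collect_ids(SAMPLE_RESPONSE).items():
        label = "polysemous" if is_polysemous(sense_ids) else "monosemous"
        print(entry_id, sense_ids, label)
```

In a real client, the collected IDs would then be used in follow-up requests for the in-depth data on a single entry or sense; the form of those requests is not specified in the abstract and is therefore left out here.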
The syntagma gel hidroalcohólico ‘hydroalcoholic gel’ or the noun hidroalcohol ‘hydroalcohol’ cannot be found in Diccionario de la lengua española (DLE) of the Real Academia Española (‘Royal Spanish Academy’) or other general reference dictionaries of the Spanish language. This is so despite the fact that, for well over a year and to this very day, we have not been able to do anything without first sanitising our hands with this product. It is one of the many neologisms that the COVID-19 pandemic has brought us, and these have become commonly used words that dictionaries should consider as candidates for future updates.
By looking at the dictionarisability of these neologisms, in this work we try to set their boundaries on the continuum along which they fall. “Dictionarisability” means, in our context, the greater or lesser interest of these units for the updating of general language dictionaries. At both ends of this continuum, there are surprising nonce words, as well as neologisms that have recently lost their status as such because they have now been incorporated into the dictionary. To identify different groups on the continuum of pandemic neologisms, we take into account the criteria proposed in the current literature and, by so doing, we are able to assess the extent to which they are discriminatory. This will allow us to address the neological process and to reflect on its various stages, from the time a neologism is born until the moment it ceases to be one because it has been dictionarised. Before that, however, we present the framework of our study and refer to the mechanisms available for detecting neologisms in general and pandemic neologisms in particular.
Not only professional lexicographers, but also people without a professional background in lexicography, have reacted to the increased need for information on new words or medical and epidemiological terms being used in the context of the COVID-19 pandemic. In this study, corona-related glossaries published on German news websites are presented, as well as different kinds of responses from professional lexicography. They are compared in terms of the amount of encyclopaedic information given and the methods of definition used. In this context, answers to questions about corona-related words from a German question-answer platform are also presented and analyzed. Overall, these different reactions to a unique challenge shed light on the importance of lexicography for society and vice versa.
This volume brings together contributions by international experts reflecting on COVID-19-related neologisms and their lexicographic processing and representation. The papers analyze new words, new meanings of existing words, and new multiword units, where they come from, how they are transmitted (or differ) across languages, and how their use and meaning are reflected in dictionaries of all sorts. Recent trends in as many as ten languages are considered, including general and specialized language, monolingual as well as bilingual and printed as well as online dictionaries.
This volume of Lexicographica : Series Maior focuses on lexicographic neology and neological lexicography concerning COVID-19 neologisms, featuring papers originally presented at the third Globalex Workshop on Lexicography and Neology (GWLN 2021).
The thirteen papers in this volume focus on ten languages: one Altaic (Korean), one Finno-Ugric (Hungarian), two Germanic (English and German), four Romance (French, Italian, [Brazilian and European] Portuguese and [Pan-American and European] Spanish), and one Slavic (Croatian), as well as the Sign Language of New Zealand. Specialized dictionaries of neologisms are discussed as well as general language ones, monolingual, bilingual and multilingual lexical resources, print and electronic dictionaries. Questions regarding terminology as well as general language and standard and norm regarding COVID-19 neologisms are raised and different methods of detecting candidates in media corpora, as well as by user contributions, are discussed.
Lexicography of Language Contact: An Internet Dictionary of Words of German Origin in Tok Pisin
(2016)
The paper presents an ongoing project in the domain of lexicography of language contact, namely, the “Internet Dictionary of Words of German Origin in Tok Pisin”. The German influence on the lexicon of the main pidgin language of Papua New Guinea has its roots in the German colonial empire, where Tok Pisin played an important role as a lingua franca in the colony of German New Guinea. Tok Pisin also served as an intermediate language for many borrowing processes; that is, German loans entered many languages in the South Pacific via Tok Pisin. The Internet Dictionary of Words of German Origin in Tok Pisin is based on all available lexicographical sources from the early 20th century up to now. These sources are systematically evaluated within our project; the results will be documented in the dictionary. The microstructure of the dictionary will be presented with respect to its major features: documentation of sources, examples of word usage, audio files, and lexicographic comment.
We present studies using the 2013 log files from the German version of Wiktionary. We investigate several lexicographically relevant variables and their effect on look-up frequency: Corpus frequency of the headword seems to have a strong effect on the number of visits to a Wiktionary entry. We then consider the question of whether polysemic words are looked up more often than monosemic ones. Here, we also have to take into account that polysemic words are more frequent in most languages. Finally, we present a technique to investigate the time-course of look-up behaviour for specific entries. We exemplify the method by investigating influences of (temporary) social relevance of specific headwords.
Online dictionary use
(2012)
The evolution of computer technologies and the introduction of the World Wide Web (WWW) have substantially changed the way scientific articles and books are published today. Besides writing for "traditional" print media, more and more authors decide to reach a larger audience and to decrease distribution time by offering their works on the internet. The electronic medium not only facilitates the spread of information, it also adds new value by extending the possibilities of knowledge retrieval. Of course the same is true for structured data collections like scientific glossaries, dictionaries or bibliographies. They particularly profit from the web when they are accessible via user-friendly and effective frontends. The following chapters deal with the transformation of the Bibliography of German Grammar (“Bibliografie zur deutschen Grammatik”) from a data pool primarily used for print publishing to a relational database application offering a basis for media-independent distribution. Starting with a short description of the beginnings of the bibliography, the focus of this article lies on the explanation of our current database design as well as on the presentation of the web-based user interface.
There is an increasing number of dictionary types and lexical search-tools designed to respond to an ever-growing array of user needs. The quest for innovation, however, is not over and this is what this book shall shed light on: the identification of dictionary types that have never been developed for certain languages or for a given lexical domain, as well as typological and linguistic problems that may compromise the development of lexicographic projects.
Dictionaries have been part and parcel of literate societies for many centuries. They assist in communication, particularly across different languages, to aid in understanding, creating, and translating texts. Communication problems arise whenever a native speaker of one language comes into contact with a speaker of another language. At the same time, English has established itself as a lingua franca of international communication. This marked tendency gives lexicography of English a particular significance, as English dictionaries are used intensively and extensively by huge numbers of people worldwide.
The development of user-adapted views of lexicographic data is frequently called for by dictionary research on electronic reference works and hypertext information systems. In the printed dictionary it has been indispensable to develop a complete dictionary relative to a user group and usage situations. In contrast, for any electronic presentation of lexicographic data there are possibilities to define user-specific views of an initially user-unspecific resource. However, research on the use of dictionaries in general still has to answer several open questions as far as this subject is concerned. This paper will firstly provide an overview of the present state of research on dictionary use with respect to electronic lexicography. Subsequently, further prerequisites for possible user-adapted access to data are explained, as exemplified by OWID, the Online Vocabulary Information System of the Institut für Deutsche Sprache. Finally, it will be outlined what results on the subject have been accomplished so far, and the prospects of potential user-adapted presentations of lexicographic data will be highlighted.
The aim of this work is to describe criteria used in the process of inclusion and treatment of neologisms in dictionaries of Spanish within the framework of pandemic instability. Our starting point will be data obtained by the Antenas Neológicas Network (https://www.upf.edu/web/antenas), whose representation in three different lexicographic tools will be analyzed with the purpose of identifying problems in the methodology used to dictionarize neologisms during the COVID-19 pandemic – that is, which words were selected for inclusion in dictionaries, how they were selected, and how they were represented in their entries (sources and corpora of analysis, selection criteria, types of definition, among other aspects). Two of them are monolingual and COVID-19 lexical units were included as part of their updates: the Antenario, a dictionary of neologisms of Spanish varieties, and the Diccionario de la Lengua Española [DLE], a dictionary of general Spanish published by the Real Academia Española [RAE] (Spanish Royal Academy). The other is a bilingual unidirectional English-Spanish dictionary first published as a glossary, Diccionario de COVID-19 EN-ES [TREMEDICA], entirely made up of neological and non-neological lexical units related to the virus and the pandemic. Thus, the target lexis was either included in existing works or makes up the whole of a new tool located in a portal together with other lexicographic tools. Unlike other collections of COVID-19 vocabulary that kept cropping up as the pandemic unfolded, all three have been designed and written according to well-established lexicographic practices.
Our working hypothesis is that the need to record and define recently created words affects the criteria for the inclusion and treatment of neologisms in dictionaries of Spanish, leading to a certain degree of overlap in features that are traditionally thought to be specific to each type of dictionary.