Refine
Year of publication
Document Type
- Part of a Book (51)
- Article (30)
- Conference Proceeding (14)
- Book (6)
- Other (6)
- Part of Periodical (2)
Keywords
- Online-Wörterbuch (109) (remove)
Publicationstate
- Veröffentlichungsversion (74)
- Zweitveröffentlichung (14)
- Postprint (3)
Reviewstate
- (Verlags)-Lektorat (48)
- Peer-Review (38)
- Verlags-Lektorat (5)
Publisher
- de Gruyter (21)
- Narr (11)
- Institut für Deutsche Sprache (8)
- Lexical Computing CZ s.r.o. (5)
- IDS-Verlag (4)
- Leibniz-Institut für Deutsche Sprache (IDS) (4)
- Ids-Verlag (3)
- Schmidt (3)
- Buro van die WAT (2)
- Lang (2)
Ways out of the dictionary: hyperlinks to other sources in German and African online dictionaries
(2023)
This study examines a number of German and African online dictionaries to see how they make use of the possibility of linking to external sources (e.g. other dictionaries, encyclopaedias, or even corpus data). The article investigates which hyperlinks occur at which places in the word articles and how these are presented to the dictionary users. This is done against the background of metalexicographic considerations on the planning of outer features and the mediostructure in online dictionaries as well as different categorizations of hyperlinks in online reference works. The results show that retro-digitized dictionaries make virtually no use of hyperlinks to external sources. Genuine online dictionaries, on the other hand, do, but often in a form that needs improvement, since, for example, explanations of dictionary-external links are not always found in the user guide and their design is different even within a dictionary.
This study aims to establish what lexical factors make it more likely for dictionary users to consult specific articles in a dictionary using the English Wiktionary log files, which include records of user visits over the course of 6 years. Recent findings suggest that lexical frequency is a significant factor predicting look-up behavior, with the more frequent words being more likely to be consulted. Three further lexical factors are brought into focus: (1) age of acquisition; (2) lexical prevalence; and (3) degree of polysemy operationalized as the number of dictionary senses. Age of acquisition and lexical prevalence data were obtained from recent published studies and linked to the list of visited Wiktionary lemmas, whereas polysemy status was derived from Wiktionary entries themselves. Regression modeling confirms the significance of corpus frequency in explaining user interest in looking up words in the dictionary. However, the remaining three factors also make a contribution whose nature is discussed and interpreted. Knowing what makes dictionary users look up words is both theoretically interesting and practically useful to lexicographers, telling them which lexical items should be prioritized in lexicographic work.
Allusion
(2023)
Assessment
(2023)
Most broadly, an assessment is a type of social action by which an interactant expresses an evaluative stance towards someone or something (e.g., an object, an event, an action, an experience, a state of affairs, a place, a circumstance, etc.). The target of an assessment is typically called the ‘assessable’.
Retro-sequence
(2023)
Modular pivot
(2023)
A modular pivot is a type of turn-constructional pivot. It is built from syntactically entirely optional items (i.e. linguistic adjuncts) that can occur in both turn-initial and turn-final position and can therefore be used to patch a wide range of otherwise discrete turn-constructional units (TCUs) together (Clayman & Raymond 2015). A prime example of an item that lends itself to be deployed as a modular pivot are address terms (Clayman 2012).
Pivot
(2023)
The term pivot denotes an element of talk that can be understood to belong to two larger units of talk simultaneously, thereby joining them together and acting as a transitional link between them (Schegloff 1979: 275-276). Most commonly, the term is used to refer to lexico-syntactic elements that can be interpreted as ending one turn-constructional unit (TCU) while at the same time launching a next.
The Encyclopedia of Terminology for Conversation Analysis and Interactional Linguistics is an online resource for students and scholars of CA/IL, publicly available on the EMCA Wiki page. Encyclopedias and glossaries are widespread across various fields and methods, and serve as immensely valuable resources. Given the extent to which the EMCA/IL community has expanded over the years—both terminologically as well as geographically—we hope that this encyclopedia of terminology will be well received by students and practitioners of CA and IL across the globe.
This article describes an English Zulu learners’ dictionary that is part of a larger set of information tools, namely an online Zulu course, an e-dictionary of possessives (which was implemented earlier) accompanied by training software offering translation tasks on several levels, and an ontology of morphemic items categorizing and describing all parts of speech of Zulu. The underlying lexicographic database contains the usual type of lexicographic data, such as translation equivalents and their respective morphosyntactic data, but its entries have been extended with data related to the lessons of the online course in order to enable the learner to link both tools autonomously. The ‘outer matter’ is integrated into the website in the form of several texts on additional web pages (how-to-use, typical outputs, grammar tables, information on morphosyntactic rules, etc.). The dictionary comprises a modular system, where each module fulfils one of the necessary functions.
This article sketches the development of paronym dictionaries in German. These dictionaries document and describe commonly confused words which cause uncertainties because they are similar in sound, spelling and/or meaning (e.g. effektiv/effizient, sportlich/sportiv). First, an overview of existing reference guides is provided, covering different traditions. Numerous lemma lists have been collected for pedagogical purposes and there has always been an interest in the lexicological treatment of paronyms. However, only a handful of dictionaries covering commonly confused pairs and a small number of genuine paronym dictionaries have ever been compiled. I will focus on lexicographic endeavours, including Wustmann (1891), Müller (1973) and Pollmann and Wolk (2001). Secondly, I will shed light on the differences in descriptions in these dictionaries. This includes how prescriptive approaches have been replaced over time by empirical descriptive accounts and how dictionaries have moved away from restricted, static hardback editions towards dynamic e-dictionaries. Finally, an e-dictionary, “Paronyme — Dynamisch im Kontrast”, is presented with contrastive and flexible two-level consultation views. Its three key elements are its corpus-based foundation, the implementation of meta-lexicographic requirements and a consideration of users’ interests. This dictionary has implemented a user-friendly and dynamic interface and it records conventionalized patterns and preferences in authentic communication.
This paper reports on an ongoing international project of compiling a freely accessible online Dictionary of German Loans in Polish Dialects. The dictionary will be the first comprehensive lexicographic compendium of its kind, serving as a complement to existing resources on German lexical loans in the literary or standard language. The empirical results obtained in the project will shed new light on the distribution of German loanwords among different dialects, also in comparison to the well-documented situation in written Polish. The dictionary will have a strong focus on the dialectal distribution of Polish dialectal variants for a given German etymon, accessible through interactive cartographic representations and corresponding search options. The editorial process is realized with dedicated collaborative web tools. The new resource will be published as an integrated part of an online information system for German lexical borrowings in other languages, the Lehnwortportal Deutsch, and is therefore highly cross-linked with other loanword dictionaries on Polish as well as Slavic and further European languages.
Die lexikografische Behandlung von Neologismen aus der Perspektive hispanophoner DaF-Lernender
(2019)
Anhand von einigen medialen Kommunikationsverben wie mailen oder twittern wird das lexikografische Informationsangebot zu Neologismen auf seine Adäquatheit für die fremdsprachige Produktion untersucht. Die Untersuchung erfolgt aus der Perspektive eines spanischsprachigen DaF-Lernenden. Zur Analyse werden sowohl Neologismenwörterbücher und -datenbanken für das Deutsche als auch gängige, bilinguale Online-Wörterbücher für das Sprachenpaar Spanisch–Deutsch gezogen. Die Ergebnisse der lexikografischen Untersuchung werden exemplarisch mit korpusbasierten Daten aus einer Doktorarbeit verglichen. Die Befunde zeigen den Bedarf und die Notwendigkeit auf, die lexikografische Behandlung von (verbalen) Neologismen im spanisch–deutschen Kontext zu optimieren. Dabei soll — insbesondere — die fremdsprachige Textproduktion berücksichtigt werden.
So far, there have been few descriptions on creating structures capable of storing lexicographic data, ISO 24613:2008 being one of the latest. Another one is by Spohr (2012), who designs a multifunctional lexical resource which is able to store data of different types of dictionaries in a user-oriented way. Technically, his design is based on the principle of a hierarchical XML/OWL (eXtensible Markup Language/Web Ontology Language) representation model. This article follows another route in describing a model based on entities and relations between them; MySQL (usually referred to as: Structured Query Language) describes a database system of tables containing data and definitions of relations between them. The model was developed in the context of the project "Scientific eLexicography for Africa" and the lexicographic database to be built thereof will be implemented with MySQL. The principles of the ISO model and of Spohr's model are adhered to with one major difference in the implementation strategy: we do not place the lemma in the centre of attention, but the sense description — all other elements, including the lemma, depend on the sense description. This article also describes the contained lexicographic data sets and how they have been collected from different sources. As our aim is to compile several prototypical internet dictionaries (a monolingual Northern Sotho dictionary, a bilingual learners' Xhosa–English dictionary and a bilingual Zulu–English dictionary), we describe the necessary microstructural elements for each of them and which principles we adhere to when designing different ways of accessing them. We plan to make the model and the (empty) database with all graphical user interfaces that have been developed, freely available by mid-2015.
An ongoing academic and research program, the “Vocabula Grammatica” lexicon, implemented by the Centre for the Greek Language (Thessaloniki, Greece), aims at lemmatizing all the philological, grammatical, rhetorical, and metrical terms in the written texts of scholars (philologists and scholiasts) who curated the ancient Greek literature from the beginning of the Hellenistic period (4th/3rd c. BC) until the end of the Byzantine era (15th c. AD). In particular, it aspires to fill serious gaps (a) in the study of ancient Greek scholarship and (b) in the lexicography of the ancient Greek language and literature. By providing specific examples, we will highlight the typical and methodological features of the forthcoming dictionary.
Wortgeschichte digital (‘digital word history’) is a new historical dictionary of New High German, the most recent period of German reaching from approximately 1600 AD up to the present. By contrast to many historical dictionaries, Wortgeschichte digital has a narrated text – a “word history” – at the core of its entries. The motivation for choosing this format rather than traditional microstructures is
briefly outlined. Special emphasis it put on the way these word histories interact with other components of the dictionary, notably with the quotation section. As Wortgeschichte digital is an online only project, visualizations play an important role for the design of the dictionary. Two examples are presented: first, the “quotation navigator” which is relevant for the microstructure of the entries, and, second, a timeline (“Zeitstrahl”) which is part of the macrostructure as it gives access to the lemma inventory from a diachronic point of view.
In the present contribution, I investigate if and how the English and French editions of the Wiktionary collaborative dictionary can be used as a corpus for real time neology watch. This option is envisaged as a stopgap, when no satisfactory corpus is available. Wiktionary can also prove useful in addition to standard corpus analysis, to minimize the risk of overlooking new coinages and new senses. Since the collaborative dictionary’s quest for exhaustiveness makes the manual inspection of the new additions unreasonable (more than 31,000 English lemmas and 11,000 French lemmas entered the nomenclature in 2020), identifying the possibly relevant headwords is an issue. The solution proposed here is to use Wiktionary revision history to detect the (new or existing) entries that received the greatest number of modifications. The underlying hypothesis is that the most heavily edited pages can help identify the vocabulary related to “hot topics”, assuming that, in 2020, the pandemic-related vocabulary ranks high. I used two measures introduced by Lih (2004), whose aim was to estimate the quality of Wikipedia articles: the so-called rigour (number of edits per page) and diversity (number of unique contributors per page). In the present study, I propose to adapt the rigour and diversity metrics to Wiktionary in order to identify the pages that generated a particular stir, rather than to estimate the quality of the articles. I do not subscribe to the idea that – in Wiktionary – more revisions necessarily produce quality articles (more revisions often produce complete articles). I therefore adopt Lih’s notion of diversity to refer to the number of distinct contributors, but leave out the name rigour when it comes to the number of revisions. Wolfer and Müller-Spitzer (2016) used the two metrics to describe the dynamics of the German and English editions of Wiktionary. One of their findings was that the number of edits per page is correlated with corpus word frequencies. The variation in number of page edits should therefore reflect to some extent the variation of corpus word frequencies. Renouf (2013) established a relationship between the fluctuation of word frequencies in a diachronic corpus and various neological processes. In particular, she illustrated how specific events generate sudden frequency spikes for words previously unseen in the corpus. For instance, Eyjafjallajökull, the – existing – name of an Icelandic glacier, appeared in the corpus when the underlying volcano erupted in 2010 and disrupted air traffic in Europe. In order to check if the same phenomenon occurs when using Wiktionary edits instead of corpus frequencies, I manually annotated the most frequently revised entries (according to various ranking scores) with the binary tag: “related to Covid-19” (yes/no). The annotations were then used to test the ability of various configurations to detect relevant headwords from the English and French Wiktionary, namely Covid-19 neologisms and related existing words that deserve updates.
This paper focuses on standardological and lexicographical aspects of Coronavirus-related neologisms in Croatian. The presented results are based on corpus analysis. The initial corpus for this analysis consists of terms collected for the Glossary of Coronavirus. This corpus has been supplemented by terms we collected on the Internet and from the media. The General Croatian corpora: Croatian Web Corpus – hrWaC (cf. Ljubešić/Klubička 2016) and Croatian Language Repository (cf. Brozović Rončević/Ćavar 2008: 173–186) were also used, but since they do not include neologisms that entered the language after 2013, they could be used only to check terms in the language before that time. From October 2021, a specialized Corona corpus compiled by Štrkalj Despot and Ostroški Anić (2021) became publicly available on request. The data from these corpora are analyzed by Sketch Engine (cf. Kilgarriff et al. 2004: 105–116), a corpus query system loaded with the corpora, enabling the display of lexeme context through concordances and (differential) word sketches and the extraction of keywords (terms) and N-grams. The most common collocations are sorted into syntactic categories. For English equivalents, in addition to the sources found on the Internet, enTenTen2020 corpus was consulted. In the second part of the paper, we analyze and compare the presentation of Coronavirus terminology in the descriptive Glossary of Coronavirus and the normative Croatian Web Dictionary – Mrežnik.
Within the scope of the project "Study and dissemination of COVID-19 terminology", the study reported here aims to detect, analyse and discuss the characteristics of COVID-19 terminology, in particular the role of the adjective novo [new] in this terminology, the high recurrence of terms in the plural and the resemantization of some of the terminological units used. The present paper also discusses how these characteristics influenced the choices that have guided the creation of the proposed dictionary. This paper presents, therefore, the results of the analyses of these aspects, starting with a discussion of the relation between terminology and neology and arriving at the characteristic aspects of the macrostructural and microstructural choices about which some considerations were made.
While adjusting to the COVID-19 pandemic, people around the world started to talk about the “new normal” way of life, and they conveyed feelings and thoughts on the topic through social networks and traditional communication channels resorting to a set of specific linguistic strategies, such as metaphors and neologisms. The vocabulary in different domains and in everyday speech was expanded to accommodate a complex social, cultural, and professional phenomenon of changes. Therefore, this new life gave birth to a new language – the “coronaspeak”. According to Thorne (2020), the “coronaspeak” has three stages: first, it emerged in the way medical aspects were communicated in everyday language; secondly, it occurred when speakers verbalized the experiences they had undergone and “invented their own terms”; finally, this “new” way of speaking emerged in the government and authorities’ jargon, to ensure that the new rules and policies were understood, and that population adopted socially responsible behaviours.
In this paper, we will focus on the second stage, because we intend to take stock of how speakers communicate and verbalize this new way of living, particularly on social networks, for example. Alongside, we are interested in the context in which the neologism – be it a new word, a new meaning, or a new use – emerged, is used, and understood, through the observation of the occurrence of the new word(s) either on social networks or through dissemination texts (press) to confront it with the ones that Portuguese digital dictionaries have attested so far. Different criteria regarding the insertion of new units, the inclusion date, and the lexicographic description of the entries in the dictionaries will be debated.
The long road to a historical dictionary of Lower Sorbian. Towards a lexical information system
(2022)
The Sorbian Institute has been taking preparatory steps for a historical-documentary vocabulary information system for Lower Sorbian for about 10 years. To this end, the entire extant written material (16th–21st centuries) of this strongly endangered European minority language is to be systematically evaluated. An attempt made a few years ago to organise and finance the project as a long-term scientific project was not successful in the end. Therefore, it can only be advanced step by step and via some detours. The article informs about the interim status of the project, especially with respect to the creation of a reliable database.
This paper aims at verifying if the most important online Brazilian Portuguese dictionaries include some of the neologisms identified in texts published in the 1990s to 2000s, formed with the elements ciber-, e-, bio-, eco- and narco, which we refer to as fractomorphemes / fracto-morphèmes. Three online dictionaries were analyzed (Aulete, Houaiss and Michaelis), as well as Vocabulário Ortográfico da Língua Portuguesa (VOLP). We were able to conclude that all three dictionaries and VOLP include neologisms with these elements; Michaelis and VOLP do not include separate entries for bound morphemes, whereas Houaiss includes entries for all of them and Aulete includes entries for bio-, eco- and narco-. Aulete also describes the neological meaning of eco- and narco-, whereas Houaiss does not.
The digital environment represents a qualitatively new level of service for research work with linguistic information presented in dictionary form. And first of all, this applies to index systems. By dictionary indexing we mean a set of formalized rules and procedures, on the basis of which it is possible to obtain information about certain linguistic facts recorded in the dictionary. These rules are implemented in the form of user interfaces. However, one should take into account the fact that the effectiveness of automatic construction of index schemes for a digital dictionary is possible only in a sufficiently formalized environment. This article describes the method and technology of indexing the Etymological Dictionary of the Ukrainian Language (EDUL). For the language indexing of the dictionary, a special computer instrumental system (VLL – virtual lexicographic laboratory) was developed, and adapted to the structure of the EDUL and focused on the creation of indexes in automatic mode. The digital implementation of the EDUL made it possible to access the entire corpus of the dictionary text regardless of the time of publication of the corresponding volume and opened up opportunities for various digital interpretations of etymological information.
The paper describes an online German-Russian database for phraseological constructions (PhC), or syntactic idioms. It is a linguistic phenomenon representing a stable multi-word form that usually contains some auxiliary words (“anchors”) and partially opens up empty spaces (“slots”) which are filled directly in spoken language by various lexemes or combinations of lexemes (“fillers”, or “slot fillers”). Linguists from several German institutions are currently working on the database. The PhCs selected for the database have to meet special criteria. The database is a manual that combines scientific descriptions, a thesaurus and a bilingual dictionary. The database is designed as an active aid for text production in the respective foreign language; it is also a manual for language researchers and for translators. Apart from that, it can serve as a basis for extensions for other language pairs. The aim of the project is to record and to describe 300 PhC before the database is published. Our objective is to enable foreign language learners to use the syntactic idioms correctly in the texts they produce rather than create a big-sized database. The paper describes some issues related to the creation of the database, namely objectives and target groups, material and methods, microstructure of the database article and some others.
The purpose of this paper is to present the lexicographic protocol and to report on the progress of compilation of Mikaela_Lex, which is a Greek, free online monolingual school dictionary for upper elementary students with visual impairments including 4,000 lemmata. The dictionary is equipped with new digital tools, such as the “Braille-system keyboard, a “speech-to-text” tool, a “text-to-speech” tool and also a qwerty accessibility for visually non-impaired students.
This volume of Lexicographica : Series Maior focuses on lexicographic neology and neological lexicography concerning COVID-19 neologisms, featuring papers originally presented at the third Globalex Workshop on Lexicography and Neology (GWLN 2021).
The thirteen papers in this volume focus on ten languages: one Altaic (Korean), one Finno-Ugric (Hungarian), two Germanic (English and German), four Romance (French, Italian, [Brazilian and European] Portuguese and [Pan-American and European] Spanish), and one Slavic (Croatian), as well as the Sign Language of New Zealand. Specialized dictionaries of neologisms are discussed as well as general language ones, monolingual, bilingual and multilingual lexical resources, print and electronic dictionaries. Questions regarding terminology as well as general language and standard and norm regarding COVID-19 neologisms are raised and different methods of detecting candidates in media corpora, as well as by user contributions, are discussed.
This volume brings together contributions by international experts reflecting on Covid19-related neologisms and their lexicographic processing and representation. The papers analyze new words, new meanings of existing words, and new multiword units, where they come from, how they are transmitted (or differ) across languages, and how their use and meaning are reflected in dictionaries of all sorts. Recent trends in as many as ten languages are considered, including general and specialized language, monolingual as well as bilingual and printed as well as online dictionaries.
„Paronyme – Dynamisch im Kontrast“ ist ein neues und neuartiges Nachschlagewerk für sprachliche Zweifelsfälle und Unsicherheiten. Erstmals werden lautlich, orthografisch und/oder semantisch ähnliche Wörter (z. B. farbig-farblich, kindlich-kindisch, universal-universell, Mehrheit-Mehrzahl) korpusbasiert in ihrem aktuellen Gebrauch untersucht und dokumentiert. Nutzer*innen können sich über die Bedeutung jedes Ausdrucks in zahlreichen Angaben und Verwendungsbeispielen informieren. Dies erfolgt kontrastiv und dynamisch in selbst wählbaren Ausschnitts- oder Vergleichsansichten, im Überblick oder im Detail.
The project “Paronymwörterbuch” investigates and documents easily confused words (so-called paronyms) in German with respect to their use in public discourse as documented in a large corpus. These are, for example, antik/antiquiert/antiquarisch (antique/antiquated/antiquarian) or sportlich/sportiv (sporty/athletic). The results of this work are explanatory, contrastive entries in a new dynamic e-dictionary called “Paronyme − Dynamisch im Kontrast”. The objective of this paper is twofold. Firstly, essential new usage modalities of the new dictionary will be illustrated. As it is designed for contrastive consultation processes, the comparative structure of the entries will be elucidated and we will show how this dictionary has moved away from static to dynamic presentation by incorporating flexible consultation options. Secondly, as entries contain linguistic details which are consistently paired up with conceptual-encyclopaedic information, it is shown how this reference guide combines corpus-based methods with cognitive semantics. In this way, linguistic findings correlate better with how users conceptualise language by adequately reflecting ideas such as conceptual structure, categorisation and knowledge. Consequently, appropriate contrastive corpus tools and methods are employed. This paper also emphasises the need of semiotic approaches to the analysis of linguistic data in order to provide ostensive and cognitive-oriented lexical explanations. Such approaches are also necessary to guarantee an efficient pairwise investigation of paronyms. Advantages and disadvantages of explorative self-organising feature maps will be explained in more detail.
Im E-Wörterbuch „Paronyme – Dynamisch im Kontrast“ werden erstmals leicht verwechselbare Ausdrücke, sogenannte Paronyme (z.B. autoritär / autoritativ, speziell / spezial), in kontrastiven und dynamischen Einträgen beschrieben. Auf zwei Beschreibungsebenen verzahnt es lexikalische Angaben mit enzyklopädischen bzw. konzeptuell-orientierten Details. Korpusanalytische Auseinandersetzungen zeigen, wie stark der Gebrauch einiger Paronyme von den Beschreibungen in traditionellen Lehr- und Nachschlagewerken abweicht. Aber Korpusdaten deuten ebenso auf sprachliche Varianz und Wandel hin, die in speziellen Rubriken festgehalten werden. Neben der Vorstellung des Wörterbuches steht die Frage im Vordergrund, wie die Informationen systematisch aus den Daten gewonnen, analysiert und redaktionell ausgewertet werden, um als Bedeutungs-, Kollokations-, Konstruktions-, Referenz- und Domänenangaben jedes Stichwort so genau wie möglich beschreiben zu können.
The German e-dictionary documenting confusables Paronyme – Dynamisch im Kontrast contains lexemes which are similar in sound, spelling and/or meaning, e.g. autoritär/autoritativ, innovativ/innovatorisch. These can cause uncertainty as to their appropriate use. The monolingual guide could be easily expanded to become a multilingual platform for commonly confused items by incorporating language modules. The value of this visionary resource is manifold. Firstly, e-dictionaries of confusables have not yet been compiled for most European languages; consequently, the German resource could serve as a model of practice. Secondly, it would be able to explain the usage of false friends. Thirdly, cognates and loan word equivalents would be offered for simultaneous consultation. Fourthly, users could find out whether, for example, a German pair is semantically equivalent to a pair in another language. Finally, it would inform users about cases where a pair of semantically similar words in one language has only one lexical counterpart in another language. This paper is an appeal for visionary projects and collaborative enterprises. I will outline the dictionary’s layout and contents as shown by its contrastive entries. I will demonstrate potential additions, which would make it possible to build up a large platform for easily misused words in different languages.
Dictionary usage research views dictionaries primarily as tools for solving linguistic problems. A large proportion of dictionary use now takes place online and can thus be easily monitored using tracking technologies. Using the data gathered through tracking usage data, we hope to optimize user experiences of dictionaries and other linguistic resources. Usage statistics are also used for external evaluation of linguistic resources. In this paper, we pursue the following three questions from a quantitative perspective: (1) What new insights can we gain from collecting and analysing usage data? (2) What limitations of the data and/or the collection process do we need to be aware of? (3) How can these insights and limitations inform the development and evaluation of linguistic resources?
Die Dokumentation und Untersuchung deutscher Sprachinselvarietäten war schon immer eine der wichtigsten Aufgaben der germanistischen Sprachwissenschaft. Mittlerweile stellt sich aber immer öfter die Frage der Nachhaltigkeit der erhobenen Spachinseldaten. Insbesondere in Bezug auf die vom Sprachtod bedrohten Varietäten, wie z.B. im Fall der russlanddeutschen Dialekte aus den noch intakten Sprachinseln der ehemaligen Sowjetunion, ist es äußerst wichtig, die existierenden Audioaufnahmen systematisch und dauerhaft zu archivieren. Aber nicht nur die Archivierung, sondern auch der freie und unkomplizierte Zugang zu diesen Materialien ist ein wesentlicher Aspekt im Konzept der Nachhaltigkeit. Wie sollte dieser Zugang aber gestaltet sein und in welcher Form sollen die Daten präsentiert werden? Auf genau diese Frage ist das Projekt „Elektronisches Wörterbuch. Ein Online-Informationsangebot zu Sprache und Dialekten der Russlanddeutschen" eingegangen. In diesem Projekt wurden historische Tonaufnahmen russlanddeutscher Dialekte linguistisch aufbereitet und in Form einer strukturierten Russlanddeutschen Dialektdatenbank (RuDiDat) online veröffentlicht. Diese Datenbank ist frei verfügbar und ermöglicht die Recherche im Korpus des Russlanddeutschen. Der vorliegende Beitrag stellt die Datenbank vor und thematisiert Herausforderungen, die durch unterschiedliche Ausprägungsformen des Russlanddeutschen entstehen könnten, wenn man die im Internet freigegebenen Sprachinseldaten für vergleichende Analysen heranzieht.
In this paper, the basic assumptions are presented against the background of the development of a corpus-based method to determine suitable headword candidates for the LeGeDe-prototype (LeGeDe= Lexik des gesprochenen Deutsch), a lexicographical resource on spoken German. In a first quantitatively oriented step, potential one-word headword candidates are identified with the help of frequency class comparisons from a corpus for spoken (FOLK) and a subset from a corpus for written German (DEREKO). Qualitative analyses based on a project-specifically defined sample of data from the FOLK corpus lead to multi-word headword candidates. The results of the qualitative analyses were also compared with the results of studies from the research literature as well as (quantitative-orientated) bi- and trigram analyses. In their multi-word form, these candidates are particularly characterized by the fact that they assume a very special interactional function in the (authentic) interaction and have to be described as a whole unit. The paper explains this combined procedure, which was extracted in the LeGeDe-project for the appointment of headword candidates.
This paper presents the corpus-based lexicographical prototype that was developed within the framework of the project Lexik des gesprochenen Deutsch (=LeGeDe) as a thirdparty funded project. Research results regarding the information offered in dictionaries have shown that there is a necessity for information on spoken lexis and its interactional functions. The resulting LeGeDe-prototype is based on these needs and desiderata and is thus an innovative example for the adequate representation of spoken language in online dictionaries. It is available online since September 2019 (https://www.owid.de/legede/). In the following sections, after first focusing on the presentation of the project’s goals, the data basis, the intended end user, and the applied methods, we will illustrate the microstructure of the prototype and the information provided in a dictionary entry based on the lemma eben. Finally, we will summarize innovative aspects that are important for the implementation of such a resource.
Im Beitrag steht das LeGeDe-Drittmittelprojekt und der im Laufe der Projektzeit entwickelte korpusbasierte lexikografische Prototyp zu Besonderheiten des gesprochenen Deutsch in der Interaktion im Zentrum der Betrachtung. Die Entwicklung einer lexikografischen Ressource dieser Art knüpft an die vielfältigen Erfahrungen in der Erstellung von korpusbasierten Onlinewörterbüchern (insbesondere am Leibniz-Institut für Deutsche Sprache, Mannheim) und an aktuelle Methoden der korpusbasierten Lexikologie sowie der Interaktionsanalyse an und nimmt als multimedialer Prototyp für die korpusbasierte lexikografische Behandlung von gesprochensprachlichen Phänomenen eine innovative Position in der modernen Onlinelexikografie ein. Der Beitrag befasst sich im Abschnitt zur LeGeDe-Projektpräsentation ausführlich mit projektrelevanten Forschungsfragen, Projektzielen, der empirischen Datengrundlage und empirisch erhobenen Erwartungshaltungen an eine Ressource zum gesprochenen Deutsch. Die Darstellung der komplexen Struktur des LeGeDe-Prototyps wird mit zahlreichen Beispielen illustriert. In Verbindung mit der zentralen Information zur Makro- und Mikrostruktur und den lexikografischen Umtexten werden die vielfältigen Vernetzungs- und Zugriffsstrukturen aufgezeigt. Ergänzend zum abschließenden Fazit liefert der Beitrag in einem Ausblick umfangreiche Vorschläge für die zukünftige lexikografische Arbeit mit gesprochensprachlichen Korpusdaten.
The majority of new words in dictionaries are included following a certain period of time during which they have become more frequent in use and established morphosyntactic and orthographic features consistent with the language system they are borrowed into. In case of borrowed new words, inclusion often takes place at a transitional state of assimilation to the language system, where delayed orthographic or phonetic change cannot be ruled out and the differentiation between standard-conforming and non-standard orthographic word forms of a lemma oftentimes depends on the proximity between the writing systems of the donor and the recipient language. Following a brief overview of loan words and their lexicographical description in the Neologismenwörterbuch, a specialized online dictionary for neologisms in contemporary German, this paper presents findings of an investigative case study on dictionary entries for a neologism borrowed from a logographic language system and discusses the potential of a corpus-based description of new loan words.
In diesem Beitrag diskutieren wir einen neuen Ansatz zur Erarbeitung von Wörterbüchern, bei dem eine große Zahl Freiwilliger gemeinsam ein Online-Wörterbuch entwickelt. Wir charakterisieren zunächst die wesentlichen Eckpunkte dieses nutzergetrieben-kollaborativen Vorgehens und diskutieren den aktuellen Stand der Forschung. Wir untersuchen Struktur, Dynamik und Zusammensetzung von kollaborativ erarbeiteten Wortschätzen und diskutieren, welches Innovationspotenzial in kollaborativen Wörterbüchern steckt und ob professionell erstellte Wörterbücher durch den neuen Ansatz obsolet werden.
Der Beitrag thematisiert Eigenschaften der strukturellen, thematischen und partizipativen Dynamik kollaborativ erzeugter lexikalischer Netzwerke am Beispiel von Wiktionary. Ausgehend von einem netzwerktheoretischen Modell in Form so genannter Mehrebenennetzwerke wird Wiktionary als ein skalenfreies Lexikon beschrieben. Systeme dieser Art zeichnen sich dadurch aus, dass ihre inhaltliche Dynamik durch die zugrundeliegende Kollaborationsdynamik bestimmt wird, und zwar so, dass sich die soziale Struktur der entsprechenden inhaltlichen Struktur aufprägt. Dieser Auffassung gemäß führt die Ungleichverteilung der Aktivitäten von Lexikonproduzenten zu einer analogen Ungleichverteilung der im Lexikon dokumentierten Informationseinheiten. Der Beitrag thematisiert Grundlagen zur Beschreibung solcher Systeme ausgehend von einem Parameterraum, welcher die netzwerkanalytische Betrachtung von Wiktionary als Big-Data-Problem darstellt.
Nach einigen Überlegungen zu Wörterbüchern und Informationssystemen soll der Frage im Titel des Vortrags auf drei Ebenen nachgegangen werden: (a) aktueller Stand der allgemeinen einsprachigen Online-Wörterbücher des Deutschen; (b) die Situation der praktischen Lexikographie und der für sie zuständigen Theorien; (c) die Lage der Wörterbuchforschung an deutschen Universitäten. Dabei soll die kulturelle und gesellschaftliche Verantwortung der praktischen Lexikographie, aber auch der Metalexikographie verdeutlicht werden. Zu wenig Beachtung findet die Lexikographie als kulturelle Praxis der Dokumentation sprachlicher, kultureller und gesellschaftlicher Verhältnisse. Zu wenig Beachtung findet die Metalexikographie als Gesellschaftswissenschaft, die sich mit den Zusammenhängen von Datenermittlung, -verwendung, -interpretation und -präsentation in der Internetlexikographie befasst. Die Ausführungen werden durch Detailanalysen des Datenangebots auf Informationsportalen zur deutschen Sprache und unter Berücksichtigung ausgewählter Benutzungssituationen gestützt. Abschliesend werden Thesen zur Zukunft der Lexikographie formuliert.
In gängigen deutschen Wörterbüchern liegen für diskursrelevante Ausdrücke keine angemessenen Beschreibungsformen vor. Darauf haben bereits Strauß, Haß und Harras (1989: 10) in Brisante Wörter von Agitation bis Zeitgeist hingewiesen. Hierfür gibt es unterschiedliche Ursachen, wie beispielsweise zu sehr in der Tradition verhaftete lexikografische Methoden und Datengrundlagen; es liegt aber auch daran, dass nach wie vor häufig in der deutschen Lexikografie Aspekte des Diskurses für die Bedeutungskonstituierung bei gesellschaftspolitischen Schlüsselwörtern unberücksichtigt bleiben. Die Bedeutung konfliktträchtigen Vokabulars (z. B. Ausdrücke wie Globalisierung, Humankapital, Kollateralschaden) kann aber nicht ohne diskurssemantische Erklärungen beschrieben werden, da es in seinem Gebrauch Zeit-, Kultur und Mentalitätsgeschichte reflektiert und die Sprechergemeinschaft bezüglich ihrer Einstellung zu solchen Ausdrücken spaltet. In diesem Beitrag soll dargestellt werden, welche Rolle die sprachwissenschaftliche Diskursanalyse bei der Bedeutungserfassung spielen kann, und wie unterschiedliche Bewertungen und inhaltliche Thematisierungen seitens der Sprechergemeinschaft beim Gebrauch brisanter Begriffe in der öffentlichen Kommunikation zum Ausdruck kommen. Mithilfe einer konkreten linguistisch-diskursorientierten Untersuchung des Ausdrucks Globalisierung soll die enge Verflechtung von Sprachanalyse mit Zeit- und Kulturgeschichte verdeutlicht werden.
DIL is a bilingual (German-Italian) online dictionary of linguistics. It is still under construction and contains 240 lemmas belonging to the subfield of “German as a Foreign Language”, but other subfields are in preparation. DIL is an open dictionary; participation of experts from various subfields is welcome. The dictionary is intended for a user group with different levels of knowledge, therefore it is a multifunctional dictionary. An analysis of existing dictionaries, either in their online or written form, was essential in order to make important decisions for the macro- or microstructure of DIL; the results are discussed. Criteria for the selection of entries and an example of an entry conclude the article.
In an earlier publication it was claimed that there is no useful relationship between Swahili-English dictionary look-up frequencies and the occurrence frequencies for the same wordforms in Swahili-English corpora, at least not beyond the top few thousand wordforms. This result was challenged using data for German by a different team of researchers using an improved methodology. In the present article the original Swahili-English data is revisited, using ten years’ worth of it rather than just two, and using the improved methodology. We conclude that there is indeed a positive relationship. In addition, we show that online dictionary look-up behaviour is remarkably similar across languages, even when, as in our case, one is dealing with languages from very dissimilar language families. Furthermore, online dictionaries turn out to have minimum look-up success rates, below which they simply cannot go. These minima are language-sensitive and vary depending on the regularity of the searched-for entries, but are otherwise constant no matter the size of randomly sampled dictionaries. Corpus-informed sampling always improves on any random method. Lastly, from the point of view of the graphical user interface, we argue that the average user of an online bilingual dictionary is better served with a single search box, rather than separate search boxes for each dictionary side.
DIL ist ein deutsch-italienisches Online-Fachwörterbuch der Linguistik. Es ist ein offenes Wörterbuch und mit diesem Beitrag wird für eine mögliche Zusammenarbeit, Kollaboration plädiert. DIL ist noch im Aufbau begriffen; zur Zeit ist nur die Sektion DaF komplett veröffentlicht, auch wenn andere Sektionen in Bearbeitung sind. Die Sektion LEX (Lexikographie), die zur Veröffentlichung ansteht, wird zusammen mit den wichtigsten Eigenschaften des Wörterbuches präsentiert.
Im Onlinewörterbuch elexiko (www.elexiko.de) sind eine Reihe von hochfrequenten Stichwörtern im Rahmen des sogenannten „Lexikons zum öffentlichen Sprachgebrauch“ ausführlich in ihrer Bedeutung und Verwendung korpusgestützt beschrieben. Dieser Wortschatzausschnitt deckt verschiedene Themen aus Politik und Gesellschaft ab und enthält Lexeme, die zentralen politischen und gesellschaftlichen Diskursen, wie sie im Korpus präsent sind, angehören. In elexiko werden diese Lexeme semantisch und pragmatisch angemessen, d.h. hinreichend differenziert und sprachreflektierend dargestellt. Dabei folgt die Darstellung der linguistischen Konzeption von elexiko, die im Band „Grundfragen der elektronischen Lexikografie. elexiko – das Online-Informationssystem zum deutschen Wortschatz“ (herausgegeben von Ulrike Haß, 2005) dargelegt ist.
In the past two decades, more and more dictionary usage studies have been published, but most of them deal with questions related to what users appreciate about dictionaries, which dictionaries they use and what type of information they need in specific situations — presupposing that users actually consult lexicographic resources. However, language teachers and lecturers in linguistics often have the impression that students do not use enough high-quality dictionaries in their everyday work. With this in mind, we launched an international cooperation project to collect empirical data to evaluate what it is that students actually do while attempting to solve language problems. To this end, we applied a new methodological setting: screen recording in conjunction with a thinking-aloud task. The collected empirical data offers a broad insight into what users really do while they attempt to solve language-related tasks online.
Two empirical studies were carried out in the project „Lexik des gesprochenen Deutsch” (LeGeDe) at the Institute for the German Language (IDS) in Mannheim. The main goal of these studies was to shed light on people’s expectations of the planned lexicographical online-resource. In the first study, selected experts were interviewed in the form of a guided interview. In the second study, a broader online survey was conducted, which should reach a wider range of potential users. This contribution introduces the basic concepts of the project LeGeDe, outlines the two studies and presents selected results on four subject blocks: (i) sociodemographic data, (ii) personal use of (online) dictionaries, (iii) individual experience with the lexis of spoken language and (iv) expectations concerning a lexicographical online-resource for spoken German.
This paper discusses changes of lexicographic traditions with respect to approaches to meaning descriptions towards more cognitive perspectives. I will uncover how cognitive aspects can be incorporated into meaning descriptions based on corpus-driven analysis. The new German Online dictionary “Paronyme − Dynamisch im Kontrast” (Storjohann 2014; 2016) is concerned with easily confused words such as effektiv/effizient, sensibel/sensitiv. It is currently in the process of being developed and it aims at adopting a more conceptual and encyclopaedic approach to meaning by incorporating cognitive features. As a corpus-guided reference work it strives to adequately reflect ideas such as conceptual structure, categorisation and knowledge. Contrastive entries emphasise aspects of usage, comparing conceptual categories and indicate the (metonymic) mapping of knowledge. Adaptable access to lexicographic details and variable search options offer different foci and perspectives on linguistic information, and authentic examples reflect prototypical structures. Some of the cognitive features are demonstrated with the help of examples. Firstly, I will outline how patterns of usage imply conceptual categories as central ideas instead of sufficiently logical criteria of semantic distinction. In this way, linguistic findings correlate better with how users conceptualise language. Secondly, it is pointed out how collocates are treated as family members and fillers in contexts. Thirdly, I will demonstrate how contextual structure and functions are included summarising referential information. Details are drawn from corpus data, they are usage-based linguistic patterns illustrating conversational interaction and semantic negotiations in contemporary public discourse. Finally, I will outline consultation routines which activate different facets of structural knowledge, e.g. through changes of the ordering of information or through the visualisation of semantic networks.
Este artículo expone a partir de una serie de ejemplos diferentes situaciones de uso del diccionario bilingüe que evidencian la importancia de llevar a cabo una adecuada adquisición y desarrollo de las competencias lexicográficas en el contexto de enseñanza-aprendizaje de lenguas extranjeras y, en este caso en concreto, del alemán como lengua extranjera. Con este propósito se parte de tres competencias básicas: la selección de la obra lexicográfica adecuada según la situación comunicativa, la desambiguación pertinente en el contexto de la recepción en L2 y traducción de L2 a L1 y la selección y uso del equivalente en el contexto de la producción y traducción en la L2. El objetivo de esta aportación es poner de manifiesto la necesidad de identificar adecuadamente por parte del usuario de un recurso lexicográfico bilingüe la información lexicológica pertinente a la forma, contenido y uso de los lemas consultados tanto en la situación de recepción y producción en L2 como en el contexto de la traducción de y a L2.
Am 1. September 2016 hat das Forschungsprojekt „Lexik des gesprochenen Deutsch“ (= LeGeDe) am Institut für Deutsche Sprache in Mannheim als Kooperationsprojekt der Abteilungen Pragmatik und Lexik seine Arbeit aufgenommen. Dieses drittmittelgeförderte Projekt der Leibniz-Gemeinschaft (Leibniz-Wettbewerb 2016; Förderlinie 1: Innovative Vorhaben) hat eine Laufzeit von drei Jahren (1.9.2016-31.8.2019) und besteht aus einem Team von Mitarbeiterinnen und Mitarbeitern aus den Bereichen Lexikologie, Lexikografie, Gesprächsforschung, Korpus- und Computerlinguistik sowie Empirische Methoden. Im folgenden Beitrag werden neben Informationen zu den Eckdaten des Projekts, zu den unterschiedlichen Ausgangspunkten, dem Gegenstandsbereich, den Zielen sowie der LeGeDe-Datengrundlage vor allem einige grundlegende Forschungsfragen und methodologische Ansätze aufgezeigt sowie erste Vorschläge zur Gewinnung, Analyse und Strukturierung der Daten präsentiert. Zur lexikografischen Umsetzung werden verschiedene Möglichkeiten skizziert und im Ausblick einige Herausforderungen zusammengefasst.
In diesem Beitrag werden erste Erfahrungen mit und Überlegungen zu der Aufgabe dargelegt, ein Mikrostrukturenprogramm für ein Hypertext-Wörterbuch zu entwerfen. Zur Hypertextualisierung gedruckter Wörterbücher gibt es inzwischen erste Veröffentlichungen; meist bleibt hier die Bindung an eine gedruckte Vorlage, und sei die Hypertextualisierung noch so konsequent, bestehen. Im Unterschied zu solchen Hypertext-Wörterbüchern gehen nachfolgende Überlegungen von einem vorlagenunabhängigen Hypertext aus, dessen allgemeines Ziel es ist, Informationen zum deutschen Wortschatz zu vermitteln. Die hier vorgestellten Erfahrungen und Überlegungen sind an ein konkretes Projekt gebunden: LEKSIS - das lexikalisch-lexikologische Informationssystem des Instituts für Deutsche Sprache, Mannheim. Auf eine (weitere) Projektbeschreibung wird hier aber verzichtet; sie findet sich in Fraas/Haß-Zumkehr (1999), ferner auf der Homepage unter http://www.ids-mannheim.de/wiw. Vor dem Hintergrund dieses Projektes stehen die Bedingungen bzw. lexikografischen Konsequenzen des Mediums Hypertext im Unterschied zum Druck zur Diskussion.
elexiko - Das Projekt
(2005)
Das Bedeutungsspektrum
(2005)
Besonderheiten des Gebrauchs
(2005)
This paper gives an insight into the basic concepts for a corpus-based lexical resource of spoken German, which is being developed by the project "The Lexicon of Spoken German"(Lexik des gesprochenen Deutsch, LeGeDe) at the "Institute for the German Language" (Institut für Deutsche Sprache, IDS) in Mannheim. The focus of the paper is on initial ideas of semi-automatic and automatic resources that assist the quantitative analysis of the corpus data for the creation of dictionary content. The work is based on the "Research and Teaching Corpus of Spoken German" (Forschungs- und Lehrkorpus Gesprochenes Deutsch, FOLK).
In this paper, we will present a first attempt to classify commonly confused words in German by consulting their communicative functions in corpora. Although the use of so-called paronyms causes frequent uncertainties due to similarities in spelling, sound and semantics, up until now the phenomenon has attracted little attention either from the perspective of corpus linguistics or from cognitive linguistics. Existing investigations rely on structuralist models, which do not account for empirical evidence. Still, they have developed an elaborate model based on formal criteria, primarily on word formation (cf. Lăzărescu 1999). Looking from a corpus perspective, such classifications are incompatible with language in use and cognitive elements of misuse.
This article sketches first lexicological insights into a classification model as derived from semantic analyses of written communication. Firstly, a brief description of the project will be provided. Secondly, corpus-assisted paronym detection will be focused. Thirdly, in the main section the paper concerns the description of the datasets for paronym classification and the classification procedures. As a work in progress, new insights will continually be extended once spoken and CMC data are added to the investigations.
Dieser Band gewährt Einblick in den Entstehungsprozess von elexiko, einem im Aufbau befindlichen, korpusgestützten Online-Wörterbuch zur deutschen Gegenwartssprache. Das elexiko-Wörterbuch wird kontinuierlich erweitert (durch neue Stichworteinträge, durch die Freischaltung redaktionell bearbeiteter Wortartikel, durch die Integration automatisch ermittelter Informationen) und kann sich auch an der Benutzeroberfläche verändern (durch ein neues Design oder weitere Recherchemöglichkeiten). Solche Veränderungen, insbesondere aber auch die Erfahrungen, die bei der Erarbeitung der Wortartikel auf der Grundlage eines umfangreichen zeitungssprachlichen Korpus gemacht wurden und die ein Nachdenken über die ursprüngliche Konzeption bedingten, werden in den verschiedenen Beiträgen beschrieben. Alle zentralen Angabebereiche in den Wortartikeln (Bedeutungserläuterung, lexikalische Mitspieler, typische Verwendungsmuster, sinnverwandte Wörter, Besonderheiten des Gebrauchs und Grammatik) sind dabei berücksichtigt. Daneben werden kleinere lexikografische Angaben (z.B. Illustrationen, Ausspracheangaben) wie Fragen der Lemmatisierung (z.B. von Eigennamen) thematisiert. Schließlich werden die praktischen Erfahrungen mit der Datenmodellierung von elexiko (eine granulare, maßgeschneiderte XML-Struktur) reflektiert.
"Themengebundene Verwendung(en)" als neuer Angabetyp unter der Rubrik "Besonderheiten des Gebrauchs"
(2011)
Diachrone Angaben
(2005)
This article provides an introduction to elexiko, the first German hypertext dictionary to be compiled on a corpus basis, which is currently being developed at the Institut für Deutsche Sprache Mannheim (IDS). First, a brief account of the design is given, followed by a demonstration of the methods and tools that are being employed to compile it. elexiko will provide not only an improved quantity of lexical information, but also a new quality of information which will be explained and illustrated at different levels of the microstructure of the dictionary. The description of word meaning and use in elexiko will be presented in detail, with a particular focus on the treatment of collocations, ambiguity, vagueness, and the presentation of senses. The development of a theoretically grounded procedure for lexicographic disambiguation is also described. This is then followed by a brief account of the treatment of grammatical details. Finally, issues of usability, the progress of the project and its future perspectives will be considered.
Contextual lexical relations, such as sense relations, have traditionally played an essential role in disambiguating word senses in lexicography, as they offer insights into the meaning and use of a word. However, the description of paradigmatic relations in particular is often restricted to a few types such as synonymy and antonymy. The limited description of various types of relations and the method of presenting these relations in existing German dictionaries are often problematic.
Elexiko, the first German hypertext dictionary compiled exclusively on the basis of an electronic corpus, offers a new way of presenting sense relations, using a variety of approaches to extract the necessary data. In this paper, I will show how elexiko presents a differentiated system of paradigmatic relations including synonymy, various subtypes of incompatibility (such as antonymy, complementarity, converseness, reversiveness, etc.), and vertical structures (such as hyponymy and meronymy). Primary attention, however, will focus on the question of how data for a paradigmatic description is retrieved from the corpus. Whereas a corpus-driven approach is mainly used for various semantic information and a corpus-based method plays an important part in obtaining data for the grammatical description in elexiko, it will be argued that both the corpus-driven and the corpus-based approach can be complementary methods in gaining insights into sense relations. I will demonstrate which results can be obtained by each approach, and advantages and disadvantages of both procedures will be explored in more detail.
As sense relations are context-dependent, it will also be demonstrated how a sense-bound presentation can be realised in an electronic reference work including a system of cross-referencing that illustrates lexical structures and the interrelatedness of words within the lexicon. Finally, I will show how accompanying examples from the corpus and additional lexicographic information help the user to understand contextual restrictions, so that s/he is able to use dictionary information more effectively.
Die elexiko-Stichwortliste
(2005)
This paper presents some results from an online survey regarding the functions and presentation of lexicographically compiled and automatically compiled corpus citations in a general monolingual e-dictionary of German (elexico). Our findings suggest that dictionary users have a clear understanding of the functions of corpus citations in lexicography.
Vorwort
(2014)
Die Beiträge des vorliegenden Bandes sind das Ergebnis eines interdisziplinären Workshops, der zum Abschluss des Projekts unter dem Titel „Varianz und Vielfalt interdisziplinär: Wörter und Strukturen“ im Dezember 2012 in Darmstadt stattfand. Dabei wurden Erkenntnisse und Erfahrungen aus der Untersuchung von „Wechselwirkungen zwischen linguistischen und bioinformatischen Verfahren, Methoden und Algorithmen für die Modellierung und Abbildung von Varianz in Sprache und Genomen“ zusammengefasst. Ein Schwerpunkt lag hierbei auf elektronischen Wörterbüchern, ihrer Heterogenität, der in ihnen dokumentierten Varianz sowie auf den Werkzeugen und Methoden, die zu ihrer Erschließung und Analyse dienen. Weitere sprachwissenschaftlich motivierte Themenbereiche umfassten z.B. die synchrone und diachrone Varianz, die quantitative Linguistik, Morphologie und Sprachwandelprozesse, Varianz in Wortfamilien wie auch die Erschließung von Varianz. Anschließend konnte das Phänomen der Varianz aus verschiedensten Perspektiven beleuchtet werden und ein Beitrag zur Konstituierung einer disziplinübergreifenden Abstraktionsebene geleistet werden. Der vorliegende Band enthält einige der Vorträge und führt heterogene Forschungsgegenstände zusammen, die zwischen Lexikografie, Computerlinguistik, (historischer) Sprachwissenschaft und den digitalen Geisteswissenschaften transzendieren.
Während lexikographische Prozesse, die zur Publikation gedruckter Wörterbücher führen, bereits seit einigen Jahrzehnten im Fokus der Wörterbuchforschung stehen und die dafür unterschiedenen Phasen der Vorbereitung, der Datenbeschaffung, der Datenaufbereitung, der Datenauswertung und der Satz- und Druckvorbereitung mittlerweile als etabliert betrachtet werden dürfen, steht die Diskussion und Beschreibung lexikographischer Prozesse von Internetwörterbüchern noch in den Anfängen. Zwar besteht kein Zweifel daran, dass sich lexikographische Prozesse bei der Publikation von Internetwörterbüchern anders gestalten als bei Printwörterbüchern, doch die Fragen, inwiefern sie dies tun, welchen Einfluss die neuen Möglichkeiten der Datengewinnung aus elektronischen Textkorpora auf die Prozesse haben, wie Bearbeitungsteilwortschätze auszuwählen sind, wie verschiedene Fassungen zu versionieren und zu archivieren sind und wie sich schließlich die Änderungen der lexikographischen Prozesse auf die Nutzer auswirken, ob und wie die Nutzer in diese Prozesse einbezogen werden können, sind noch nicht ausführlich beantwortet.
Diese und andere Fragen waren daher Gegenstand des vierten Arbeitstreffens des wissenschaftlichen Netzwerks “Internetlexikografie”, das am 22. und 23. November 2012 an der Universität Trier stattfand und vom Kompetenzzentrum für elektronische Erschließungs- und Publikationsverfahren in den Geisteswissenschaften/Trier Center for Digital Humanities organisiert wurde. Die Auseinandersetzung mit dem lexikographischen Prozess wurde fortgesetzt in drei Arbeitsgruppen, die sich mit Auswahlkriterien, Umsetzung und Problemen von Bearbeitungsteilwortschätzen, mit Archivierung und Versionierung und mit dem korpusbasierten Vorgehen bei der Erweiterung bestehender lexikographischer Ressourcen beschäftigten. Der vorliegende Band beschäftigt sich mit den in den Diskussionsrunden und Arbeitsgruppen gefundenen Ergebnissen und den dort identifizierten weiterführenden Fragen.
This paper discusses the advantages and disadvantages of the combination of automated information and lexicographically interpreted information in online dictionaries, namely elexiko, a hypertext dictionary and lexical data information system of contemporary German (http://www.owid.de/ elexiko_/index.html), and DWDS, a digital dictionary of 20,h century German (http://www.dwds.de). Examples of automatically derived information (e.g. automatically extracted citations from the underlying corpus, lists on paradigmatic relations) and lexicographically compiled information (e.g. information on paradigmatic partners) are provided and evaluated, reflecting on the need to develop guidelines as to how computerised information and lexicographically interpreted information may be combined profitably in online reference works.
What makes a good online dictionary? Empirical insights from an interdisciplinary research project
(2011)
This paper presents empirical fmdings from two online surveys on the use of online dictionaries, in which more than 1,000 participants took part. The aim of these studies was to clarify general questions of online dictionary use (e.g. which electronic devices are used for online dictionaries or different types of usage situations) and to identify different demands regarding the use of online dictionaries. We will present some important results ofthis ongoing research project by focusing on the latter. Our analyses show that neither knowledge of the participants’ (scientific or academic) background, nor the language Version of the online survey (German vs. English) allow any significant conclusions to be drawn about the participant’s individual user demands. Subgroup analyses only reveal noteworthy differences when the groups are clustered statistically. Taken together, our fmdings shed light on the general lexicographical request both for the development of a user-adaptive interface and the incorporation of multimedia elements to make online dictionaries more user-friendly and innovative.
Compared with printed dictionaries, online dictionaries provide a number of unique possibilities for the presentation and processing of lexicographical information. However, in Müller-Spitzer/Koplenig/Töpel (2011) we show that – on average - users tend to rate the special characteristics of online dictionaries (e.g. multimedia, adaptability) as (partly) unimportant. This result conflicts somewhat with the lexicographical request both for the development of a user-adaptive interface and the incorporation of multimedia elements. This contribution seeks to explain this discrepancy, by arguing that when potential users are fully informed about the benefits of possible innovative features of online dictionaries, they will come to judge these characteristics to be more useful than users that do not have this kind of information. This argument is supported by empirical evidence presented in this paper.
The representation of semantic relations between word senses of different entries in a dictionary is subject to a number of consistency requirements. This paper discusses the issue of maintaining and accessing consistent information on cross-references between sense-related items in electronic dictionaries from a mainly text-technological point of view. We present a number of consistency criteria for cross-referencing related senses and propose a practical approach to handling sense relations in an online dictionary. Our proposal is currently being tested in a large ongoing online dictionary project for German called elexiko. We focus on three different aspects of the dictionary development and editing process where consistency is an important issue: lexicographic data modelling, implementation of a lexicographic database system for an electronic dictionary, and development of practical tools for the lexicographer’s workbench.
vernetziko is an assistive software tool primarily designed for managing cross-references in XML-based electronic dictionaries. In its current form it has been developed as an integral part of the lexicographic editing environment for the German monolingual dictionary elexiko developed and compiled at the Institut für Deutsche Sprache, Mannheim. This paper first briefly outlines how vernetziko fits into the XML-based dictionary editing technology of elexiko. Then vernetziko’s core functionality and some of the auxiliary tools integrated into the program are presented from both a practical and a technological point of view. The concluding sections discuss some software engineering aspects of extending the tool to handle cross-references between multiple resources and point out some of the advantages of vernetziko vis-à-vis corresponding features of proprietary dictionary writing systems. The software can be adapted to interconnect off-the-shelf components (database management systems and editors), thus providing a tailor-made lexicographical workbench for a wide range of XML-based dictionaries without vendor lock-in.
This paper summarizes essential steps of a workshop-like presentation of lexicographic practice and reflects an application-oriented demonstration. As a point of departure the question is raised of how different linguistic information is extracted from a corpus for the inclusion in a dictionary. The introductory part on lexicographic objectives is followed by insights into methodological aspects (e. g. online dictionary elexiko). A conclusive example is provided to illustrate the procedure.
Vorwort
(2011)
In diesem Beitrag geht es einerseits um eine Definition dessen, was korpusgestützte Lexikographie ist, und andererseits um eine Bestandsaufnahme der gegenwärtigen Praxis korpusgestützter Lexikographie. Dabei wird ein Schwerpunkt gelegt auf allgemeinsprachige Wörterbücher der Gegenwartssprache, deren Inhalt die Beschreibung von Bedeutung und Verwendung von Lexemen ist. Außerdem liegt die Einschätzung zugrunde, dass die Auswertung elektronischer Korpora die Wörterbucharbeit weitgehend positiv beeinflusst und verändert, vorausgesetzt, dass zugrunde gelegte Korpus wurde für das geplante Wörterbuch so gut wie möglich in Umfang und Zusammensetzung eingerichtet.
The project elexiko compiles an extensive, monolingual dictionary of Contemporary German. This contribution deals with the grammatical data in this dictionary; it is not only described how these are arranged content-wise depending on corpus data, but also how they were modelled.
Das Projekt elexiko erarbeitet ein umfangreiches, einsprachiges Wörterbuch des Gegenwartsdeutschen. In diesem Beitrag geht es um die grammatischen Angaben in diesem Wörterbuch; es wird nicht nur erläutert, wie diese inhaltlich in Abhängigkeit vom Prinzip der Korpusbasiertheit gestaltet sind, sondern auch, wie sie modelliert wurden.
Einleitung
(2011)
elexiko ist ein im Aufbau befindliches Online-Wörterbuch, d. h. es ist ständigen Änderungen in Form von Korrekturen oder Ergänzungen unterworfen. Diese betreffen sowohl die Stichwortliste als auch die lexikografischen Angaben. In diesem Beitrag sollen einige kleinere konzeptionelle Entscheidungen und offene Fragen, die in den anderen Beiträgen in diesem Sammelband noch nicht thematisiert wurden, zusammengefasst werden.
In this paper, some tendencies in contemporary German lexicography are described with an emphasis on Internet dictionaries. Several online dictionaries of German from publishing houses or published by academic institutions are introduced representing a wide range of possible presentations of lexico-graphic information on the Internet. The examples also illustrate how modem dictionaries profit from a corpus-based approach, how information on headwords can be presented in an innovative way and how the contact with the dictionary user can be established. Research into dictionary usage is also ex-amined briefly. A short outlook on the possible future of German lexicography rounds off the survey given here.