In many countries of the world, perspectives on gender equality and racism have changed in recent decades. One result has been more attention being devoted to traces of androcentric and racist language in society. This also affects dictionaries. In lexicography there are discussions about whether or to what extent social asymmetries are inscribed in dictionaries and if this is still acceptable. The issue of the nature of description plays an important role in this discussion. If sexist usages are often found in language use, i.e. in the corpus data on which the dictionary is based, does the dictionary also have to show them? How is this, in turn, compatible with the normative power of dictionaries? Do dictionaries contribute to the perpetuation of gender stereotypes by showcasing them under the banner of descriptive principles? And what roles do lexicographers play in this process? The article deals with these questions on the basis of individual lexicographical examples and current discussions in the lexicographic and public community.
Annotated dataset consisting of personal designations found on websites of 42 German, Austrian, Swiss and South Tyrolean cities. Our goal is to re-evaluate the websites every year in order to see how the use of gender-fair language develops over time. The dataset contains coordinates for the creation of map material.
The first international study (N=684) we conducted within our research project on online dictionary use included very general questions on that topic. In this chapter, we present the corresponding results on questions such as the use of printed and online dictionaries, the types of dictionaries used, the devices used to access online dictionaries, and the willingness to pay for premium content. The data we collected show that our respondents use both printed and online dictionaries and, according to their self-reports, many different kinds of dictionaries. In this context, our results revealed some clear cultural differences: in German-speaking areas, spelling dictionaries are more common than in other linguistic areas, where thesauruses are widespread. Only a minority of our respondents are willing to pay for premium content, but most are prepared to accept advertising. Our results also show that our respondents mainly use dictionaries on big-screen devices, e.g. desktop computers or laptops.
A heated public debate has developed around gendering, or gender-fair language. Since the beginning of the year, the discussion about gender-fair language has again been particularly prominent in the media. One of the triggers is the revision of the definitions in Duden online. Recently, even Der Spiegel devoted its cover and a lead article to the topic (cf. Bohr et al. 2021). However, the discussion easily exhausts itself in pro and contra positions, although there is a whole range of aspects of ‘gender-fair language’ to consider that could enable a more differentiated discussion. The aim of this contribution is to introduce some of these aspects into the debate briefly and as accessibly as possible.
The project elexiko is compiling an extensive monolingual dictionary of contemporary German. This contribution deals with the grammatical information in this dictionary: it explains not only how the content of this information is shaped by the principle of corpus-based lexicography, but also how it was modelled.
Wiktionary is increasingly gaining influence in a wide variety of linguistic fields such as NLP and lexicography, and has great potential to become a serious competitor for publisher-based and academic dictionaries. However, little is known about the "crowd" that is responsible for the content of Wiktionary. In this article, we want to shed some light on selected questions concerning large-scale cooperative work in online dictionaries. To this end, we use quantitative analyses of the complete edit history files of the English and German Wiktionary language editions. Concerning the distribution of revisions over users, we show that — compared to the overall user base — only very few authors are responsible for the vast majority of revisions in the two Wiktionary editions. In the next step, we compare this distribution to the distribution of revisions over all the articles. The articles are subsequently analysed in terms of rigour and diversity, typical revision patterns through time, and novelty (the time since the last revision). We close with an examination of the relationship between corpus frequencies of headwords in articles, the number of article visits, and the number of revisions made to articles.
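The concentration of revisions on a small group of authors can be illustrated with a simple share calculation. The following is only a minimal sketch: the function name `top_share` and the toy revision counts are hypothetical and are not taken from the Wiktionary edit histories analysed in the article.

```python
import math

def top_share(revisions_per_user, top_fraction):
    """Share of all revisions made by the most active `top_fraction` of users."""
    ranked = sorted(revisions_per_user, reverse=True)
    # Always include at least one user, even for very small fractions.
    k = max(1, math.ceil(top_fraction * len(ranked)))
    return sum(ranked[:k]) / sum(ranked)

# Hypothetical toy data: number of revisions made by each of five users.
toy_revisions = [50, 30, 10, 5, 5]
share = top_share(toy_revisions, 0.2)
print(f"Top 20% of users account for {share:.0%} of revisions")
```

On real edit histories, the same calculation would be run on per-user revision counts extracted from the history files; how those files are parsed is outside the scope of this sketch.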
We introduce DeReKoGram, a novel frequency dataset containing lemma and part-of-speech (POS) information for 1-, 2-, and 3-grams from the German Reference Corpus. The dataset contains information based on a corpus of 43.2 billion tokens and is divided into 16 parts based on 16 corpus folds. We describe how the dataset was created and structured. By evaluating the distribution over the 16 folds, we show that it is possible to work with a subset of the folds in many use cases (e.g., to save computational resources). In a case study, we investigate the growth of vocabulary (as well as the number of hapax legomena) as an increasing number of folds are included in the analysis. We cross-combine this with the various cleaning stages of the dataset. We also give some guidance in the form of Python, R, and Stata markdown scripts on how to work with the resource.
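The idea of tracking vocabulary growth and hapax legomena as more folds are added can be sketched as follows. This is an illustration under stated assumptions only: the folds here are tiny hypothetical token lists, not the actual DeReKoGram file format, and `vocabulary_growth` is a made-up helper name rather than part of the scripts mentioned in the abstract.

```python
from collections import Counter

def vocabulary_growth(folds):
    """Return (types, hapaxes) after each fold is added cumulatively."""
    counts = Counter()
    growth = []
    for fold in folds:
        counts.update(fold)
        types = len(counts)
        # A hapax legomenon is a type that has occurred exactly once so far.
        hapaxes = sum(1 for c in counts.values() if c == 1)
        growth.append((types, hapaxes))
    return growth

# Hypothetical toy folds standing in for corpus folds.
toy_folds = [
    ["der", "Hund", "bellt"],
    ["der", "Hund", "schläft"],
    ["die", "Katze", "schläft"],
]
for n, (types, hapaxes) in enumerate(vocabulary_growth(toy_folds), start=1):
    print(f"folds 1..{n}: {types} types, {hapaxes} hapax legomena")
```

With real frequency lists, the per-fold counts would be merged instead of raw tokens, but the cumulative logic stays the same.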
Introduction
(2015)
Dictionary usage research is a topic of increasing importance within the field of lexicography. At the beginning of the new millennium, the dictionary user was still relatively unknown. However, in the last ten years, more and more user studies have been published. Consequently, methods, data and the conclusions that can be drawn from them have been successively refined. New possibilities of web-based data collection, e.g. the analysis of log files, have also enriched this field of research. This contribution aims to describe the state of the art in dictionary usage research in the digital era. I begin by providing a short overview of methodological and terminological basics and then place a special focus on three different methods of collecting empirical data on dictionary use: online questionnaires, eye tracking and the analysis of log files. All these methods are illustrated with user studies conducted at the Institute for the German Language in Mannheim.
Kommunikationsverben, an online reference work on German communication verbs and part of the dictionary portal OWID, describes the meaning of communication verbs on two levels: a lexical level, represented in the dictionary entries and by sets of lexical features, and a conceptual level, represented by different types of situations referred to by specific types of verbs. These two levels have each been implemented in special types of access structures. A first explorative access to the conceptual level provides the user with a list of the main classes of communication verbs, the subclasses of each of these, and the lexical fields pertaining to each subclass. Lexical fields are presented together with a characterisation of the situation type to which the verbs of that field are used to refer. Information about the conceptual level is additionally accessible by an advanced search option allowing the user to combine components of the characterisation of situation types to “create” any kind of situation and search for the verbs that correspond to it. Information about the lexical level of the meaning of communication verbs is accessible via the dictionary entries and by another advanced search option allowing the user to search for verbs with particular lexical features or combinations of these.
Academic general dictionaries of German are today mostly compiled on the basis of corpora, i.e. the language described in them is investigated empirically before the lexicographic description. However, these corpora, like the large linguistic text collections of German in general, are dominated by newspaper texts. The collocations and typical contexts of use described in dictionaries are therefore based at least partly on this text type. In our contribution, we use a case study on Mann (‘man’) and Frau (‘woman’) to examine how much the description of such collocation sets would change if the corpus basis consisted not of newspapers but of popular magazines or fiction, and how differently gender stereotypes would accordingly be presented. We thereby also discuss the question of whether newspaper texts in this case provide an adequate and multifaceted picture of standard usage. On a more general level, this touches on a fundamental problem of corpus-linguistic research, namely the question of to what extent corpora can provide an ‘objective’ picture of linguistic reality at all.
Less than one percent of words would be affected by gender-inclusive language in German press texts
(2024)
Research on gender and language is tightly knitted to social debates on gender equality and non-discriminatory language use. Psycholinguistic scholars have made significant contributions in this field. However, corpus-based studies that investigate these matters within the context of language use are still rare. In our study, we address the question of how much textual material would actually have to be changed if non-gender-inclusive texts were rewritten to be gender-inclusive. This quantitative measure is an important empirical insight, as a recurring argument against the use of gender-inclusive German is that it supposedly makes written texts too long and complicated. It is also argued that gender-inclusive language has negative effects on language learners. However, such effects are only likely if gender-inclusive texts are very different from those that are not gender-inclusive. In our corpus-linguistic study, we manually annotated German press texts to identify the parts that would have to be changed. Our results show that, on average, less than 1% of all tokens would be affected by gender-inclusive language. This small proportion calls into question whether gender-inclusive German presents a substantial barrier to understanding and learning the language, particularly when we take into account the potential complexities of interpreting masculine generics.
Lexikografie im Internet
(2008)
The methods used in research into dictionary use are established research methods from the social sciences. After explicating the different steps of a typical empirical investigation, this article provides examples of how these methods are applied in various user studies on the use of online dictionaries. In doing so, it discusses different kinds of data collection (surveys in the form of online questionnaires, log files and eye tracking) as well as different research designs (for instance, ex-post-facto design or experimental design).
The following guide provides a basic overview of the steps to be considered when designing and conducting an empirical study in German linguistics. We describe the basic procedure and the underlying concepts in general terms, as a model, and illustrate them with simple examples. A fuller elaboration using examples from various linguistic research questions and fields, and thus more illustrations of how the individual steps are to be implemented for particular research questions, can be found in the case studies in Part III of this volume. More detailed discussions of the central concepts of empirical work in linguistics can be found in Part VI of this volume. Further reading is listed at the end of the contribution.
In this contribution, we start from the premise that the appropriateness of linguistic forms should be judged not across the board but according to the respective context. Using an online questionnaire study with subordinate clauses introduced by weil (‘because’), we investigate the hypothesis that variants which do not conform to the written standard are perceived, in forms of communication that are less oriented towards standard and written-language norms, as (at least) equally appropriate as, or at least differently from, a variant conforming to the written standard. We examine this by means of three tasks: reception, production, and association with particular media and text types. We can show that the variant conforming to the written norm is consistently rated as the most acceptable. In all three tasks, however, there are also clear and consistent effects suggesting that the different variants are nonetheless rated, produced and associated differently depending on the text type.
Dictionary usage research views dictionaries primarily as tools for solving linguistic problems. A large proportion of dictionary use now takes place online and can thus be easily monitored using tracking technologies. Using the data gathered through such tracking, we hope to optimize user experiences of dictionaries and other linguistic resources. Usage statistics are also used for external evaluation of linguistic resources. In this paper, we pursue the following three questions from a quantitative perspective: (1) What new insights can we gain from collecting and analysing usage data? (2) What limitations of the data and/or the collection process do we need to be aware of? (3) How can these insights and limitations inform the development and evaluation of linguistic resources?
Digital or electronic lexicography has gained in importance in the last few years. This can be seen in the growing list of publications focusing on this field. In the OBELEX bibliography (http://www.owid.de/obelex/engl), the research contributions in this field are consolidated and are searchable by different criteria. The idea for OBELEX originated in the context of the dictionary portal OWID, which incorporates several dictionaries from the Institute for German Language (www.owid.de). OBELEX has been available online free of charge since December 2008. OBELEX includes articles, monographs, anthologies and reviews published since 2000 that relate to electronic lexicography, as well as some relevant older works. Our particular focus is on works about online lexicography. Systematically evaluated sources are relevant journals like International Journal of Lexicography, Lexicographica, Dictionaries, Lexikos; furthermore Euralex-Proceedings, proceedings of the International Symposium on Lexicography in Copenhagen as well as relevant monographs and anthologies. Information on dictionaries is currently not included in OBELEX; the main focus is on metalexicography. However, we are working on a database with information on online dictionaries as a supplement to OBELEX. All entries of OBELEX are stored in a database. Thus, all parts of the bibliographic entry (such as person, title, publication or year) are searchable. Furthermore, all publications are associated with our keyword list; therefore, a thematic search is possible. The subject language is also noted. With this type of content, the OBELEX bibliography supplements in a useful way other bibliographic projects such as the printed ‘Internationale Bibliographie zur germanistischen Lexikographie und Wörterbuchforschung’ by H. E. Wiegand (Wiegand 2006/2007), the ‘Bibliography of Lexicography’ by R. R. K. Hartmann (Hartmann 2007), and the ‘International Bibliography of Lexicography’ of Euralex (cf. also DeCesaris and Bernal 2006). OBELEX differs from all these bibliographic projects by its strong focus on electronic lexicography and its ability to retrieve bibliographic information.
We present studies using the 2013 log files from the German version of Wiktionary. We investigate several lexicographically relevant variables and their effect on look-up frequency: Corpus frequency of the headword seems to have a strong effect on the number of visits to a Wiktionary entry. We then consider the question of whether polysemic words are looked up more often than monosemic ones. Here, we also have to take into account that polysemic words are more frequent in most languages. Finally, we present a technique to investigate the time-course of look-up behaviour for specific entries. We exemplify the method by investigating influences of (temporary) social relevance of specific headwords.
Olaf Scholz gendert. Eine Analyse von Personenbezeichnungen in Weihnachts- und Neujahrsansprachen
(2022)
Headlines like the one in our title failed to appear in January 2022. Yet Olaf Scholz’s first New Year’s address contained not a single generic masculine, but rather paired forms (Mitbürgerinnen und Mitbürger, Expertinnen und Experten), gender-abstracting expressions (Eltern, Familien, Geimpfte, Menschen) and personalisations or paraphrases such as uns allen, es haben sich 60 Millionen […] impfen lassen, or ich möchte allen danken. The speech thus consistently uses various forms of gender-fair language, but forms so inconspicuous that this attracted no media attention. Incidentally, this shows that the heated public discussions around the topic are not about all forms of gender-fair language, but really only about particular forms, such as the use of the gender star. Here we present some observations based on an annotated corpus of addresses, which you can retrace yourself using an online app.
This chapter presents empirical findings on the question of which criteria make a good online dictionary, using data on expectations and demands collected in the first study (N=684), supplemented by additional results from the second study (N=390), which examined more closely whether the respondents had differentiated views on individual aspects of the criteria rated in the first study. Our results show that the classical criteria of reference books (e.g. reliability, clarity) were rated highest by our participants, whereas the unique characteristics of online dictionaries (e.g. multimedia, adaptability) were rated and ranked as (partly) unimportant. To verify whether the poor rating of these innovative features resulted from the subjects not being used to online dictionaries incorporating those features, we integrated an experiment into the second study. Our results revealed a learning effect: participants in the learning-effect condition, i.e. respondents who were first presented with examples of possible innovative features of online dictionaries, judged adaptability and multimedia to be more useful than participants who did not have this information. Thus, our data point to the conclusion that developing innovative features is worthwhile, but that users can only be convinced of their benefits gradually.
Online dictionary use
(2012)
The Online-Wortschatz-Informationssystem Deutsch (OWID Online German Lexical Information System) is a lexicographic Internet portal for various electronic dictionary resources that are being compiled at the Institute for the German Language (Institut für Deutsche Sprache, IDS). The main emphasis of OWID is on academic lexicographic resources of contemporary German. Presently, the following dictionaries are included in OWID: a dictionary of contemporary German called elexiko, a dictionary of neologisms, a small dictionary of collocations, and a discourse dictionary covering the lexemes that establish the discourse about “guilt” in the early post-war era 1945-1955. In the near future (2010/2011), several additional dictionaries will be published in OWID: a Textbook of German Communication Verbs, a Valency Dictionary of German Verbs, two further discourse dictionaries – one about the “democracy” discourse around 1968, the other covering the keywords of the German reunification 1989/1990. Moreover, 300 entries from a corpus-based project on proverbs will be integrated into OWID. Thereby, OWID is a constantly growing resource for academic lexicographic work of the German language.
Altogether, OWID is a special kind of dictionary portal owing to its content and its design, namely the integration of the various dictionaries, the access possibilities and the presentation features. With OWID, we try to establish a dictionary net in which the different resources are jointly accessible not only by headwords, but also on the microstructural level. The prerequisite for these shared access and navigation possibilities across the various dictionaries is a common concept for the lexicographic data model, which we put into practice in OWID. Data from all dictionaries in OWID are structured according to a tailor-made, fine-grained, XML-based data model. In this data model, similar content is modelled similarly, while dictionary-specific differences are preserved.
The main tasks for the future are to enhance OWID with further dictionary resources, to improve the inner access structures so that they exhaust the possibilities of the data model, and to customize the layout of the dictionaries as well as the search options according to the user’s needs.
Lexicographic and lexical resources for German are compiled at many different institutions. These include the Dudenverlag, which produces the most frequently consulted dictionaries of contemporary German with the printed dictionaries of the Duden series and with “Duden online”, and the Union deutscher Akademien (Union of German Academies), under whose umbrella numerous historical as well as synchronic dictionaries of German are compiled at various individual academies (e.g. the “Digitales Wörterbuch der deutschen Sprache”, the “Wörterbuchnetz” and the planned information system of the new “Zentrum für digitale Lexikographie der deutschen Sprache”). At the Institut für Deutsche Sprache in Mannheim, too, academic vocabulary-related resources for German are compiled and presented to the (specialist) public under the umbrella of OWID, the “Online-Wortschatz-Informationssystem Deutsch”. Although in OWID we have concentrated on resources for specialised areas of vocabulary, we reach users in a wide variety of countries around the world. We would like to take this opportunity to introduce our resources in OWID and OWIDplus to ZGL readers in more detail.
The constantly changing requirements of today’s media landscape demand a new concept for literary editions. Such a forward-looking model should be SGML/XML-based, and should acknowledge the central importance of topic maps. In this respect, the Thomas Mann project combines in a unique way the work of one of the most famous authors of the 20th century with an innovative way of information organization.
In order to demonstrate why it is important to correctly account for the (serial dependent) structure of temporal data, we document an apparently spectacular relationship between population size and lexical diversity: for five out of seven investigated languages, there is a strong relationship between population size and lexical diversity of the primary language in this country. We show that this relationship is the result of a misspecified model that does not consider the temporal aspect of the data by presenting a similar but nonsensical relationship between the global annual mean sea level and lexical diversity. Given the fact that in the recent past, several studies were published that present surprising links between different economic, cultural, political and (socio-)demographical variables on the one hand and cultural or linguistic characteristics on the other hand, but seem to suffer from exactly this problem, we explain the cause of the misspecification and show that it has profound consequences. We demonstrate how simple transformation of the time series can often solve problems of this type and argue that the evaluation of the plausibility of a relationship is important in this context. We hope that our paper will help both researchers and reviewers to understand why it is important to use special models for the analysis of data with a natural temporal ordering.
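The kind of misspecification described here can be reproduced with simulated data. The sketch below is an assumption-laden illustration, not the authors' analysis: it generates two independent random walks with drift (stand-ins for any two trending annual series), and it uses first differencing as one example of a simple transformation; the abstract does not state which transformation the paper applies. All function names are hypothetical.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def random_walk_with_drift(n, drift, rng):
    """Simulate a trending series: each step adds drift plus Gaussian noise."""
    level = 0.0
    series = []
    for _ in range(n):
        level += drift + rng.gauss(0.0, 1.0)
        series.append(level)
    return series

def first_difference(series):
    """Replace each value by its change relative to the previous value."""
    return [b - a for a, b in zip(series, series[1:])]

rng = random.Random(42)
series_a = random_walk_with_drift(200, 1.0, rng)  # independent of series_b
series_b = random_walk_with_drift(200, 1.0, rng)

corr_levels = pearson(series_a, series_b)
corr_diffs = pearson(first_difference(series_a), first_difference(series_b))
print(f"correlation of levels:      {corr_levels:.3f}")
print(f"correlation of differences: {corr_diffs:.3f}")
```

Because both series share nothing but an upward trend, the correlation of the raw levels is close to 1, while the correlation of the year-to-year changes is close to 0, which is the pattern one would expect for a spurious relationship.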
Quantitatively oriented empirical linguistics generally aims to take large amounts of linguistic material into view at once and, by means of suitable methods of analysis, both to discover new phenomena and to investigate known phenomena more systematically. The aim of our contribution is to reflect methodologically, on the basis of two exemplary research questions, on where the quantitative-empirical approach to the analysis of lexical data really works as hoped and where there may even be inherent limits. For this purpose, we single out two very different research questions: on the one hand, the near-real-time analysis of productive processes of vocabulary change and, on the other hand, the trade-off between word order regularity and word structure regularity in the languages of the world. These two research questions lie on very different levels of abstraction. We hope, however, that with them we can show across a broad range the levels at which the quantitative analysis of lexical data can take place. In addition, we would like to use these very different analyses to reflect on the possibilities and limits of the quantitative approach and thus to clarify the interpretive power of the methods.
Questions of design
(2014)
All lexicographers working on online dictionary projects that do not wish to use an established form of design for their online dictionary, or simply have new kinds of lexicographic data to present, face the problem of what kind of arrangement is best suited for the intended users of the dictionary. In this chapter, we present data about questions relating to the design of online dictionaries. This will provide projects that use these or similar ways of presenting their lexicographic data with valuable information about how potential dictionary users assess and evaluate them. In addition, the answers to corresponding open-ended questions show, detached from concrete design models, which criteria potential users value in a good online representation. Clarity and an uncluttered look seem to dominate in many answers, as well as the possibility of customization, if the latter is not connected with a too complex usability model.
In the past two decades, more and more dictionary usage studies have been published, but most of them deal with the question what users appreciate about dictionaries, which dictionaries they use and which information they need in specific situations. These studies presuppose that users indeed consult lexicographic resources. However, language teachers and lecturers of linguistics often have the impression that students use too few high-quality dictionaries in their every-day work. Against this background, we started an international cooperation project to collect empirical data evaluating that impression. Our aim was to evaluate what students (here from the Romance language area) actually do when they correct language problems. We used a new methodological setting to do this (screen recording with a thinking-aloud task). The empirical data we gained offers a broad insight into what language users really do when solving language-related tasks today.
This article presents empirical findings about what criteria make for a good online dictionary, using data on expectations and demands collected in an online questionnaire (N=684), complemented by additional results from a second questionnaire (N=390) which looked more closely at whether respondents had differentiated views on individual aspects of the criteria rated in the first study. Our results show that the classical criteria of reference books (such as reliability and clarity) were rated highest by our participants, whereas the unique characteristics of online dictionaries (such as multimedia and adaptability) were rated and ranked as (partly) unimportant. To verify whether or not the poor ratings of these innovative features were a result of the fact that our subjects are unfamiliar with online dictionaries incorporating such features, we incorporated an experiment into the second study. Our results revealed a learning effect: participants in the learning-effect condition, i.e. respondents who were first presented with examples of possible innovative features of online dictionaries, judged adaptability and multimedia to be more useful than participants who were not given that information. Thus, our data point to the conclusion that developing innovative features is worthwhile but that it should be borne in mind that users can only be persuaded of their benefits gradually. In addition, we present data about questions relating to the design of online dictionaries.
Dictionary research on electronic reference works and hypertext information systems frequently calls for the development of user-adapted views of lexicographic data. For printed dictionaries, it has been indispensable to design a complete dictionary for a particular user group and particular usage situations. In contrast, any electronic presentation of lexicographic data offers the possibility of defining user-specific views of an initially user-unspecific resource. However, research on dictionary use in general still has to answer several open questions on this subject. This paper first provides an overview of the present state of research on dictionary use with respect to electronic lexicography. It then explains further prerequisites for possible user-adapted access to data, as exemplified by OWID, the Online Vocabulary Information System of the Institut für Deutsche Sprache. Finally, it outlines the results achieved on the subject so far and highlights the prospects for user-adapted presentations of lexicographic data.
This paper focuses on language change based on shifting social norms, in particular with regard to the debate on language and gender. It is a recurring argument in this debate that language develops "naturally" and that "severe interventions" - such as gender-inclusive language is often claimed to be - in the allegedly "organic" language system are inappropriate and even "dangerous". Such interventions are, however, not unprecedented. Socially motivated processes of language change are neither unusual nor new. We focus in our contribution on one important political-social space in Germany, the German Bundestag. Taking other struggles about language and gender in the plenaries of the Bundestag as a starting point, our article illustrates that language and gender has been a recurring issue in the German Bundestag since the 1980s. We demonstrate how this is reflected in linguistic practices of the Bundestag, by the use of a) designations for gays and lesbians; b) pair forms such as Bürgerinnen und Bürger (female and male citizens); and c) female forms of addresses and personal nouns ('Präsidentin' in addition to 'Präsident'). Lastly, we will discuss implications of these earlier language battles for the currently very heated debate about gender-inclusive language, especially regarding new forms with gender symbols like the asterisk or the colon (Lehrer*innen, Lehrer:innen; male*female teachers) which are intended to encompass all gender identities.
We present an empirical study that investigates whether and to what extent dictionaries and other lexicographic resources improve the results of text revision. In our study, students were asked to optimize two texts and were randomly assigned to one of three experimental conditions: 1. a source text without any indication of potential errors in the text, 2. a source text in which problematic passages were highlighted, and 3. a source text with highlighted problem passages together with lexicographic resources that could be used to solve the specific problems. We found that participants in the third group corrected the most problems and introduced the fewest semantic distortions during revision. They were also the most efficient (measured in improved text passages per unit of time). In this case study, we report in detail on the experimental design, the methodological implementation of the study, and possible limitations of our results.
Studying Lexical Dynamics and Language Change via Generalized Entropies: The Problem of Sample Size
(2020)
Recently, it was demonstrated that generalized entropies of order α offer novel and important opportunities to quantify the similarity of symbol sequences where α is a free parameter. Varying this parameter makes it possible to magnify differences between different texts at specific scales of the corresponding word frequency spectrum. For the analysis of the statistical properties of natural languages, this is especially interesting, because textual data are characterized by Zipf’s law, i.e., there are very few word types that occur very often (e.g., function words expressing grammatical relationships) and many word types with a very low frequency (e.g., content words carrying most of the meaning of a sentence). Here, this approach is systematically and empirically studied by analyzing the lexical dynamics of the German weekly news magazine Der Spiegel (consisting of approximately 365,000 articles and 237,000,000 words that were published between 1947 and 2017). We show that, analogous to most other measures in quantitative linguistics, similarity measures based on generalized entropies depend heavily on the sample size (i.e., text length). We argue that this makes it difficult to quantify lexical dynamics and language change and show that standard sampling approaches do not solve this problem. We discuss the consequences of the results for the statistical analysis of languages.
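The abstract above does not spell out the entropy formula. As a minimal sketch, assuming the common Tsallis-type definition H_α = (1 − Σ p_i^α) / (α − 1), with Shannon entropy as the limit α → 1, the following Python code computes a plug-in estimate from word counts and shows how the estimate shifts with sample size. The function names and the toy Zipf-like sampler are hypothetical illustrations. They are not the authors' code or data, and the sketch covers only the entropy itself, not the text-similarity measures built on it.

```python
import math
import random
from collections import Counter


def generalized_entropy(tokens, alpha):
    """Plug-in estimate of the generalized entropy of order alpha.

    Assumed definition (Tsallis-type): H = (1 - sum(p_i ** alpha)) / (alpha - 1).
    For alpha close to 1 the Shannon entropy (natural log) is returned.
    """
    counts = Counter(tokens)
    total = sum(counts.values())
    probs = [c / total for c in counts.values()]
    if math.isclose(alpha, 1.0):
        return -sum(p * math.log(p) for p in probs)
    return (1.0 - sum(p ** alpha for p in probs)) / (alpha - 1.0)


def zipf_sample(n_tokens, vocab_size=5000, seed=0):
    """Hypothetical toy corpus: word ranks drawn with probability ~ 1/rank."""
    rng = random.Random(seed)
    ranks = range(1, vocab_size + 1)
    weights = [1.0 / r for r in ranks]
    return rng.choices(ranks, weights=weights, k=n_tokens)


if __name__ == "__main__":
    tokens = zipf_sample(50_000)
    for alpha in (0.5, 1.0, 2.0):
        for size in (1_000, 10_000, 50_000):
            h = generalized_entropy(tokens[:size], alpha)
            print(f"alpha={alpha:<4} n={size:<6} H={h:.3f}")
```

In this sketch, values of α below 1 give more weight to rare word types, which are the types most likely to be missing from a short sample, so the estimate for small α changes noticeably as the prefix grows. Values of α above 1 emphasize the few very frequent types.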
Studying Lexical Dynamics and Language Change via Generalized Entropies: The Problem of Sample Size
(2019)
Recently, it was demonstrated that generalized entropies of order α offer novel and important opportunities to quantify the similarity of symbol sequences where α is a free parameter. Varying this parameter makes it possible to magnify differences between different texts at specific scales of the corresponding word frequency spectrum. For the analysis of the statistical properties of natural languages, this is especially interesting, because textual data are characterized by Zipf’s law, i.e., there are very few word types that occur very often (e.g., function words expressing grammatical relationships) and many word types with a very low frequency (e.g., content words carrying most of the meaning of a sentence). Here, this approach is systematically and empirically studied by analyzing the lexical dynamics of the German weekly news magazine Der Spiegel (consisting of approximately 365,000 articles and 237,000,000 words that were published between 1947 and 2017). We show that, analogous to most other measures in quantitative linguistics, similarity measures based on generalized entropies depend heavily on the sample size (i.e., text length). We argue that this makes it difficult to quantify lexical dynamics and language change and show that standard sampling approaches do not solve this problem. We discuss the consequences of the results for the statistical analysis of languages.
Textual structures in printed dictionaries are well known, adequately researched, and rather exhaustively described (cf. articles 3&10). This article investigates whether the models of textual structures in printed dictionaries can be applied to electronic dictionaries (EDs); or, more precisely, which parts of the order and terminology of textual structures in printed dictionaries are applicable to electronic ones, and which differences one should be aware of. The focus will be on online dictionaries because they represent the most important kind of digital dictionary and will become even more important in the future. Furthermore, the emphasis will be more on potential future forms of online dictionaries than on current ones, which are still sometimes produced as copies of their printed counterparts. To approach this question, basic differences between textual structures in electronic versus printed dictionaries will first be discussed. Secondly, further terminological and formal preliminary remarks will be made. The main part of the article then adapts de Schryver’s idea of “Creating order in dreamland”, expressed in his article “Lexicographer’s dreams in the electronic dictionary age”. The aim here is to begin to ‘create order in terminology land’ for textual structures in electronic dictionaries. A definitive order cannot be given here because electronic lexicography today involves constant change. In order to discuss the order of textual structures in EDs not only theoretically but also in concrete terms, their basic properties will be illustrated by means of a notional online dictionary. Following on from this fictitious scenario, a provisional survey of textual structures in EDs will be presented. The focus is thus less on current online dictionaries than on the possibilities which the new medium provides.
Finally, an explanation will be given as to how this view of structures in electronic dictionaries is useful for analyzing current EDs and for planning new ones. The overall aim here is not to introduce new kinds of textual structure in EDs and a corresponding terminology in detail, but to point out some constitutive differences between textual structures in printed dictionaries and those in electronic dictionaries.