Refine
Document Type
- Part of a Book (9)
- Article (4)
- Other (2)
- Conference Proceeding (1)
- Preprint (1)
Keywords
- Korpus <Linguistik> (17) (remove)
Publicationstate
Reviewstate
- (Verlags)-Lektorat (8)
- Peer-Review (4)
- Verlags-Lektorat (1)
Publisher
We start by trying to answer a question that has already been asked by de Schryver et al. (2006): Do dictionary users (frequently) look up words that are frequent in a corpus. Contrary to their results, our results that are based on the analysis of log files from two different online dictionaries indicate that users indeed look up frequent words frequently. When combining frequency information from the Mannheim German Reference Corpus and information about the number of visits in the Digital Dictionary of the German Language as well as the German language edition of Wiktionary, a clear connection between corpus and look-up frequencies can be observed. In a follow-up study, we show that another important factor for the look-up frequency of a word is its temporal social relevance. To make this effect visible, we propose a de-trending method where we control both frequency effects and overall look-up trends.
Der Artikel stellt die Projekte vor, die sich im Rahmen der Projektmesse zur „Elektronischen Lexikografie“ präsentiert haben. Diese Messe wurde begleitend zur 46. Jahrestagung des Instituts für Deutsche Sprache veranstaltet. Es wird in diesem Beitrag auf der Basis der Messepräsentationen dargelegt, inwiefern Entwicklungen der Korpuslexikografie und der Internetlexikografie die lexikografische Erfassung syntagmatischer Aspekte des deutschen Wortschatzes befördern und welche lexikografischen Internetressourcen dazu verfügbar sind.
Electronic corpora play an ever growing role in lexicography. On the one hand, new access to linguistic usage is made possible through the use of text corpora and intelligent corpus-based query tools; however, the final results are still interpreted and described by lexicographers. In this case corpora are used for data acquisition. On the other hand, there are also projects that provide purely automatically acquired data in the form of "dictionaries". Lexicographers play only a minor role here. This latter type of corpus use creates a completely new kind of electronic dictionary. This article addresses the questions as to what extent these dictionaries differ from lexicographic tradition and whether they must be considered in metalexicography. Starting from previously compiled electronic dictionary typologies, we try to supplement the formulation of lexicographic data as a distinguishing feature. Finally, based on the findings of the project elexiko (Institute for the German Language - IDS), we demonstrate that the distinction between electronic versus man-made lexicographic data is also relevant to lexicographical practice.
Filtern, Explorieren, Vergleichen: neue Zugriffsstrukturen und instruktive Potenziale von OWIDplus
(2023)
OWIDplus, das Zusatzangebot zur Wörterbuchplattform OWID, vereint verschiedenste lexikalische Datenbanken, Korpustools und visuell aufbereitete Analysen, die mithilfe von Textsuche und Kategorienfiltern so sortiert werden können, dass Benutzer*innen leicht die für sie interessanten Projekte entdecken können. Eine tiefergehende Beschäftigung mit den Einzelprojekten zeigt, wie bei aller oberflächlicher Ähnlichkeit oder gemeinsamen Themenbereichen ganz unterschiedliche methodische Zugänge zu sprachlichen Daten gewählt worden sind und wie Methodik und Forschungsfrage stets aufeinander abgestimmt werden müssen. Die Vielzahl potenzieller Forschungsfragen führt so unweigerlich zu einer Diversität von Projekten und somit einer Heterogenität, die, so hoffen die Autor*innen, in OWIDplus greifbar wird.
In many countries of the world, perspectives on gender equality and racism have changed in recent decades. One result has been more attention being devoted to traces of androcentric and racist language in society. This also affects dictionaries. In lexicography there are discussions about whether or to what extent social asymmetries are inscribed in dictionaries and if this is still acceptable. The issue of the nature of description plays an important role in this discussion. If sexist usages are often found in language use, i.e. in the corpus data on which the dictionary is based, does the dictionary also have to show them? How is this, in turn, compatible with the normative power of dictionaries? Do dictionaries contribute to the perpetuation of gender stereotypes by showcasing them under the banner of descriptive principles? And what roles do lexicographers play in this process? The article deals with these questions on the basis of individual lexicographical examples and current discussions in the lexicographic and public community.
Annotated dataset consisting of personal designations found on websites of 42 German, Austrian, Swiss and South Tyrolean cities. Our goal is to re-evaluate the websites every year in order to see how the use of gender-fair language develops over time. The dataset contains coordinates for the creation of map material.
We introduce DeReKoGram, a novel frequency dataset containing lemma and part-of-speech (POS) information for 1-, 2-, and 3-grams from the German Reference Corpus. The dataset contains information based on a corpus of 43.2 billion tokens and is divided into 16 parts based on 16 corpus folds. We describe how the dataset was created and structured. By evaluating the distribution over the 16 folds, we show that it is possible to work with a subset of the folds in many use cases (e.g., to save computational resources). In a case study, we investigate the growth of vocabulary (as well as the number of hapax legomena) as an increasing number of folds are included in the analysis. We cross-combine this with the various cleaning stages of the dataset. We also give some guidance in the form of Python, R, and Stata markdown scripts on how to work with the resource.
Wissenschaftlich basierte allgemeine Wörterbücher des Deutschen werden heute meist korpusbasiert erarbeitet, d. h. die in ihnen beschriebene Sprache wird vor der lexikografischen Beschreibung empirisch erforscht. Diese Korpora sind allerdings, wie die großen linguistischen Textsammlungen zum Deutschen allgemein, durch Zeitungstexte dominiert. Daher beruhen die in Wörterbüchern beschriebenen Kollokationen und typischen Verwendungskontexte zumindest teilweise auf dieser Textsorte. Wir untersuchen in unserem Beitrag anhand einer Fallstudie zu Mann und Frau, wie stark sich die Beschreibung solcher Kollokationssets ändern würde, wenn als Korpusgrundlage nicht Zeitungen, sondern Publikumszeitschriften oder belletristische Texte herangezogen würden und wie unterschiedlich demnach Geschlechterstereotype dargestellt würden. Damit diskutieren wir auch die Frage, ob Zeitungstexte in diesem Fall ein adäquates und vielseitiges Abbild des Gebrauchsstandards zeigen. Auf einer allgemeineren Ebene wird dadurch ein grundlegendes Problem korpuslinguistischer Forschungsarbeiten tangiert, nämlich die Frage, inwieweit durch Korpora überhaupt ein ‚objektives‘ Bild der sprachlichen Wirklichkeit gezeichnet werden kann.
Less than one percent of words would be affected by gender-inclusive language in German press texts
(2024)
Research on gender and language is tightly knitted to social debates on gender equality and non-discriminatory language use. Psycholinguistic scholars have made significant contributions in this field. However, corpus-based studies that investigate these matters within the context of language use are still rare. In our study, we address the question of how much textual material would actually have to be changed if non-gender-inclusive texts were rewritten to be gender-inclusive. This quantitative measure is an important empirical insight, as a recurring argument against the use of gender-inclusive German is that it supposedly makes written texts too long and complicated. It is also argued that gender-inclusive language has negative effects on language learners. However, such effects are only likely if gender-inclusive texts are very different from those that are not gender-inclusive. In our corpus-linguistic study, we manually annotated German press texts to identify the parts that would have to be changed. Our results show that, on average, less than 1% of all tokens would be affected by gender-inclusive language. This small proportion calls into question whether gender-inclusive German presents a substantial barrier to understanding and learning the language, particularly when we take into account the potential complexities of interpreting masculine generics.