OPUS 4 | Quantitative Linguistik

Quantitative Linguistik

Counting languages: how to do it and what to avoid. A German perspective (2020)

Adler, Astrid

The annual microcensus provides Germany’s most important official statistics. Unlike a census, it covers not the whole population but a representative 1% sample of it. In 2017, the German microcensus included a question on the language of the population: ‘Which language is mainly spoken in your household?’ Unfortunately, the question, its design, and its position within the microcensus questionnaire have several shortcomings, the main one being that the question cannot capture multilingual repertoires. Among the recommendations for improving the microcensus language question, first and foremost the question (i.e. its wording, design, and answer options) should make it possible to count multilingual repertoires.

Tracking and analyzing recent developments in German-language online press in the face of the coronavirus crisis: cOWIDplus Analysis and cOWIDplus Viewer (2020)

Wolfer, Sascha ; Koplenig, Alexander ; Michaelis, Frank ; Müller-Spitzer, Carolin

The coronavirus pandemic may be the largest crisis the world has had to face since World War II. It comes as no surprise that it is also having an impact on language, our primary communication tool. In this short paper, we present three interconnected resources designed to capture and illustrate these effects on a subset of the German language: an RSS corpus of German-language newsfeeds (with freely available untruncated frequency lists), a continuously updated HTML page tracking the diversity of the vocabulary in the RSS corpus, and a Shiny web application that enables other researchers and the broader public to explore the corpus in terms of basic frequencies.

cOWIDplus Viewer: Sprachliche Spuren der Corona-Krise in deutschen Online-Nachrichtenmeldungen. Explorieren Sie selbst! (2020)

Müller-Spitzer, Carolin ; Wolfer, Sascha ; Koplenig, Alexander ; Michaelis, Frank

Studying Lexical Dynamics and Language Change via Generalized Entropies: The Problem of Sample Size (2020)

Koplenig, Alexander ; Wolfer, Sascha ; Müller-Spitzer, Carolin

Recently, it was demonstrated that generalized entropies of order α offer novel and important opportunities to quantify the similarity of symbol sequences, where α is a free parameter. Varying this parameter makes it possible to magnify differences between texts at specific scales of the corresponding word frequency spectrum. For the analysis of the statistical properties of natural languages, this is especially interesting, because textual data are characterized by Zipf’s law, i.e., there are very few word types that occur very often (e.g., function words expressing grammatical relationships) and many word types with a very low frequency (e.g., content words carrying most of the meaning of a sentence). Here, this approach is systematically and empirically studied by analyzing the lexical dynamics of the German weekly news magazine Der Spiegel (approximately 365,000 articles and 237,000,000 words published between 1947 and 2017). We show that, analogous to most other measures in quantitative linguistics, similarity measures based on generalized entropies depend heavily on the sample size (i.e., text length). We argue that this makes it difficult to quantify lexical dynamics and language change, and show that standard sampling approaches do not solve this problem. We discuss the consequences of the results for the statistical analysis of languages.
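
To illustrate the quantity this abstract refers to, here is a minimal Python sketch of a generalized entropy of order α computed from word frequencies. It assumes the common Havrda-Charvát/Tsallis form, H_α(p) = (Σ p_i^α − 1) / (1 − α), with the Shannon entropy as the limit α → 1. The abstract does not spell out the formula, so this form, the function name `generalized_entropy`, and the toy text are assumptions made for illustration, not the authors' code or data.

```python
import math
from collections import Counter


def generalized_entropy(tokens, alpha):
    """Generalized entropy of order alpha for a list of word tokens.

    Assumed form (Havrda-Charvat/Tsallis): (sum(p_i ** alpha) - 1) / (1 - alpha).
    For alpha == 1 the Shannon entropy (in nats) is returned as the limit.
    """
    if not tokens:
        raise ValueError("tokens must not be empty")
    counts = Counter(tokens)
    total = sum(counts.values())
    # Relative frequency of each word type.
    probs = [c / total for c in counts.values()]
    if math.isclose(alpha, 1.0):
        return -sum(p * math.log(p) for p in probs)
    return (sum(p ** alpha for p in probs) - 1.0) / (1.0 - alpha)


# Hypothetical toy text; a real study would use a large corpus.
toy = "the cat sat on the mat and the dog sat on the rug".split()

# Small alpha gives more weight to rare word types, large alpha to frequent ones.
for alpha in (0.5, 1.0, 2.0):
    print(alpha, generalized_entropy(toy, alpha))

# Entropy of growing prefixes of the same text: the value changes with text length.
for n in (4, 8, len(toy)):
    print(n, generalized_entropy(toy[:n], 1.0))
```

The second loop only hints at the sample-size issue the abstract describes: the estimate computed from a short prefix differs from the one computed from the longer text, so texts of unequal length are not directly comparable.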

Überlegungen zur sprachstandbezogenen Relativierung von Wortschätzen. Ein theoretischer Rahmen und eine kleine empirische Studie (2020)

Perkuhn, Rainer

cOWIDplus Analyse: Wie sehr schränkt die Corona-Krise das Vokabular deutschsprachiger Online-Presse ein? (2020)

Wolfer, Sascha ; Koplenig, Alexander ; Michaelis, Frank ; Müller-Spitzer, Carolin

cOWIDplus Analyse is a continuously updated resource addressing the question of whether, and how strongly, the vocabulary of selected German online press reports systematically narrows during the coronavirus pandemic, and whether or when the vocabulary expands again after the crisis. In this article, the authors explain the research question behind the resource, the underlying data, the method, and the results to date.
