
Studying Lexical Dynamics and Language Change via Generalized Entropies: The Problem of Sample Size

Recently, it was demonstrated that generalized entropies of order α offer novel and important opportunities to quantify the similarity of symbol sequences where α is a free parameter. Varying this parameter makes it possible to magnify differences between different texts at specific scales of the corresponding word frequency spectrum. For the analysis of the statistical properties of natural languages, this is especially interesting, because textual data are characterized by Zipf’s law, i.e., there are very few word types that occur very often (e.g., function words expressing grammatical relationships) and many word types with a very low frequency (e.g., content words carrying most of the meaning of a sentence). Here, this approach is systematically and empirically studied by analyzing the lexical dynamics of the German weekly news magazine Der Spiegel (consisting of approximately 365,000 articles and 237,000,000 words that were published between 1947 and 2017). We show that, analogous to most other measures in quantitative linguistics, similarity measures based on generalized entropies depend heavily on the sample size (i.e., text length). We argue that this makes it difficult to quantify lexical dynamics and language change and show that standard sampling approaches do not solve this problem. We discuss the consequences of the results for the statistical analysis of languages.
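
To make the quantities named in the abstract concrete, here is a minimal sketch of a generalized entropy of order α and the corresponding generalized Jensen-Shannon divergence between two word frequency distributions. The abstract does not state the formulas, so the Tsallis-type definition H_α(p) = (Σ p_i^α − 1) / (1 − α), with Shannon entropy as the limit α → 1, is an assumption taken from the related literature. The function names are hypothetical and are not the authors' code.

```python
import math
from collections import Counter


def relative_frequencies(tokens):
    """Map each word type to its relative frequency in the token list."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def generalized_entropy(p, alpha):
    """Entropy of order alpha (assumed Tsallis-type form).

    For alpha == 1 this falls back to Shannon entropy (natural log).
    """
    probs = [x for x in p.values() if x > 0]
    if math.isclose(alpha, 1.0):
        return -sum(x * math.log(x) for x in probs)
    return (sum(x ** alpha for x in probs) - 1.0) / (1.0 - alpha)


def generalized_jsd(p, q, alpha):
    """Generalized Jensen-Shannon divergence of order alpha.

    Entropy of the equal-weight mixture minus the mean entropy of p and q.
    """
    vocab = set(p) | set(q)
    mix = {w: 0.5 * (p.get(w, 0.0) + q.get(w, 0.0)) for w in vocab}
    return (
        generalized_entropy(mix, alpha)
        - 0.5 * generalized_entropy(p, alpha)
        - 0.5 * generalized_entropy(q, alpha)
    )


# Toy usage: small alpha weights rare word types more heavily,
# large alpha weights frequent word types more heavily.
text_a = "the cat sat on the mat".split()
text_b = "the dog sat on the log".split()
p = relative_frequencies(text_a)
q = relative_frequencies(text_b)
for alpha in (0.5, 1.0, 2.0):
    print(alpha, generalized_jsd(p, q, alpha))
```

Because the relative frequencies are estimated from finite texts, the values returned by such a function change with text length, which is the sample size problem the abstract describes.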

Metadata
Author:Alexander Koplenig, Sascha Wolfer, Carolin Müller-Spitzer
URN:urn:nbn:de:bsz:mh39-87970
DOI:https://doi.org/10.7910/DVN/OP9PRL
ISSN:1099-4300
Parent Title (English):Entropy
Publisher:MDPI
Place of publication:Basel
Document Type:Article
Language:English
Year of first Publication:2019
Date of Publication (online):2019/05/03
Publication state:Published version
Review state:Peer-reviewed
Tag:Jensen-Shannon divergence; Zipf's law; generalized divergence; generalized entropy; sample size; text length
GND Keyword:Entropie (entropy); Sprachstatistik (language statistics); Sprachwandel (language change); Stichprobenumfang (sample size); Zipfsches Gesetz (Zipf's law)
Volume:21
Issue:5
Page Number:18
First Page:464
Note:
The publication of this article was partially funded by the Open Access Fund of the Leibniz Association.
Note:
Note on citation:
The journal uses article numbers instead of continuous pagination; this article is number 464.
DDC classes:400 Language / 400 Language, Linguistics
Open Access?:yes
Leibniz-Classification:Language, Linguistics
Linguistics-Classification:Quantitative Linguistics
Program areas:Lexik (lexis)
Licence (English):Creative Commons - Attribution 4.0 International