
Population Size Predicts Lexical Diversity, but so Does the Mean Sea Level – Why It Is Important to Correctly Account for the Structure of Temporal Data

  • To demonstrate why it is important to correctly account for the (serially dependent) structure of temporal data, we document an apparently spectacular relationship between population size and lexical diversity: for five out of seven investigated languages, there is a strong relationship between the population size of a country and the lexical diversity of its primary language. We show that this relationship is the result of a misspecified model that does not consider the temporal aspect of the data by presenting a similar but nonsensical relationship between the global annual mean sea level and lexical diversity. Several recent studies present surprising links between economic, cultural, political and (socio-)demographical variables on the one hand and cultural or linguistic characteristics on the other hand, but seem to suffer from exactly this problem. We therefore explain the cause of the misspecification and show that it has profound consequences. We demonstrate how a simple transformation of the time series can often solve problems of this type and argue that the evaluation of the plausibility of a relationship is important in this context. We hope that our paper will help both researchers and reviewers to understand why it is important to use special models for the analysis of data with a natural temporal ordering.
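
The problem described in the abstract can be illustrated with a minimal sketch on synthetic data. The sketch makes several assumptions that do not come from the record above: the two series are independent random walks with drift (not the paper's Google Ngram, population, or sea-level data), the "simple transformation" is taken to be first differencing (the abstract does not say which transformation is used), and the helper names `pearson` and `simulate_trending_series` are hypothetical.

```python
import numpy as np


def pearson(x, y):
    """Pearson correlation coefficient between two 1-D arrays."""
    return float(np.corrcoef(x, y)[0, 1])


def simulate_trending_series(n, drift, rng):
    """Random walk with drift: cumulative sum of (drift + Gaussian noise)."""
    return np.cumsum(drift + rng.normal(0.0, 1.0, size=n))


# Two series generated independently of each other (synthetic, for illustration only).
rng = np.random.default_rng(0)
n = 200
series_a = simulate_trending_series(n, drift=0.5, rng=rng)
series_b = simulate_trending_series(n, drift=0.5, rng=rng)

# Correlating the raw levels ignores the serial dependence: both series trend
# upward over time, so they appear strongly related although they are not.
r_levels = pearson(series_a, series_b)

# First differences (year-to-year changes) remove the shared trend.
r_diffs = pearson(np.diff(series_a), np.diff(series_b))

print(f"correlation of levels:            {r_levels:.2f}")
print(f"correlation of first differences: {r_diffs:.2f}")
```

In this setup the correlation of the levels is driven almost entirely by the common upward trend, while the differenced series carry only the independent noise. Differencing is one possible transformation among several; whether it is appropriate for a given dataset depends on the structure of that data.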


Author:Alexander Koplenig, Carolin Müller-Spitzer
Parent Title (English):PLoS ONE
Editor:Karen Lidzba
Document Type:Article
Year of first Publication:2016
Date of Publication (online):2016/03/04
Tag:Autocorrelation; Google Ngram; Pearson correlation; Quantitative linguistics; Type-token ratio
GND Keyword:Sprachstatistik (language statistics)
Page Number:14
First Page:e0150771
The publication of this article was funded by the Open Access fund of the Leibniz Association
DDC classes:400 Language / 410 Linguistics
Open Access?:yes
Leibniz-Classification:Language, Linguistics
Licence (English):Creative Commons - Attribution 4.0 International