Using the parameters of the Zipf–Mandelbrot law to measure diachronic lexical, syntactical and stylistic changes – a large-scale corpus analysis

  • Using the Google Ngram Corpora for six different languages (including two varieties of English), a large-scale time series analysis is conducted. It is demonstrated that diachronic changes of the parameters of the Zipf–Mandelbrot law (and the parameter of the Zipf law, all estimated by maximum likelihood) can be used to quantify and visualize important aspects of linguistic change (as represented in the Google Ngram Corpora). The analysis also reveals that there are important cross-linguistic differences. It is argued that the Zipf–Mandelbrot parameters can be used as a first indicator of diachronic linguistic change, but more thorough analyses should make use of the full spectrum of different lexical, syntactical and stylometric measures to fully understand the factors that actually drive those changes.
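The abstract refers to estimating the Zipf–Mandelbrot parameters by maximum likelihood. As a rough illustration only, the sketch below assumes the common form of the law, p(r) proportional to (r + beta)^(-alpha) over ranks r = 1..N, and picks the parameter pair with the highest log-likelihood from a small candidate grid. The function names, the grid search, and the parameter symbols are assumptions made for this example; the article's actual estimation procedure is not described on this page. Setting beta = 0 gives the plain Zipf law.

```python
import math


def zm_log_likelihood(freqs, alpha, beta):
    """Log-likelihood of rank-ordered frequencies under a Zipf-Mandelbrot law.

    freqs[i] is the frequency of the word at rank i + 1.
    Assumed model: p(r) = (r + beta)^(-alpha) / sum_k (k + beta)^(-alpha).
    """
    ranks = range(1, len(freqs) + 1)
    # Unnormalised log-probability of each rank.
    log_weights = [-alpha * math.log(r + beta) for r in ranks]
    # Log of the normalising constant over the finite rank range.
    log_norm = math.log(sum(math.exp(w) for w in log_weights))
    total = sum(freqs)
    return sum(f * w for f, w in zip(freqs, log_weights)) - total * log_norm


def fit_zm_grid(freqs, alphas, betas):
    """Return the (alpha, beta) pair from the grid with the highest likelihood.

    Hypothetical helper: a grid search stands in for a proper numerical
    optimiser, which a real analysis would more likely use.
    """
    ordered = sorted(freqs, reverse=True)  # rank 1 = most frequent item
    best = max(
        (zm_log_likelihood(ordered, a, b), a, b)
        for a in alphas
        for b in betas
    )
    return best[1], best[2]
```

Applied to the word frequencies of each year in a corpus, a fit of this kind would yield one (alpha, beta) pair per year, and those pairs could then be plotted as a time series.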

Metadata
Author:Alexander Koplenig
URN:urn:nbn:de:bsz:mh39-73491
DOI:https://doi.org/10.1515/cllt-2014-0049
ISSN:1613-7035
Parent Title (English):Corpus linguistics and linguistic theory
Publisher:de Gruyter
Place of publication:Berlin [u.a.]
Document Type:Article
Language:English
Year of first Publication:2018
Date of Publication (online):2018/04/16
Publicationstate:Published version
Reviewstate:Peer-Review
Tag:Google Ngram Corpora; Zipf–Mandelbrot law; Zipf’s law; diachronic corpus linguistics; lexical richness; noun–pronoun ratio; power law; syntactic complexity; time series analysis; type token ratio; vocabulary size
Volume:14
Issue:1
First Page:1
Last Page:34
Note:
This publication is freely accessible with the permission of the rights holder under an Alliance licence or national licence (funded by the DFG, German Research Foundation).
Dewey Decimal Classification:400 Language / 400 Language, Linguistics
Leibniz-Classification:Language, Linguistics
Linguistics-Classification:Corpus linguistics
Open Access?:Yes
Licence (German):Creative Commons - Attribution-NonCommercial-NoDerivs 3.0 Germany