Why the quantitative analysis of diachronic corpora that does not consider the temporal aspect of time-series can lead to wrong conclusions
- Recently, a claim was made, on the basis of the German Google Books 1-gram corpus (Michel et al., Quantitative Analysis of Culture Using Millions of Digitized Books. Science 2010; 331: 176–82), that there was a linear relationship between six non-technical non-Nazi words and three ‘explicitly Nazi words’ during World War II (Caruana-Galizia. 2015. Politics and the German language: Testing Orwell’s hypothesis using the Google N-Gram corpus. Digital Scholarship in the Humanities [Online]. http://dsh.oxfordjournals.org/cgi/doi/10.1093/llc/fqv011 (accessed 15 April 2015)). Here, I try to show that apparent relationships like this are the result of misspecified models that do not take into account the temporal aspect of time-series data. The main point of this article is to demonstrate why such analyses run the risk of incorrect statistical inference, in which apparent effects are meaningless and can lead to wrong conclusions.
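
The risk described in the abstract can be illustrated with a small simulation. This is a minimal sketch and not the article's own analysis: it assumes two unrelated series modeled as independent Gaussian random walks, and all function names and parameters are hypothetical. It compares the correlation of the raw levels with the correlation of the first differences (period-to-period changes). Independent trending series tend to correlate strongly in levels, while the correlation largely vanishes once differences are used.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def random_walk(rng, length):
    """Cumulative sum of independent standard-normal steps (assumed model)."""
    level = 0.0
    walk = []
    for _ in range(length):
        level += rng.gauss(0.0, 1.0)
        walk.append(level)
    return walk


def first_differences(series):
    """Period-to-period changes: series[t] - series[t - 1]."""
    return [b - a for a, b in zip(series, series[1:])]


def mean_abs_correlations(n_pairs=500, length=100, seed=0):
    """Average |r| over pairs of unrelated walks, for levels and for differences."""
    rng = random.Random(seed)
    levels_total = 0.0
    diffs_total = 0.0
    for _ in range(n_pairs):
        a = random_walk(rng, length)
        b = random_walk(rng, length)
        levels_total += abs(pearson(a, b))
        diffs_total += abs(pearson(first_differences(a), first_differences(b)))
    return levels_total / n_pairs, diffs_total / n_pairs


if __name__ == "__main__":
    levels, diffs = mean_abs_correlations()
    print(f"mean |r|, raw levels:        {levels:.2f}")
    print(f"mean |r|, first differences: {diffs:.2f}")
```
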
Author: | Alexander Koplenig |
---|---|
URN: | urn:nbn:de:bsz:mh39-43679 |
DOI: | https://doi.org/10.1093/llc/fqv030 |
ISSN: | 2055-768X |
Parent Title (English): | Digital Scholarship in the Humanities |
Publisher: | Oxford University Press (OUP) |
Place of publication: | Oxford |
Document Type: | Article |
Language: | English |
Year of first Publication: | 2017 |
Date of Publication (online): | 2015/08/13 |
Publication state: | Postprint |
Review state: | Peer-Reviewed |
GND Keyword: | Internet; Korpus <Linguistik>; Sprachstatistik |
Volume: | 32 (2017) |
Issue: | 1 |
First Page: | 159 |
Last Page: | 168 |
Note: | Advance Access published August 13, 2015. Print version: Digital Scholarship Humanities (2017) 32 (1): 159-168. |
DDC classes: | 400 Language / 410 Linguistics |
Open Access?: | yes |
Leibniz-Classification: | Language, Linguistics |
Program areas: | Lexik |
Licence (German): | Protected by copyright (Urheberrechtlich geschützt) |