Refine
Year of publication
- 2016 (3) (remove)
Document Type
- Article (2)
- Doctoral Thesis (1)
Language
- English (3)
Has Fulltext
- yes (3)
Is part of the Bibliography
- yes (3)
Keywords
- Autokorrelation (1)
- Deutsch (1)
- Forschungsmethode (1)
- Gesprochene Sprache (1)
- Google Ngram (1)
- Korpus <Linguistik> (1)
- Pearson Korrelation (1)
- Quantitative Linguistik (1)
- Schriftsprache (1)
- Sprachgeschichte (1)
Publicationstate
- Veröffentlichungsversion (3) (remove)
Reviewstate
Publisher
- Buske (1)
- Universität Mannheim (1)
In order to demonstrate why it is important to correctly account for the (serial dependent) structure of temporal data, we document an apparently spectacular relationship between population size and lexical diversity: for five out of seven investigated languages, there is a strong relationship between population size and lexical diversity of the primary language in this country. We show that this relationship is the result of a misspecified model that does not consider the temporal aspect of the data by presenting a similar but nonsensical relationship between the global annual mean sea level and lexical diversity. Given the fact that in the recent past, several studies were published that present surprising links between different economic, cultural, political and (socio-)demographical variables on the one hand and cultural or linguistic characteristics on the other hand, but seem to suffer from exactly this problem, we explain the cause of the misspecification and show that it has profound consequences. We demonstrate how simple transformation of the time series can often solve problems of this type and argue that the evaluation of the plausibility of a relationship is important in this context. We hope that our paper will help both researchers and reviewers to understand why it is important to use special models for the analysis of data with a natural temporal ordering.
This paper explores speakers’ notions of the situational appropriacy of linguistic variants. We conducted a web-based survey in which we collected ratings of the appropriacy of variants of linguistic variables in spoken German. A range of quantitative methods (cluster analysis, factor analysis and various forms of visualization techniques) is applied in order to analyze metalinguistic awareness and the differences in the evaluation of written vs. spoken stimuli. First, our data show that speakers’ ratings of the appropriacy of linguistic variants vary reliably with two rough clusters representing formal and informal speech situations and genres. The findings confirm that speakers adhere to a notion of spoken standard German which takes genre and register-related variation into account. Secondly, our analysis reveals a written language bias: metalinguistic awareness is strongly influenced by the physical mode of the presentation of linguistic items (spoken vs. written).
This thesis consists of the following three papers that all have been published in international peer-reviewed journals:
Chapter 3: Koplenig, Alexander (2015c). The Impact of Lacking Metadata for the Measurement of Cultural and Linguistic Change Using the Google Ngram Data Sets—Reconstructing the Composition of the German Corpus in Times of WWII. Published in: Digital Scholarship in the Humanities. Oxford: Oxford University Press. [doi:10.1093/llc/fqv037]
Chapter 4: Koplenig, Alexander (2015b). Why the quantitative analysis of dia-chronic corpora that does not consider the temporal aspect of time-series can lead to wrong conclusions. Published in: Digital Scholarship in the Humanities. Oxford: Oxford University Press. [doi:10.1093/llc/fqv030]
Chapter 5: Koplenig, Alexander (2015a). Using the parameters of the Zipf–Mandelbrot law to measure diachronic lexical, syntactical and stylistic changes – a large-scale corpus analysis. Published in: Corpus Linguistics and Linguistic Theory. Berlin/Boston: de Gruyter. [doi:10.1515/cllt-2014-0049]
Chapter 1 introduces the topic by describing and discussing several basic concepts relevant to the statistical analysis of corpus linguistic data. Chapter 2 presents a method to analyze diachronic corpus data and a summary of the three publications. Chapters 3 to 5 each represent one of the three publications. All papers are printed in this thesis with the permission of the publishers.