Refine
Year of publication
- 2015 (6) (remove)
Document Type
- Article (2)
- Part of a Book (2)
- Preprint (1)
- Working Paper (1)
Has Fulltext
- yes (6)
Is part of the Bibliography
- yes (6)
Keywords
- Wörterbuch (2)
- time series analysis (2)
- Autocorrelated errors (1)
- BNC (1)
- Benutzer (1)
- Benutzung (1)
- COHA (1)
- Computerunterstützte Lexikographie (1)
- Determinator (1)
- Deutsch (1)
Publicationstate
- Preprint (2)
- Veröffentlichungsversion (2)
- Postprint (1)
Reviewstate
- Peer-Review (2)
- Verlagslektorat (1)
Publisher
- Palgrave Macmillan (1)
- de Gruyter (1)
In this paper, a method for measuring synchronic corpus (dis-)similarity put forward by Kilgarriff (2001) is adapted and extended to identify trends and correlated changes in diachronic text data, using the Corpus of Historical American English (Davies 2010a) and the Google Ngram Corpora (Michel et al. 2010a). This paper shows that this fully data-driven method, which extracts word types that have undergone the most pronounced change in frequency in a given period of time, is computationally very cheap and that it allows interpretations of diachronic trends that are both intuitively plausible and motivated from the perspective of information theory. Furthermore, it demonstrates that the method is able to identify correlated linguistic changes and diachronic shifts that can be linked to historical events. Finally, it can help to improve diachronic POS tagging and complement existing NLP approaches. This indicates that the approach can facilitate an improved understanding of diachronic processes in language change.
Frimer et al. (2015) claim that there is a linear relationship between the level of prosocial language and the level of public disapproval of US Congress. A re-analysis demonstrates that this relationship is the result of a misspecified model that does not account for first-order autocorrelated disturbances. A Stata script to reproduce all presented results is available as an appendix.
Metalinguistic awareness of standard vs standard usage. The case of determiners in spoken German
(2015)
We present studies using the 2013 log files from the German version of Wiktionary. We investigate several lexicographically relevant variables and their effect on look-up frequency: Corpus frequency of the headword seems to have a strong effect on the number of visits to a Wiktionary entry. We then consider the question of whether polysemic words are looked up more often than monosemic ones. Here, we also have to take into account that polysemic words are more frequent in most languages. Finally, we present a technique to investigate the time-course of look-up behaviour for specific entries. We exemplify the method by investigating influences of (temporary) social relevance of specific headwords.
This article presents empirical findings about what criteria make for a good online dictionary, using data on expectations and demands collected in an online questionnaire (N~684), complemented by additional results from a second questionnaire (N-390) which looked more closely at whether respondents had differentiated views on individual aspects of the criteria rated in the first study. Our results show that the classical criteria of reference books (such as reliability and clarity) were rated highest by our participants, whereas the unique characteristics of online dictionaries (such as multimedia and adaptability) were rated and ranked as (partly) unimportant. To verify whether or not the poor ratings of these innovative features were a result of the fact that our subjects are unfamiliar with online dictionaries incorporating such features, we incorporated an experiment into the second study. Our results revealed a learning effect: participants in the learning-effect condition, i.e. respondents who were first presented with examples of possible innovative features of online dictionaries, judged adaptability and multimedia to be more useful than participants who were not given that information. Thus, our data point to the conclusion that developing innovative features is worthwhile but that it should be borne in mind that users can only be persuaded of their benefits gradually. In addition, we present data about questions relating to the design of online dictionaries.
Using the Google Ngram Corpora for six different languages (including two varieties of English), a large-scale time series analysis is conducted. It is demonstrated that diachronic changes of the parameters of the Zipf–Mandelbrot law (and the parameter of the Zipf law, all estimated by maximum likelihood) can be used to quantify and visualize important aspects of linguistic change (as represented in the Google Ngram Corpora). The analysis also reveals that there are important cross-linguistic differences. It is argued that the Zipf–Mandelbrot parameters can be used as a first indicator of diachronic linguistic change, but more thorough analyses should make use of the full spectrum of different lexical, syntactical and stylometric measures to fully understand the factors that actually drive those changes.