Year of publication
- 2017 (4)
Document Type
- Article (4)
Has full text
- yes (4)
Is part of the Bibliography
- yes (4)
Keywords
- Language statistics (3)
- Corpus (linguistics) (2)
- BNC (1)
- COHA (1)
- German (1)
- English (1)
- Google Books Ngram corpora (1)
- Information structure (1)
- Object clause (1)
- Semasiology (1)
Review state
- Peer review (4)
Publisher
- De Gruyter (1)
- Erich Schmidt (1)
- Routledge, Taylor & Francis (1)
This paper examines the distribution of complement clauses in German. It tests the hypothesis that the lexical-semantic properties of the embedding verbs, above all their control properties and their temporal and modal specification, determine whether a dass-clause or a zu-infinitive is preferentially selected. A corpus-linguistic test of this hypothesis shows that the three criteria matter to different degrees for complement selection. The control criterion proves to be the most important factor. A further important result of the study is that complement selection follows the principle of argument-structural inertia: as a distillation of memorized traces of usage, verbs tend to develop a gradual preference for a particular complementation pattern.
In the first volume of Corpus Linguistics and Linguistic Theory, Gries (2005: 285) asked whether corpus linguists should abandon null-hypothesis significance testing (Gries 2005. Null-hypothesis significance testing of word frequencies: A follow-up on Kilgarriff. Corpus Linguistics and Linguistic Theory 1(2). doi:10.1515/cllt.2005.1.2.277. http://www.degruyter.com/view/j/cllt.2005.1.issue-2/cllt.2005.1.2.277/cllt.2005.1.2.277.xml). In this paper, I want to revive this discussion by arguing that the assumptions that license inferences about a population (here, the languages studied) from results observed in a sample (here, a collection of naturally occurring language data) are not fulfilled. As a consequence, corpus linguists should indeed abandon null-hypothesis significance testing.
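The abstract does not list which assumptions fail. As an illustration of one assumption often raised in this debate, the independence of tokens, the following hypothetical simulation (not the paper's own analysis) draws two corpora from the same population and counts how often a chi-square test on token frequencies declares them significantly different. The function names, the Gamma model of "bursty" per-document rates, and all parameter values are assumptions chosen for illustration.

```python
import math
import random
from typing import Optional


def poisson(lam: float, rng: random.Random) -> int:
    """Poisson draw via Knuth's multiplication method (adequate for small means)."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def chi_square_p(a: int, n_a: int, b: int, n_b: int) -> float:
    """p-value of Pearson's chi-square test (df = 1, no continuity correction) for
    'a hits in n_a tokens vs b hits in n_b tokens', treating every token as independent."""
    hits, total = a + b, n_a + n_b
    observed = [a, n_a - a, b, n_b - b]
    expected = [n_a * hits / total, n_a * (total - hits) / total,
                n_b * hits / total, n_b * (total - hits) / total]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    return math.erfc(math.sqrt(chi2 / 2))  # survival function of chi-square with df = 1


def rejection_rate(shape: Optional[float], trials: int = 1000, n_docs: int = 50,
                   doc_len: int = 2000, mean_per_doc: float = 2.0,
                   alpha: float = 0.05, seed: int = 1) -> float:
    """Share of trials in which two corpora from the SAME population are declared to differ.
    shape=None: every document has the same rate, so tokens behave as independent draws.
    shape=s:    per-document rates follow Gamma(s, mean/s), i.e. the word is 'bursty'."""
    rng = random.Random(seed)
    n_tokens = n_docs * doc_len
    rejections = 0
    for _ in range(trials):
        hits = []
        for _corpus in range(2):
            total = 0
            for _doc in range(n_docs):
                lam = mean_per_doc if shape is None else rng.gammavariate(shape, mean_per_doc / shape)
                total += poisson(lam, rng)  # Poisson approximation to Binomial(doc_len, p)
            hits.append(total)
        if chi_square_p(hits[0], n_tokens, hits[1], n_tokens) < alpha:
            rejections += 1
    return rejections / trials


if __name__ == "__main__":
    print("independent tokens:", rejection_rate(None))
    print("bursty tokens:     ", rejection_rate(0.5))
```

With `shape=None` the rejection rate should stay near the nominal 5%; with bursty rates it should be several times higher, because the test counts correlated occurrences within a document as independent evidence.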
In this paper, an exploratory, data-driven method is presented that extracts from diachronic corpora the word types whose frequency of occurrence changed most markedly over a given period of time. Combined with statistical methods from time series analysis, the method can uncover meaningful patterns and relationships in diachronic corpora, an approach that is still uncommon in linguistics. This suggests that the approach can contribute to a better understanding of diachronic processes.
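The abstract does not say how "most pronounced change" is measured or which time-series methods are applied. A minimal sketch under two assumptions: change is scored as the least-squares slope of a word type's log relative frequency across periods, and relationships between word types are checked by correlating first differences, a standard guard against spurious correlation between series that merely share a trend. All function names and the toy counts are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def log_relative_freq(counts: List[int], totals: List[int], per: float = 1e6) -> List[float]:
    """Log relative frequency (per million tokens) in each period.
    Add-one smoothing keeps the log defined for zero counts (an assumption)."""
    return [math.log((c + 1) / t * per) for c, t in zip(counts, totals)]


def ols_slope(y: List[float]) -> float:
    """Least-squares slope of y against the period indices 0, 1, ..., n-1."""
    n = len(y)
    x_mean = (n - 1) / 2
    y_mean = sum(y) / n
    num = sum((x - x_mean) * (v - y_mean) for x, v in enumerate(y))
    den = sum((x - x_mean) ** 2 for x in range(n))
    return num / den


def top_changers(counts: Dict[str, List[int]], totals: List[int],
                 k: int = 10) -> List[Tuple[str, float]]:
    """Rank word types by the absolute slope of their log relative frequency."""
    scores = {w: ols_slope(log_relative_freq(c, totals)) for w, c in counts.items()}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)[:k]


def pearson(a: List[float], b: List[float]) -> float:
    """Pearson correlation; NaN if either series is constant."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return float("nan")
    return cov / math.sqrt(va * vb)


def diff_correlation(a: List[float], b: List[float]) -> float:
    """Correlate period-to-period changes rather than levels."""
    da = [y - x for x, y in zip(a, a[1:])]
    db = [y - x for x, y in zip(b, b[1:])]
    return pearson(da, db)


if __name__ == "__main__":
    totals = [1000] * 5  # tokens per period (toy values)
    counts = {"rise": [1, 5, 10, 20, 40], "fall": [30, 20, 10, 5, 1], "flat": [10] * 5}
    print(top_changers(counts, totals, k=2))
```

The log scale makes a doubling count the same amount of change for rare and frequent words; a real study would also need to decide how to handle sparse counts and uneven period sizes.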
Languages employ different strategies to transmit structural and grammatical information. In languages such as Mandarin Chinese or Vietnamese, for example, grammatical dependency relationships within sentences are conveyed mainly by word order, whereas word order is much less restricted in languages such as Inupiatun or Quechua, because these languages (also) use the internal structure of words (e.g. inflectional morphology) to mark grammatical relationships in a sentence. Based on a quantitative analysis of more than 1,500 unique translations of different books of the Bible in almost 1,200 different languages that are spoken as a native language by approximately 6 billion people (more than 80% of the world population), we present large-scale evidence for a statistical trade-off between the amount of information conveyed by the ordering of words and the amount of information conveyed by internal word structure: languages that rely more strongly on word order information tend to rely less on word structure information, and vice versa. Put differently, if less information is carried within the word, more information has to be spread among words in order to communicate successfully. In addition, we find that, despite differences in the way information is expressed, there is also evidence for a trade-off between different books of the biblical canon that recurs with little variation across languages: the more informative the word order of a book, the less informative its word structure, and vice versa. We argue that this might suggest that, on the one hand, languages encode information in very different (but efficient) ways, while, on the other hand, content-related and stylistic features are statistically encoded in very similar ways.
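The abstract does not describe how the two amounts of information are measured. One approach used in entropy-based work on word order is to estimate how much the entropy of a text rises when a given kind of structure is destroyed. The sketch below assumes this approach and is not the study's own pipeline: it uses lzma compression as a crude stand-in for an entropy estimator, shuffles words within sentences to remove word-order information, and replaces every word type with a random letter string of the same length to remove word-internal structure. All function names are hypothetical.

```python
import lzma
import random
import string
from typing import Dict, List, Tuple

Sentences = List[List[str]]


def bits_per_char(text: str) -> float:
    """Crude entropy proxy: compressed size in bits per character (assumes non-empty text)."""
    return 8 * len(lzma.compress(text.encode("utf-8"))) / len(text)


def shuffle_word_order(sentences: Sentences, rng: random.Random) -> Sentences:
    """Destroy word-order information by shuffling words within each sentence."""
    shuffled = []
    for sent in sentences:
        words = list(sent)  # copy, so the input is not modified
        rng.shuffle(words)
        shuffled.append(words)
    return shuffled


def mask_word_structure(sentences: Sentences, rng: random.Random) -> Sentences:
    """Destroy word-internal structure: each word type is replaced, consistently, by a
    random letter string of the same length, so related forms no longer share material.
    Two types may occasionally receive the same string; this sketch ignores that."""
    mapping: Dict[str, str] = {}
    masked = []
    for sent in sentences:
        out = []
        for w in sent:
            if w not in mapping:
                mapping[w] = "".join(rng.choice(string.ascii_lowercase) for _ in w)
            out.append(mapping[w])
        masked.append(out)
    return masked


def to_text(sentences: Sentences) -> str:
    return "\n".join(" ".join(s) for s in sentences)


def information_measures(sentences: Sentences, seed: int = 0) -> Tuple[float, float]:
    """Return (word-order information, word-structure information), each estimated as
    the rise in bits per character after destroying that kind of structure."""
    rng = random.Random(seed)
    h_original = bits_per_char(to_text(sentences))
    h_no_order = bits_per_char(to_text(shuffle_word_order(sentences, rng)))
    h_no_structure = bits_per_char(to_text(mask_word_structure(sentences, rng)))
    return h_no_order - h_original, h_no_structure - h_original


if __name__ == "__main__":
    text = [["the", "dog", "chases", "the", "cat"]] * 300
    print(information_measures(text))
```

Both manipulations preserve the number of characters, so the per-character values are directly comparable. Computed per translation, a trade-off of the kind reported would appear as a negative correlation between the two measures across languages or across books.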