To build a comparable Wikipedia corpus of German, French, Italian, Norwegian, Polish and Hungarian for contrastive grammar research, we used a set of XSLT stylesheets to transform the MediaWiki annotations to XML. Furthermore, the data has been annotated with word class information using different taggers. The outcome is a corpus with rich metadata and linguistic annotation that can be used for multilingual research on various linguistic topics.
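To make the markup-to-XML step concrete, the following is a minimal sketch in Python rather than XSLT, so it is not the stylesheet pipeline described above. It handles only level-2 headings, internal links and bold/italic quotes. The function name `wikitext_to_xml` and the element names `article`, `section` and `p` are assumptions made for illustration, not the corpus schema.

```python
import re
import xml.etree.ElementTree as ET


def wikitext_to_xml(title, wikitext):
    """Convert a tiny subset of MediaWiki markup to an XML tree.

    The element names used here are hypothetical, not the corpus schema.
    """
    article = ET.Element("article", {"title": title})
    section = article  # paragraphs attach to the most recent section
    for line in wikitext.splitlines():
        line = line.strip()
        if not line:
            continue
        # Level-2 headings look like "== Heading ==".
        heading = re.fullmatch(r"==\s*(.+?)\s*==", line)
        if heading:
            section = ET.SubElement(article, "section", {"heading": heading.group(1)})
            continue
        # Reduce [[target|label]] and [[target]] to the visible link text.
        text = re.sub(r"\[\[(?:[^\]|]*\|)?([^\]]*)\]\]", r"\1", line)
        # Drop bold/italic quote markup ('' and ''').
        text = re.sub(r"'{2,}", "", text)
        ET.SubElement(section, "p").text = text
    return article


sample = (
    "'''Berlin''' ist die [[Hauptstadt]] von [[Deutschland|Land]].\n"
    "== Geschichte ==\n"
    "Text."
)
root = wikitext_to_xml("Berlin", sample)
xml_string = ET.tostring(root, encoding="unicode")
```

A real conversion would also have to deal with templates, tables, references and nested markup, which this sketch ignores.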
The variation of the strong genitive marker of the singular noun has been treated in diverse accounts. Still, there is a consensus that it is to a large extent systematic but can be approached appropriately only if many heterogeneous factors are taken into account. Over thirty variables influencing this variation have been proposed. However, it is unclear how effective they are and, above all, how they interact. In this paper, the potential influencing variables are evaluated statistically in a machine learning approach and modelled in decision trees in order to predict the genitive marking variants. Working with decision trees based exclusively on statistically significant data enables us to determine which combination of factors is decisive in the choice of a marking variant for a given noun. Consequently, the variation factors can be assessed with respect to their explanatory power for corpus data and arranged in a hierarchical order.
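As an illustration of how candidate factors can be put in order, the sketch below ranks factors by information gain, the split criterion used by ID3-style decision tree learners. This is an assumption about one possible criterion; the abstract does not state which tree algorithm or significance test was used. The factor names, values and variant labels in `rows` are invented placeholders, not corpus data.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, factor, target="marker"):
    """Reduction in entropy of `target` obtained by splitting on `factor`."""
    base = entropy([r[target] for r in rows])
    remainder = 0.0
    for value in {r[factor] for r in rows}:
        subset = [r[target] for r in rows if r[factor] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder


def rank_factors(rows, factors, target="marker"):
    """Order factors from most to least informative about the target."""
    return sorted(factors, key=lambda f: information_gain(rows, f, target), reverse=True)


# Invented toy observations: factor_a fully determines the variant, factor_b does not.
rows = [
    {"factor_a": "x", "factor_b": "p", "marker": "variant_1"},
    {"factor_a": "x", "factor_b": "q", "marker": "variant_1"},
    {"factor_a": "y", "factor_b": "p", "marker": "variant_2"},
    {"factor_a": "y", "factor_b": "q", "marker": "variant_2"},
]
ranking = rank_factors(rows, ["factor_b", "factor_a"])
```

A tree learner applies this ranking recursively: it splits on the top-ranked factor, then repeats the computation within each resulting subset, which is what yields a hierarchy of factors.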