In recent years, the availability of large annotated and searchable corpora, together with a new interest in the empirical foundation and validation of linguistic theory and description, has sparked a surge of novel and interesting work using corpus-based methods to study the grammar of natural languages. However, a look at relevant current research on the grammar of the Germanic, Romance, and Slavic languages reveals a variety of different theoretical approaches and empirical foci, which can be traced back to different philological and linguistic traditions. Still, this current state of affairs should not be seen as an obstacle but as an ideal basis for a fruitful exchange of ideas between different research paradigms.
The variation of the strong genitive marker of the singular noun has been treated in diverse accounts. Still, there is a consensus that it is to a large extent systematic but can be approached appropriately only if many heterogeneous factors are taken into account. Over thirty variables influencing this variation have been proposed. However, it remains unclear how effective they are and, above all, how they interact. In this paper, the potential influencing variables are evaluated statistically in a machine learning approach and modelled in decision trees in order to predict the genitive marking variants. Working with decision trees based exclusively on statistically significant data enables us to determine which combination of factors is decisive in the choice of a marking variant for a given noun. Consequently, the variation factors can be assessed with respect to their explanatory power for corpus data and ranked hierarchically.
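The idea of letting a decision tree rank factors and predict a marking variant can be illustrated with a minimal sketch. It is not the authors' method: the abstract speaks of trees built on statistically significant data, whereas this example swaps in a plain ID3-style tree that splits on information gain. The feature names (`final_sibilant`, `monosyllabic`), the variant labels (`"es"`, `"s"`), and all rows are hypothetical toy values invented for illustration, not corpus data.

```python
from collections import Counter
from math import log2

# Toy, hand-made rows (NOT corpus data); feature names are hypothetical.
ROWS = [
    ({"final_sibilant": True,  "monosyllabic": True},  "es"),
    ({"final_sibilant": True,  "monosyllabic": False}, "es"),
    ({"final_sibilant": False, "monosyllabic": True},  "es"),
    ({"final_sibilant": False, "monosyllabic": True},  "s"),
    ({"final_sibilant": False, "monosyllabic": False}, "s"),
    ({"final_sibilant": False, "monosyllabic": False}, "s"),
]

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def information_gain(rows, feature):
    """Entropy reduction obtained by splitting the rows on one feature."""
    labels = [label for _, label in rows]
    remainder = 0.0
    for value in {feats[feature] for feats, _ in rows}:
        subset = [label for feats, label in rows if feats[feature] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return entropy(labels) - remainder

def build_tree(rows, features):
    """Return a leaf label or a (feature, branches, majority) node."""
    labels = [label for _, label in rows]
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not features:
        return majority
    best = max(features, key=lambda f: information_gain(rows, f))
    if information_gain(rows, best) <= 0:
        return majority  # no feature separates the variants any further
    rest = [f for f in features if f != best]
    branches = {}
    for value in {feats[best] for feats, _ in rows}:
        subset = [(feats, label) for feats, label in rows if feats[best] == value]
        branches[value] = build_tree(subset, rest)
    return (best, branches, majority)

def predict(tree, feats):
    """Walk the tree; fall back to the node majority for unseen values."""
    while isinstance(tree, tuple):
        feature, branches, majority = tree
        tree = branches.get(feats[feature], majority)
    return tree

TREE = build_tree(ROWS, ["final_sibilant", "monosyllabic"])
```

In this sketch the feature chosen at the root is the one with the highest information gain on the toy rows, which mirrors, in a much simplified form, how a tree places the factors in a hierarchy. A significance-based splitting criterion, as the abstract suggests, would replace `information_gain` with a statistical test.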