GenitivDB - a corpus-generated database for German genitive classification

  • We present a novel NLP resource for the explanation of linguistic phenomena, built and evaluated by exploring very large annotated language corpora. For the compilation, we use the German Reference Corpus (DeReKo), which contains more than 5 billion word forms and is the largest linguistic resource worldwide for the study of contemporary written German. The result is a comprehensive database of German genitive formations, enriched with a broad range of intra- and extralinguistic metadata. It can be used for the notoriously controversial classification and prediction of genitive endings (short endings, long endings, zero-marker). We also evaluate the main factors influencing the use of specific endings. To get a general idea of a factor’s influence and its side effects, we calculate chi-square tests and visualize the residuals with an association plot. The results are evaluated against a gold standard by implementing tree-based machine learning algorithms. For the statistical analysis, we applied the supervised Logistic Model Trees (LMT) algorithm, using the WEKA software. We intend to use this gold standard to evaluate GenitivDB, as well as to explore methodologies for a predictive genitive model.
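
The chi-square step mentioned in the abstract can be illustrated with a minimal sketch. It computes the chi-square statistic and the Pearson residuals (the quantities an association plot displays) for a contingency table of factor levels against genitive endings. The function name `pearson_residuals`, the factor levels, and all counts are invented for illustration and are not taken from GenitivDB or the paper; the authors' own tooling may differ.

```python
from math import sqrt


def pearson_residuals(table):
    """Return (chi2, residuals) for a contingency table given as a list of rows.

    Assumes every expected cell count is greater than zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    chi2 = 0.0
    residuals = []
    for i, row in enumerate(table):
        res_row = []
        for j, observed in enumerate(row):
            # Expected count under independence of row and column variable.
            expected = row_totals[i] * col_totals[j] / grand_total
            # Pearson residual: signed, standardized deviation from expectation.
            r = (observed - expected) / sqrt(expected)
            res_row.append(r)
            chi2 += r * r
        residuals.append(res_row)
    return chi2, residuals


# Invented toy counts, NOT real GenitivDB data.
# Rows: two hypothetical levels of one factor (e.g. noun length).
# Columns: short ending (-s), long ending (-es), zero marker.
counts = [
    [30, 60, 10],
    [70, 20, 10],
]

chi2, residuals = pearson_residuals(counts)
dof = (len(counts) - 1) * (len(counts[0]) - 1)

print(f"chi-square = {chi2:.2f}, degrees of freedom = {dof}")
for row in residuals:
    print([round(r, 2) for r in row])
```

Cells with large positive residuals are over-represented relative to independence, and cells with large negative residuals are under-represented; an association plot draws exactly these signed deviations as bars.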

Author: Roman Schneider
URL of the author's homepage: http://www1.ids-mannheim.de/gra/personal/schneider.html
Parent Title (German): LREC 2014, ninth international conference on language resources and evaluation. May 26-31, 2014, Reykjavik, Iceland
Publisher: European Language Resources Association (ELRA)
Editor: Nicoletta Calzolari
Document Type: Conference Proceeding
Year of first Publication: 2014
Date of Publication (online): 2014/11/20
Contributing Corporation: European Language Resources Association
Tag: Grammar; MLP; Metadata
GND Keyword: Deutsch; Genitiv; Korpus <Linguistik>
First Page: 988
Last Page: 994
DDC classes: 400 Language / 430 German
Open Access?: yes
Leibniz-Classification: Language, Linguistics
Licence (German): Protected by copyright