Refine
Document Type
- Article (1)
- Part of a Book (1)
Language
- English (2)
Has Fulltext
- yes (2)
Keywords
- Comparable Corpus (1)
- Grammatik (1)
- Informationsmanagement (1)
- Kontrastive Grammatik (1)
- Korpus <Linguistik> (1)
- Multilingual Corpus (1)
- POS-Tagging (1)
- Terminologie (1)
- Terminologiemanagement (1)
- Wikipedia (1)
Publication state
- Preprint (1)
Publisher
- Springer (1)
- Universität Hamburg (1)
To build a comparable Wikipedia corpus of German, French, Italian, Norwegian, Polish and Hungarian for contrastive grammar research, we used a set of XSLT stylesheets to transform the MediaWiki annotations to XML. Furthermore, the data was annotated with word class (part-of-speech) information using different taggers. The outcome is a corpus with rich metadata and linguistic annotation that can be used for multilingual research on various linguistic topics.
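The abstract describes converting wiki markup into XML. The sketch below only illustrates that kind of mapping. It is written in Python with regular expressions and `xml.etree.ElementTree`. It is not the authors' XSLT pipeline. The element names (`article`, `section`, `head`, `p`, `b`, `i`, `link`) are hypothetical, and the converter covers only a small subset of MediaWiki syntax: `==` headings, `'''bold'''`, `''italic''`, and `[[Target|label]]` links.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical, simplified subset of MediaWiki markup (not the authors' XSLT).
HEADING = re.compile(r"^(={2,6})\s*(.*?)\s*\1\s*$")
INLINE = re.compile(r"'''(.+?)'''|''(.+?)''|\[\[([^|\]]+)(?:\|([^\]]+))?\]\]")


def _append_text(parent, last, s):
    """Attach plain text either as parent.text or as the tail of the last child."""
    if not s:
        return
    if last is None:
        parent.text = (parent.text or "") + s
    else:
        last.tail = (last.tail or "") + s


def add_inline(parent, text):
    """Convert inline bold, italic and link markup into child elements of parent."""
    last, pos = None, 0
    for m in INLINE.finditer(text):
        _append_text(parent, last, text[pos:m.start()])
        if m.group(1) is not None:            # '''bold'''
            el = ET.SubElement(parent, "b")
            el.text = m.group(1)
        elif m.group(2) is not None:          # ''italic''
            el = ET.SubElement(parent, "i")
            el.text = m.group(2)
        else:                                 # [[Target]] or [[Target|label]]
            el = ET.SubElement(parent, "link", target=m.group(3))
            el.text = m.group(4) or m.group(3)
        last, pos = el, m.end()
    _append_text(parent, last, text[pos:])


def wiki_to_xml(title, wikitext):
    """Turn one article's wikitext into an XML string with sections and paragraphs."""
    article = ET.Element("article", title=title)
    container = article                       # paragraphs before the first heading
    for line in wikitext.splitlines():
        line = line.strip()
        if not line:
            continue
        h = HEADING.match(line)
        if h:
            container = ET.SubElement(article, "section", level=str(len(h.group(1))))
            add_inline(ET.SubElement(container, "head"), h.group(2))
        else:
            add_inline(ET.SubElement(container, "p"), line)
    return ET.tostring(article, encoding="unicode")
```

For example, `wiki_to_xml("Test", "See [[Berlin|the capital]].")` yields a `<p>` containing a `<link target="Berlin">` element. A real pipeline would also need to handle templates, tables, references, and nested markup. Those are the constructs that make a dedicated transformation layer such as XSLT worthwhile.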
Grammis is a web-based information system on German grammar, hosted by the Institute for the German Language (IDS). It is human-oriented and features different theoretical perspectives on grammar. Currently, the terminology component of grammis is being redesigned for this theoretical diversity to play a more prominent role in the data model. This also opens opportunities for implementing some machine-oriented features. In this paper, we present the re-design of both data model and knowledge base. We explore how the addition of machine-oriented features to the data model impacts the knowledge base; in particular, how this addition shifts some of the textual complexity into the data model. We show that our resource can easily be ported to a SKOS-XL representation, which makes it available for data science, knowledge-based NLP applications, and LOD in the context of digital humanities.
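The abstract says the terminology resource can be ported to SKOS-XL. The sketch below shows what such a serialization can look like in Turtle. It rests on several assumptions that are not grammis's actual data model: the `Term` record, the `ex:` namespace, the `https://example.org/grammis/` base URI, and the `_labelN` naming scheme are all hypothetical. The SKOS and SKOS-XL namespace URIs are the W3C ones. In SKOS-XL, `skosxl:prefLabel` and `skosxl:altLabel` point to `skosxl:Label` resources, and each label carries its string via `skosxl:literalForm`. Because labels are resources, they can later receive their own properties, such as the theoretical framework a term belongs to.

```python
from dataclasses import dataclass, field

SKOS = "http://www.w3.org/2004/02/skos/core#"
SKOSXL = "http://www.w3.org/2008/05/skos-xl#"


@dataclass
class Term:
    """Hypothetical terminology record: one concept with language-tagged labels."""
    concept_id: str                                   # local name, e.g. "Substantiv"
    pref: tuple[str, str]                             # (literal form, language tag)
    alts: list[tuple[str, str]] = field(default_factory=list)


def turtle_literal(text, lang):
    """Escape a string for a Turtle language-tagged literal."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"@{lang}'


def to_skosxl(terms, base):
    """Serialize terms as SKOS concepts whose labels are skosxl:Label resources."""
    lines = [f"@prefix skos: <{SKOS}> .",
             f"@prefix skosxl: <{SKOSXL}> .",
             f"@prefix ex: <{base}> .", ""]
    for t in terms:
        labels = [("prefLabel", t.pref)] + [("altLabel", a) for a in t.alts]
        # The concept links to each label resource.
        lines.append(f"ex:{t.concept_id} a skos:Concept ;")
        refs = [f"    skosxl:{prop} ex:{t.concept_id}_label{i}"
                for i, (prop, _) in enumerate(labels)]
        lines.append(" ;\n".join(refs) + " .")
        # Each label is its own resource carrying the literal form.
        for i, (_, (text, lang)) in enumerate(labels):
            lines.append(f"ex:{t.concept_id}_label{i} a skosxl:Label ;")
            lines.append(f"    skosxl:literalForm {turtle_literal(text, lang)} .")
        lines.append("")
    return "\n".join(lines)
```

Usage: `print(to_skosxl([Term("Substantiv", ("Substantiv", "de"), [("Nomen", "de")])], "https://example.org/grammis/"))` prints a concept with one preferred and one alternative label resource. In practice an RDF library would usually be preferable to hand-built strings, because it guarantees well-formed output.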