To build a comparable Wikipedia corpus of German, French, Italian, Norwegian, Polish and Hungarian for contrastive grammar research, we used a set of XSLT stylesheets to transform the MediaWiki annotations to XML. Furthermore, the data has been annotated with word class information using different taggers. The outcome is a corpus with rich metadata and linguistic annotation that can be used for multilingual research on various linguistic topics.
In this feasibility study we aim to contribute to the practical use of domain ontologies for hypertext classification by introducing an algorithm that generates potential keywords. The algorithm uses structural markup information and lemmatized word lists as well as a domain ontology on linguistics. We present the calculation and ranking of keyword candidates based on ontology relationships, word position, frequency information, and statistical significance as evidenced by log-likelihood tests. Finally, the results of our machine-driven classification are validated empirically against manually assigned keywords.
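
The log-likelihood step mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it covers only the frequency-based significance test, using the common two-term log-likelihood (G2) formulation for comparing a document against a reference corpus, and leaves out ontology relationships and word position. The function names `log_likelihood` and `rank_keywords` and the toy lemma lists are assumptions made for illustration.

```python
import math
from collections import Counter


def log_likelihood(a, b, c, d):
    """G2 score for a lemma seen a times in a document of c tokens
    and b times in a reference corpus of d tokens."""
    # Expected counts if the lemma were equally frequent in both samples.
    e1 = c * (a + b) / (c + d)
    e2 = d * (a + b) / (c + d)
    g2 = 0.0
    if a > 0:
        g2 += a * math.log(a / e1)
    if b > 0:
        g2 += b * math.log(b / e2)
    return 2 * g2


def rank_keywords(doc_lemmas, ref_lemmas, top_n=5):
    """Rank lemmas that are overrepresented in the document (hypothetical helper)."""
    doc_counts = Counter(doc_lemmas)
    ref_counts = Counter(ref_lemmas)
    c, d = len(doc_lemmas), len(ref_lemmas)
    scored = []
    for lemma, a in doc_counts.items():
        b = ref_counts.get(lemma, 0)
        # Keep only lemmas whose relative frequency is higher in the document.
        if a / c > b / d:
            scored.append((lemma, log_likelihood(a, b, c, d)))
    # Highest score first; ties broken alphabetically for stable output.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:top_n]


# Toy data, invented for illustration only.
doc = ["corpus"] * 5 + ["the"] * 5
ref = ["the"] * 50 + ["house"] * 50
candidates = rank_keywords(doc, ref)
```

In this toy case only "corpus" survives as a candidate, because "the" is exactly as frequent in the document as in the reference list. A fuller ranking along the lines described in the abstract would combine such scores with ontology and position information.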