A comparable Wikipedia corpus: from wiki syntax to POS tagged XML
- To build a comparable Wikipedia corpus of German, French, Italian, Norwegian, Polish and Hungarian for contrastive grammar research, we used a set of XSLT stylesheets to transform the MediaWiki annotations to XML. Furthermore, the data has been annotated with word class information using different taggers. The outcome is a corpus with rich metadata and linguistic annotation that can be used for multilingual research on various linguistic topics.
Author: | Noah Bubenhofer, Stefanie Haupt, Horst Schwinn |
---|---|
URN: | urn:nbn:de:bsz:mh39-51897 |
ISSN: | 0176-599X |
Parent Title (English): | [Arbeiten zur Mehrsprachigkeit / B] Arbeiten zur Mehrsprachigkeit = Working papers in multilingualism / Sonderforschungsbereich 538 Mehrsprachigkeit 538, Universität Hamburg |
Publisher: | Universität Hamburg |
Place of publication: | Hamburg |
Document Type: | Article |
Language: | English |
Year of first Publication: | 2011 |
Date of Publication (online): | 2016/08/22 |
Tag: | Comparable Corpus; Multilingual Corpus; POS-Tagging; XSLT |
GND Keyword: | Kontrastive Grammatik; Korpus <Linguistik>; Wikipedia |
Issue: | 96 |
First Page: | 141 |
Last Page: | 144 |
DDC classes: | 400 Language / 400 Language, Linguistics |
Open Access?: | yes |
Leibniz-Classification: | Language, Linguistics |
Linguistics-Classification: | Corpus linguistics |
Licence (German): | Urheberrechtlich geschützt (protected by copyright) |