A comparable Wikipedia corpus: from wiki syntax to POS tagged XML

  • To build a comparable Wikipedia corpus of German, French, Italian, Norwegian, Polish and Hungarian for contrastive grammar research, we used a set of XSLT stylesheets to transform the MediaWiki annotations to XML. Furthermore, the data has been annotated with word class information using different taggers. The outcome is a corpus with rich metadata and linguistic annotation that can be used for multilingual research in various linguistic topics.
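
The conversion step named in the abstract, from wiki syntax to XML, can be illustrated with a minimal sketch. The authors used XSLT stylesheets; the Python below is not their implementation, only a toy stand-in that handles headings and internal links. The element names (`article`, `head`, `p`, `link`), the function names, and the sample text are assumptions made for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Wiki heading such as "== Title ==" (same number of "=" on both sides).
HEADING = re.compile(r"^(=+)\s*(.*?)\s*\1$")
# Internal link such as [[Target]] or [[Target|label]].
LINK = re.compile(r"\[\[([^\]|]+)(?:\|([^\]]+))?\]\]")


def add_inline(parent, text):
    """Add text to parent, turning [[target|label]] into <link> children."""
    pos = 0
    last = None  # most recent <link>; text after it goes into its tail
    for m in LINK.finditer(text):
        chunk = text[pos:m.start()]
        if last is None:
            parent.text = (parent.text or "") + chunk
        else:
            last.tail = (last.tail or "") + chunk
        last = ET.SubElement(parent, "link", target=m.group(1))
        last.text = m.group(2) or m.group(1)
        pos = m.end()
    rest = text[pos:]
    if last is None:
        parent.text = (parent.text or "") + rest
    else:
        last.tail = (last.tail or "") + rest


def wiki_to_xml(title, wikitext):
    """Convert a tiny subset of wiki syntax into an XML element tree."""
    root = ET.Element("article", title=title)
    for line in wikitext.splitlines():
        line = line.strip()
        if not line:
            continue
        m = HEADING.match(line)
        if m:
            head = ET.SubElement(root, "head", level=str(len(m.group(1))))
            head.text = m.group(2)
        else:
            add_inline(ET.SubElement(root, "p"), line)
    return root


sample = (
    "== Geschichte ==\n"
    "Hamburg liegt an der [[Elbe]] in [[Deutschland|Norddeutschland]]."
)
print(ET.tostring(wiki_to_xml("Hamburg", sample), encoding="unicode"))
```

A real pipeline would also have to deal with templates, tables, references and nested markup, and a later stage would add the word class annotation to the resulting XML.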

Metadata
Author:Noah Bubenhofer, Stefanie Haupt, Horst Schwinn
URN:urn:nbn:de:bsz:mh39-51897
ISSN:0176-599X
Parent Title (English):[Arbeiten zur Mehrsprachigkeit / B] Arbeiten zur Mehrsprachigkeit = Working papers in multilingualism / Sonderforschungsbereich 538 Mehrsprachigkeit 538, Universität Hamburg
Publisher:Universität Hamburg
Place of publication:Hamburg
Document Type:Article
Language:English
Year of first Publication:2011
Date of Publication (online):2016/08/22
Tag:Comparable Corpus; Multilingual Corpus; POS-Tagging; XSLT
GND Keyword:Kontrastive Grammatik; Korpus <Linguistik>; Wikipedia
Issue:96
First Page:141
Last Page:144
Dewey Decimal Classification:400 Language / 400 Language, Linguistics
Leibniz-Classification:Language, Linguistics
Linguistics-Classification:Corpus linguistics
Open Access?:Yes
Licence (German):Es gilt das UrhG