Wikipedia is a valuable resource, useful as a linguistic corpus or as a dataset for many kinds of research. We built corpora from Wikipedia articles and talk pages in the I5 format, a TEI customisation used in the German Reference Corpus (Deutsches Referenzkorpus, DeReKo). Our approach is a two-stage conversion: parsing with the Sweble parser, followed by transformation with XSLT stylesheets. The conversion generates rich and valid corpora regardless of language. We also introduce a method for segmenting user contributions in talk pages into postings.
Editorial
(2011)
This paper describes an approach to modelling a general-language wordnet, GermaNet, and a domain-specific wordnet, TermNet, in the Web Ontology Language (OWL). While the modelling process for GermaNet adopts relevant recommendations made for the English Princeton WordNet, an alternative modelling concept is developed for TermNet that takes into account the special characteristics of domain-specific terminologies. We present a proposal for linking a general-language wordnet and a terminological wordnet within the framework of OWL and, on this basis, discuss problems and alternative modelling approaches.