Corpus Linguistics (Korpuslinguistik)
Linguistic query systems are special-purpose information retrieval (IR) applications. We present a novel, state-of-the-art approach for the efficient exploitation of very large linguistic corpora, combining the advantages of relational database management systems (RDBMS) with the functional MapReduce programming model. Our implementation uses the German reference corpus DeReKo with multi-layer linguistic annotations and several types of text-specific metadata, but the proposed strategy is language-independent and adaptable to large-scale multilingual corpora.
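
The general idea of pairing a relational store with a map/reduce-style computation can be sketched as follows. This is only a minimal illustration under stated assumptions, not the implementation described in the abstract: the `tokens` table, its columns, the toy rows, and all function names are hypothetical, and SQLite stands in for a full RDBMS.

```python
import sqlite3
from collections import Counter
from functools import reduce

# Hypothetical toy data: one row per token with two annotation layers
# (lemma and part-of-speech tag). Not taken from any real corpus.
ROWS = [
    ("t1", 0, "Das", "der", "ART"),
    ("t1", 1, "Haus", "Haus", "NN"),
    ("t1", 2, "steht", "stehen", "VVFIN"),
    ("t2", 0, "Häuser", "Haus", "NN"),
    ("t2", 1, "stehen", "stehen", "VVFIN"),
    ("t2", 2, "dort", "dort", "ADV"),
]


def build_db():
    """Create an in-memory SQLite database holding the toy token table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tokens "
        "(text_id TEXT, pos INTEGER, form TEXT, lemma TEXT, tag TEXT)"
    )
    conn.executemany("INSERT INTO tokens VALUES (?, ?, ?, ?, ?)", ROWS)
    return conn


def partitions(conn):
    """Relational side: yield the (lemma, tag) rows of each text as one partition."""
    ids = [r[0] for r in conn.execute(
        "SELECT DISTINCT text_id FROM tokens ORDER BY text_id")]
    for text_id in ids:
        yield conn.execute(
            "SELECT lemma, tag FROM tokens WHERE text_id = ? ORDER BY pos",
            (text_id,),
        ).fetchall()


def map_partition(rows, tag):
    """Map step: count the lemmas carrying the requested tag in one partition."""
    return Counter(lemma for lemma, row_tag in rows if row_tag == tag)


def merge(left, right):
    """Reduce step: merge two partial lemma counts."""
    return left + right


def lemma_frequencies(conn, tag):
    """Run the map step per partition and fold the partial results together."""
    partials = map(lambda rows: map_partition(rows, tag), partitions(conn))
    return reduce(merge, partials, Counter())
```

Calling `lemma_frequencies(build_db(), "NN")` counts the noun lemmas across both toy texts. In a real setting the partitions would be processed in parallel on separate workers; here they are processed sequentially to keep the sketch self-contained.
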
Contemporary studies on the characteristics of natural language benefit enormously from the growing number of linguistic corpora. Aside from text and speech corpora, corpora of computer-mediated communication (CMC) position themselves between orality and literacy and, beyond that, provide insight into the impact of "new", mainly internet-based media on language behaviour. In this paper, we present an empirical attempt to work with annotated CMC corpora for the explanation of linguistic phenomena. In concrete terms, we implement machine learning algorithms to produce decision trees that reveal rules and tendencies in the use of genitive markers in German.
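
How a decision tree can expose rules about genitive markers may be illustrated with a small ID3-style learner. This is a sketch under loud assumptions: the abstract does not say which algorithm or features were used, the feature names `final_sibilant` and `monosyllabic` are hypothetical, and the training rows are hand-made toy examples, not corpus data or results.

```python
import math
from collections import Counter

FEATURES = ["final_sibilant", "monosyllabic"]  # hypothetical feature names

# Hand-made toy rows (NOT corpus data): noun features and the genitive marker.
DATA = [
    ({"final_sibilant": True, "monosyllabic": True}, "es"),    # e.g. Haus -> Hauses
    ({"final_sibilant": True, "monosyllabic": False}, "es"),
    ({"final_sibilant": True, "monosyllabic": False}, "es"),
    ({"final_sibilant": False, "monosyllabic": True}, "es"),   # invented for illustration
    ({"final_sibilant": False, "monosyllabic": False}, "s"),   # e.g. Lehrer -> Lehrers
    ({"final_sibilant": False, "monosyllabic": False}, "s"),
    ({"final_sibilant": False, "monosyllabic": False}, "s"),
]


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def split(rows, feature):
    """Group rows by the value they have for the given feature."""
    groups = {}
    for feats, label in rows:
        groups.setdefault(feats[feature], []).append((feats, label))
    return groups


def best_feature(rows, features):
    """Pick the feature with the highest information gain."""
    base = entropy([label for _, label in rows])

    def gain(feature):
        parts = split(rows, feature).values()
        rest = sum(len(p) / len(rows) * entropy([l for _, l in p]) for p in parts)
        return base - rest

    return max(features, key=gain)


def build_tree(rows, features):
    """Recursively build a tree; leaves are labels, inner nodes are dicts."""
    labels = [label for _, label in rows]
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not features:
        return majority
    feature = best_feature(rows, features)
    remaining = [f for f in features if f != feature]
    branches = {value: build_tree(part, remaining)
                for value, part in split(rows, feature).items()}
    return {"feature": feature, "default": majority, "branches": branches}


def classify(tree, feats):
    """Walk the tree; fall back to the node's majority label for unseen values."""
    while isinstance(tree, dict):
        tree = tree["branches"].get(feats[tree["feature"]], tree["default"])
    return tree


def rules(tree, path=()):
    """Flatten the tree into (conditions, label) pairs, one per leaf."""
    if not isinstance(tree, dict):
        return [(path, tree)]
    out = []
    for value, subtree in tree["branches"].items():
        out.extend(rules(subtree, path + ((tree["feature"], value),)))
    return out
```

Building the tree with `build_tree(DATA, FEATURES)` and passing it to `rules` yields one readable condition list per leaf, which is the sense in which a decision tree "reveals rules": each path from root to leaf is an explicit if-then statement about the marker.
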