Corpus linguistics
Year of publication
- 2013 (16)
Document Type
- Part of a Book (6)
- Conference Proceeding (5)
- Article (4)
- Book (1)
Has full text
- yes (16)
Is part of the Bibliography
- no (16)
Keywords
- Corpus <linguistics> (12)
- German (7)
- Corpus analysis platform (KorAP) (2)
- Corpus linguistics (2)
- XML (2)
- Automatic language analysis (1)
- Computer-Mediated Communication (1)
- Corpus Linguistics (1)
- Database (1)
- Decision Trees (1)
Publication state
- Published version (8)
- Secondary publication (2)
- Postprint (1)
Review state
- Publisher's editorial review (10)
- Peer-Review (1)
- Secondary publication (1)
Publisher
- Institut für Deutsche Sprache (2)
- Narr (2)
- UCREL (2)
- ACM (1)
- GSCL (1)
- Köllen (1)
- Springer (1)
- Stutz (1)
- Université de Strasbourg (1)
- Uniwersytet im. Adama Mickiewicza w Poznaniu (1)
Extending the possibilities for collaborative work with TEI/XML through the usage of a wiki system
(2013)
This paper presents and discusses an integrated project-specific working environment for editing TEI/XML-files and linking entities of interest to a dedicated wiki system. This working environment has been specifically tailored to the workflow in our interdisciplinary digital humanities project GeoBib. It addresses some challenges that arose while working with person-related data and geographical references in a growing collection of TEI/XML-files. While our current solution provides some essential benefits, we also discuss several critical issues and challenges that remain.
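A minimal sketch of the entity-linking step described above, under loudly stated assumptions: persons and places are assumed to be tagged as TEI `persName`/`placeName` elements whose `@ref` carries a hypothetical `wiki:` prefix naming a page in the project wiki, and `WIKI_BASE` is an invented URL. This is not GeoBib's actual environment; it only shows how such links could be collected, with unlinked entities reported so they can be resolved later.

```python
import xml.etree.ElementTree as ET

# TEI P5 namespace. WIKI_BASE and the "wiki:" @ref prefix are hypothetical conventions.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}
WIKI_BASE = "https://wiki.example.org/"
WIKI_PREFIX = "wiki:"


def entity_links(tei_xml):
    """Return (tag, text, wiki_url_or_None) for every person and place reference."""
    root = ET.fromstring(tei_xml)
    links = []
    for tag in ("persName", "placeName"):
        for el in root.iterfind(f".//tei:{tag}", TEI_NS):
            label = "".join(el.itertext()).strip()
            ref = el.get("ref", "")
            # Entities without a wiki reference get None so they can be linked later.
            url = WIKI_BASE + ref[len(WIKI_PREFIX):] if ref.startswith(WIKI_PREFIX) else None
            links.append((tag, label, url))
    return links
```

Reporting missing links instead of silently skipping them reflects the kind of consistency problem the abstract mentions for a growing collection of files.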
Linguistic query systems are special-purpose IR applications. We present a novel state-of-the-art approach for the efficient exploitation of very large linguistic corpora, combining the advantages of relational database management systems (RDBMS) with the functional MapReduce programming model. Our implementation uses the German DEREKO reference corpus with multi-layer linguistic annotations and several types of text-specific metadata, but the proposed strategy is language-independent and adaptable to large-scale multilingual corpora.
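The general pattern of pairing an RDBMS with MapReduce can be illustrated with a small sketch: each corpus shard lives in its own relational database, the map step pushes a query down to every shard, and the reduce step merges the partial results. The `tokens` schema, the POS tags and the helper names are hypothetical; this is not the DEREKO implementation, and a real deployment would run the map step in parallel across machines rather than in a Python loop.

```python
import sqlite3
from collections import Counter
from functools import reduce

# Hypothetical schema: one row per token, with two annotation layers (lemma, POS).
SCHEMA = "CREATE TABLE tokens (doc_id TEXT, position INTEGER, surface TEXT, lemma TEXT, pos TEXT)"


def make_shard(rows):
    """Build one in-memory SQLite shard holding a slice of the corpus."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany("INSERT INTO tokens VALUES (?, ?, ?, ?, ?)", rows)
    return conn


def map_shard(conn, pos):
    """Map step: the RDBMS answers the query locally and returns partial lemma counts."""
    cur = conn.execute("SELECT lemma, COUNT(*) FROM tokens WHERE pos = ? GROUP BY lemma", (pos,))
    return Counter(dict(cur.fetchall()))


def reduce_counts(a, b):
    """Reduce step: merge the partial counts of two shards."""
    return a + b


def query(shards, pos):
    """Lemma frequencies for one POS tag across all shards."""
    partials = [map_shard(shard, pos) for shard in shards]
    return reduce(reduce_counts, partials, Counter())
```

Pushing the filtering and grouping into each database keeps the data moved between map and reduce small: only per-lemma counts, not tokens.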
Investigating the history of a language depends on fragmentary sources, but electronic corpora offer the possibility of alleviating the problem of ‘bad data’. However, they cannot overcome it entirely, and crucial questions thus arise: what the optimal architecture for such a corpus is, how representative even a large corpus can be of actual language use at a particular time, and how a historical corpus can best be annotated and provided with tools to maximize its usefulness as a resource for future researchers. Immense strides have been made in recent years in addressing these questions, with exciting new methods and technological advances. The papers in this volume, which were presented at a conference on New Methods in Historical Corpora (Manchester 2011), exemplify the range of these developments in investigating the diachrony of languages as distinct as English, German, Latin, Spanish, French and Slovene and in developing appropriate tools for the analysis of historical corpora in these languages.
Contemporary studies on the characteristics of natural language benefit enormously from the increasing number of linguistic corpora. Aside from text and speech corpora, corpora of computer-mediated communication (CMC) position themselves between orality and literacy, and beyond that provide insight into the impact of "new", mainly internet-based media on language behaviour. In this paper, we present an empirical attempt to work with annotated CMC corpora for the explanation of linguistic phenomena. In concrete terms, we implement machine learning algorithms to produce decision trees that reveal rules and tendencies in the use of genitive markers in German.
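To illustrate how a decision tree can expose such rules, here is a minimal ID3-style learner (entropy and information gain) in plain Python. The two features (`sibilant`: the stem ends in a sibilant; `monosyllabic`) and the six toy nouns are invented for illustration; they are not the features, data or learning algorithm used in the paper.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(rows, labels, feature):
    """Entropy reduction obtained by splitting on one feature."""
    remainder = 0.0
    for value in {row[feature] for row in rows}:
        subset = [lab for row, lab in zip(rows, labels) if row[feature] == value]
        remainder += len(subset) / len(labels) * entropy(subset)
    return entropy(labels) - remainder


def build_tree(rows, labels, features):
    """Return a label (leaf) or a (feature, {value: subtree}) node."""
    if len(set(labels)) == 1:
        return labels[0]
    if not features:
        return Counter(labels).most_common(1)[0][0]  # majority label
    best = max(features, key=lambda f: information_gain(rows, labels, f))
    remaining = [f for f in features if f != best]
    branches = {}
    for value in {row[best] for row in rows}:
        idx = [i for i, row in enumerate(rows) if row[best] == value]
        branches[value] = build_tree([rows[i] for i in idx], [labels[i] for i in idx], remaining)
    return (best, branches)


def classify(tree, row):
    """Follow the branches to a leaf; raises KeyError for feature values unseen in training."""
    while isinstance(tree, tuple):
        feature, branches = tree
        tree = branches[row[feature]]
    return tree


# Toy data (hypothetical): genitive ending "-es" vs "-s".
ROWS = [
    {"sibilant": True, "monosyllabic": True},    # Haus -> Hauses
    {"sibilant": True, "monosyllabic": False},   # Prozess -> Prozesses
    {"sibilant": True, "monosyllabic": False},   # Kompromiss -> Kompromisses
    {"sibilant": False, "monosyllabic": True},   # Tag -> Tages
    {"sibilant": False, "monosyllabic": False},  # Lehrer -> Lehrers
    {"sibilant": False, "monosyllabic": False},  # Computer -> Computers
]
LABELS = ["es", "es", "es", "es", "s", "s"]
TREE = build_tree(ROWS, LABELS, ["sibilant", "monosyllabic"])
```

On this toy data the tree splits first on `sibilant` (the higher information gain) and only then on `monosyllabic`, which is the kind of readable rule hierarchy that makes decision trees attractive for linguistic interpretation.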
Editorial
(2013)
Igel is a small XQuery-based web application for examining a collection of document grammars; in particular, for comparing related document grammars to get a better overview of their differences and similarities. In its initial form, Igel reads only DTDs and provides only simple lists of constructs in them (elements, attributes, notations, parameter entities). Our continuing work is aimed at making Igel provide more sophisticated and useful information about document grammars and building the application into a useful tool for the analysis (and the maintenance!) of families of related document grammars.
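Igel itself is written in XQuery; the following Python sketch only illustrates the kind of comparison described: it lists the elements, attributes, notations and parameter entities declared in two DTDs and splits each kind into shared and grammar-specific sets. It relies on deliberately simple regular expressions and assumes that declarations are not hidden behind parameter-entity references and that attribute defaults contain no `>`, so it is far less robust than a real DTD parser.

```python
import re

COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)
ELEMENT = re.compile(r"<!ELEMENT\s+([^\s>]+)")
NOTATION = re.compile(r"<!NOTATION\s+([^\s>]+)")
PARAM_ENTITY = re.compile(r"<!ENTITY\s+%\s+([^\s>]+)")
ATTLIST = re.compile(r"<!ATTLIST\s+([^\s>]+)(.*?)>", re.DOTALL)
# One attribute definition: name, type (keyword or enumeration), default declaration.
ATTDEF = re.compile(
    r"([^\s()|]+)\s+(?:NOTATION\s*)?(?:\([^)]*\)|[A-Z]+)\s+"
    r"""(?:#REQUIRED|#IMPLIED|(?:#FIXED\s+)?(?:"[^"]*"|'[^']*'))"""
)


def summarize(dtd_text):
    """Collect the declared constructs of one DTD (simplified, no entity expansion)."""
    text = COMMENT.sub("", dtd_text)  # ignore commented-out declarations
    attributes = set()
    for element, body in ATTLIST.findall(text):
        for match in ATTDEF.finditer(body):
            attributes.add(f"{element}/@{match.group(1)}")
    return {
        "elements": set(ELEMENT.findall(text)),
        "attributes": attributes,
        "notations": set(NOTATION.findall(text)),
        "parameter entities": set(PARAM_ENTITY.findall(text)),
    }


def compare(dtd_a, dtd_b):
    """For each kind of construct, split names into only_a, only_b and shared."""
    a, b = summarize(dtd_a), summarize(dtd_b)
    return {
        kind: {"only_a": a[kind] - b[kind], "only_b": b[kind] - a[kind], "shared": a[kind] & b[kind]}
        for kind in a
    }
```

Qualifying attributes by their element (`p/@id`) keeps same-named attributes on different elements apart, which matters when comparing families of related grammars.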