A Functional Database Framework for Querying Very Large Multi-Layer Corpora
Linguistic query systems are special-purpose IR applications. We present a novel state-of-the-art approach for the efficient exploitation of very large linguistic corpora, combining the advantages of relational database management systems (RDBMS) with the functional MapReduce programming model. Our implementation uses the German DEREKO reference corpus with multi-layer linguistic annotations and several types of text-specific metadata, but the proposed strategy is language-independent and adaptable to large-scale multilingual corpora.
Author: | Roman Schneider |
---|---|
URN: | urn:nbn:de:bsz:mh39-39705 |
ISSN: | 0176-599X |
Parent Title (English): | Multilingual Resources and Multilingual Applications. Proceedings of the Conference of the German Society for Computational Linguistics and Language Technology (GSCL) 2011 |
Series (Serial Number): | Working Papers In Multilingualism Series B (96) |
Publisher: | Universität Hamburg |
Place of publication: | Hamburg |
Editor: | Hanna Hedemann, Thomas Schmidt, Kai Wörner |
Document Type: | Conference Proceeding |
Language: | English |
Year of first Publication: | 2011 |
Date of Publication (online): | 2015/08/11 |
Publication state: | Published version |
Review state: | Publisher's editorial review |
Tag: | corpus retrieval; corpus storage; database systems; multi-layer corpora |
GND Keyword: | Information Retrieval; Corpus <Linguistics> |
First Page: | 87 |
Last Page: | 92 |
DDC classes: | 400 Language |
Open Access?: | yes |
Leibniz-Classification: | Language, Linguistics |
Linguistics-Classification: | Corpus linguistics |
Licence (German): | Protected by copyright |