KoGra-DB: Using MapReduce for language corpora

Linguistic query systems are special-purpose information retrieval (IR) applications. We present a novel state-of-the-art approach for the efficient exploitation of very large linguistic corpora, combining the advantages of relational database management systems (RDBMS) with the functional MapReduce programming model. Our implementation uses the German DEREKO reference corpus with multi-layer linguistic annotations and several types of text-specific metadata, but the proposed strategy is language-independent and adaptable to large-scale multilingual corpora.

Metadata
Author:	Roman Schneider
URN:	urn:nbn:de:bsz:mh39-70363
ISBN:	978-3-88579-614-5
Parent Title (German):	Informatik 2013. Informatik angepasst an Mensch, Organisation und Umwelt
Series (Serial Number):	Lecture Notes in Informatics (220)
Publisher:	Köllen
Place of publication:	Bonn-Buschdorf
Editor:	Matthias Horbach
Document Type:	Part of a Book
Language:	English
Year of first Publication:	2013
Date of Publication (online):	2018/02/02
Publication state:	Published version
Review state:	(Publisher's) editorial review
Tag:	Korpusanalyseplattform (KorAP)
GND Keyword:	Automatische Sprachanalyse; Korpus <Linguistik>
First Page:	140
Last Page:	142
DDC classes:	400 Language / 400 Language, Linguistics
Open Access?:	yes
Leibniz-Classification:	Language, Linguistics
Linguistics-Classification:	Corpus linguistics
Licence (German):	Protected by copyright
