TY  - CHAP
U1  - Conference publication
A1  - Schneider, Roman
ED  - Calzolari, Nicoletta
ED  - Choukri, Khalid
ED  - Declerck, Thierry
ED  - Doğan, Mehmet Uğur
ED  - Maegaard, Bente
ED  - Mariani, Joseph
ED  - Moreno, Asuncion
ED  - Odijk, Jan
ED  - Piperidis, Stelios
T1  - Evaluating DBMS-based Access Strategies to Very Large Multi-layer Corpora
T2  - Proceedings of the LREC-12 Workshop on Challenges in the Management of Large Corpora. Istanbul, Turkey, May 2012
N2  - Linguistic query systems are special purpose IR applications. As text sizes, annotation layers, and metadata schemes of language corpora grow rapidly, performing complex searches becomes a computationally expensive task. We evaluate several storage models and indexing variants in two multi-processor/multi-core environments, focusing on prototypical linguistic querying scenarios. Our aim is to reveal modeling and querying tendencies – rather than absolute benchmark results – when using a relational database management system (RDBMS) and MapReduce for natural language corpus retrieval. Based on these findings, we are going to improve our approach for the efficient exploitation of very large corpora, combining advantages of state-of-the-art database systems with decomposition/parallelization strategies. Our reference implementation uses the German DeReKo reference corpus with currently more than 4 billion word forms, various multi-layer linguistic annotations, and several types of text-specific metadata. The proposed strategy is language-independent and adaptable to large-scale multilingual corpora.
KW  - Very Large Corpora
KW  - Multi-layer Annotation
KW  - Linguistic Retrieval
KW  - Database Management Systems
KW  - Concurrency
Y1  - 2012
U6  - https://nbn-resolving.org/urn:nbn:de:bsz:mh39-48124
UN  - https://nbn-resolving.org/urn:nbn:de:bsz:mh39-48124
SN  - 978-2-9517408-7-7
SB  - 978-2-9517408-7-7
SP  - 35
EP  - 48
PB  - European Language Resources Association (ELRA)
CY  - Paris
ER  -