Release of the MySQL based implementation of the CTS protocol
In a project called "A Library of a Billion Words", we needed an implementation of the CTS protocol capable of handling a text collection of at least 1 billion words. Because the existing solutions either did not work at this scale or were still in development, I started an implementation of the CTS protocol based on methods that MySQL provides. Last year we published a paper introducing a prototype that offered the core functionality but was not compliant with the CTS specifications (Tiepmar et al., 2013). The purpose of this paper is to describe and evaluate the MySQL-based implementation now that it fulfills version 5.0 rc.1 of the specifications, and to mark it as finished and ready for use. Further information, online CTS instances for all described datasets, and binaries can be accessed via the project's website.
Author: | Jochen Tiepmar |
---|---|
URN: | urn:nbn:de:bsz:mh39-38374 |
Parent Title (English): | Proceedings of the 3rd Workshop on Challenges in the Management of Large Corpora (CMLC-3), Lancaster, 20 July 2015 |
Publisher: | Institut für Deutsche Sprache |
Place of publication: | Mannheim |
Editor: | Piotr Bański, Hanno Biber, Evelyn Breiteneder, Marc Kupietz, Harald Lüngen, Andreas Witt |
Document Type: | Conference Proceeding |
Language: | English |
Year of first Publication: | 2015 |
Date of Publication (online): | 2015/07/02 |
Publication state: | Published version |
Review state: | Peer review |
Tag: | CTS; Canonical text services; Text retrieval; mysql |
GND Keyword: | Information Retrieval; MySQL |
First Page: | 35 |
Last Page: | 43 |
DDC classes: | 400 Language / 410 Linguistics |
Open Access?: | yes |
Linguistics classification: | Corpus linguistics |
Conferences, Workshops: | CMLC-3 / 3rd Workshop on Challenges in the Management of Large Corpora |
Licence (German): | Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Germany (CC BY-NC-ND 3.0 DE) |