Ziggurat: A new data model and indexing format for large annotated text corpora
The IMS Open Corpus Workbench (CWB) software currently uses a simple tabular data model with proven limitations. We outline and justify the need for a new data model to underlie the next major version of CWB. This data model, dubbed Ziggurat, defines a series of types of data layer to represent different structures and relations within an annotated corpus; each such layer may contain variables of different types. Ziggurat will allow us to gradually extend and enhance CWB’s existing CQP-syntax for corpus queries, and also make possible more radical departures relative not only to the current version of CWB but also to other contemporary corpus-analysis software.
Author: | Stefan Evert, Andrew Hardie |
---|---|
URN: | urn:nbn:de:bsz:mh39-38335 |
Parent Title (English): | Proceedings of the 3rd Workshop on Challenges in the Management of Large Corpora (CMLC-3), Lancaster, 20 July 2015 |
Publisher: | Institut für Deutsche Sprache |
Place of publication: | Mannheim |
Editor: | Piotr Bański, Hanno Biber, Evelyn Breiteneder, Marc Kupietz, Harald Lüngen, Andreas Witt |
Document Type: | Conference Proceeding |
Language: | English |
Year of first Publication: | 2015 |
Date of Publication (online): | 2015/07/02 |
Publication state: | Published version |
Review state: | Peer review |
Tag: | Corpus annotation; Corpus linguistics; Corpus query language; Corpus technology; Large corpora |
GND Keyword: | Annotation; Datenbanksystem; Korpus <Linguistik> |
First Page: | 21 |
Last Page: | 27 |
DDC classes: | 400 Language / 410 Linguistics |
Open Access?: | yes |
Linguistics-Classification: | Corpus linguistics |
Conferences, Workshops: | CMLC-3 / 3rd Workshop on Challenges in the Management of Large Corpora |
Licence: | Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Germany (CC BY-NC-ND 3.0 DE) |