Ziggurat: A new data model and indexing format for large annotated text corpora

  • The IMS Open Corpus Workbench (CWB) software currently uses a simple tabular data model with proven limitations. We outline and justify the need for a new data model to underlie the next major version of CWB. This data model, dubbed Ziggurat, defines a series of types of data layer to represent different structures and relations within an annotated corpus; each such layer may contain variables of different types. Ziggurat will allow us to gradually extend and enhance CWB’s existing CQP-syntax for corpus queries, and also make possible more radical departures relative not only to the current version of CWB but also to other contemporary corpus-analysis software.

Author:Stefan Evert, Andrew Hardie
Parent Title (English):Proceedings of the 3rd Workshop on Challenges in the Management of Large Corpora (CMLC-3), Lancaster, 20 July 2015
Publisher:Institut für Deutsche Sprache
Place of publication:Mannheim
Editor:Piotr Bański, Hanno Biber, Evelyn Breiteneder, Marc Kupietz, Harald Lüngen, Andreas Witt
Document Type:Conference Proceeding
Year of first Publication:2015
Date of Publication (online):2015/07/02
Tag:Corpus annotation; Corpus linguistics; Corpus query language; Corpus technology; Large corpora
GND Keyword:Annotation; Datenbanksystem; Korpus <Linguistik>
First Page:21
Last Page:27
Dewey Decimal Classification:400 Language / 410 Linguistics
Conferences, Workshops:CMLC-3 / 3rd Workshop on Challenges in the Management of Large Corpora
Open Access?:Yes
Licence (German):Creative Commons - Attribution-NonCommercial-NoDerivs 3.0 Germany