CMLC-3 / 3rd Workshop on Challenges in the Management of Large Corpora
Year of publication: 2015
Language: English
Review status: Peer-reviewed
Keywords: Annotation; Corpus annotation; Corpus linguistics; Corpus query language; Corpus technology; Database system; Corpus (linguistics); Large corpora
The IMS Open Corpus Workbench (CWB) software currently uses a simple tabular data model with proven limitations. We outline and justify the need for a new data model to underlie the next major version of CWB. This data model, dubbed Ziggurat, defines a series of data-layer types to represent different structures and relations within an annotated corpus; each layer may contain variables of different types. Ziggurat will allow us to gradually extend and enhance CWB’s existing CQP query syntax, and will also make possible more radical departures, not only from the current version of CWB but also from other contemporary corpus-analysis software.
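To make the idea of typed data layers more concrete, here is a minimal Python sketch of a layered corpus model. It is not the Ziggurat specification. The class `Layer`, the helpers `token_layer` and `tokens_in_spans`, and the toy data are all hypothetical. The sketch assumes only two things: each layer is a sequence of units, each unit covering a range of token positions, and each layer carries named variables holding one value per unit. Here those values are strings for tokens and integers for sentences.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple, Union

# Hypothetical illustration only: names and layer kinds are NOT taken from Ziggurat.
Value = Union[str, int]


@dataclass
class Layer:
    """A layer of annotation units, each covering a token range, plus typed variables."""
    name: str
    # One (start, end) token range per unit; end is inclusive.
    extents: List[Tuple[int, int]]
    # Variable name -> one value per unit (e.g. strings for tokens, ints for sentences).
    variables: Dict[str, List[Value]] = field(default_factory=dict)

    def add_variable(self, name: str, values: List[Value]) -> None:
        # Every variable must supply exactly one value per unit of its layer.
        if len(values) != len(self.extents):
            raise ValueError(
                f"{name}: expected {len(self.extents)} values, got {len(values)}"
            )
        self.variables[name] = values


def token_layer(n: int) -> Layer:
    # In a token layer, each unit covers exactly one corpus position.
    return Layer("token", [(i, i) for i in range(n)])


def tokens_in_spans(corpus: Dict[str, Layer], token_var: str, token_val: Value,
                    span_layer: str, span_var: str, span_val: Value) -> List[int]:
    """Return positions where token_var == token_val inside spans where span_var == span_val."""
    tokens = corpus["token"]
    spans = corpus[span_layer]
    hits: List[int] = []
    for (start, end), value in zip(spans.extents, spans.variables[span_var]):
        if value != span_val:
            continue
        for pos in range(start, end + 1):
            if tokens.variables[token_var][pos] == token_val:
                hits.append(pos)
    return hits


# Toy corpus: two sentences, seven tokens.
words = ["The", "cat", "sat", ".", "Dogs", "bark", "."]
tags = ["DT", "NN", "VBD", ".", "NNS", "VBP", "."]

tok = token_layer(len(words))
tok.add_variable("word", words)
tok.add_variable("pos", tags)

sent = Layer("s", [(0, 3), (4, 6)])
sent.add_variable("id", [1, 2])  # an integer-valued variable on the sentence layer

corpus = {"token": tok, "s": sent}

print(tokens_in_spans(corpus, "pos", ".", "s", "id", 2))  # [6]
```

The design choice being illustrated is limited. Each layer declares its own units and its own typed variables, so it does not have to fit into one fixed table of token columns. This lets span-level data, such as the integer `id` above, sit alongside token-level strings, and lets a query combine conditions from both levels.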