Year of publication
- 2002 (2)
Language
- English (2)
Has Fulltext
- yes (2)
Is part of the Bibliography
- no (2)
Keywords
- API (1)
- Coreference (1)
- Data model (1)
- Interrelated document grammars (1)
- Corpus <linguistics> (1)
- Multilingual corpus (1)
- Multiple annotations (1)
- Natural language (1)
- Software reuse (1)
- Standardization (1)
Reviewstate
- (Publisher's) editorial review (1)
- Peer-Review (1)
Publisher
- European Language Resources Association (ELRA) (2)
We describe a simple and efficient Java object model and application programming interface (API) for (possibly multi-modal) annotated natural language corpora. Corpora are represented in terms of elements such as Sentences, Turns, Utterances, Words, Gestures and Markables. The API lets linguists access corpora through these discourse-level elements, i.e., at a conceptual level they are familiar with, while retaining the flexibility of a general-purpose programming language. It also contributes to corpus standardization efforts, because it is based on a straightforward and easily extensible data model that can serve as a target for converting different corpus formats.
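To make the idea of a discourse-level object model concrete, the following is a minimal sketch of what such a model could look like. It is not the API described in the abstract: every class, method and field name here (Word, Markable, Utterance, Corpus, mark, coreferent, the referent id) is a hypothetical illustration, and co-reference is modelled in the simplest possible way, as markables that share an id.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch only: none of these names come from the described API.

/** A single token of an utterance. */
final class Word {
    private final String text;
    Word(String text) { this.text = text; }
    String getText() { return text; }
}

/** A span of words that can take part in a relation such as co-reference. */
final class Markable {
    private final List<Word> words;
    private final String referentId; // assumption: markables sharing an id are co-referent

    Markable(List<Word> words, String referentId) {
        this.words = Collections.unmodifiableList(new ArrayList<>(words));
        this.referentId = referentId;
    }

    String getReferentId() { return referentId; }

    /** The surface text of the span, words joined by single spaces. */
    String getText() {
        StringBuilder sb = new StringBuilder();
        for (Word w : words) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(w.getText());
        }
        return sb.toString();
    }
}

/** One speaker contribution, holding its words and the markables defined on them. */
final class Utterance {
    private final String speaker;
    private final List<Word> words = new ArrayList<>();
    private final List<Markable> markables = new ArrayList<>();

    Utterance(String speaker, String... tokens) {
        this.speaker = speaker;
        for (String t : tokens) words.add(new Word(t));
    }

    /** Marks words[from, toExclusive) as a markable with the given referent id. */
    Markable mark(int from, int toExclusive, String referentId) {
        Markable m = new Markable(words.subList(from, toExclusive), referentId);
        markables.add(m);
        return m;
    }

    String getSpeaker() { return speaker; }
    List<Markable> getMarkables() { return Collections.unmodifiableList(markables); }
}

/** A corpus as an ordered list of utterances. */
final class Corpus {
    private final List<Utterance> utterances = new ArrayList<>();

    void add(Utterance u) { utterances.add(u); }

    /** All markables with the given referent id, in corpus order. */
    List<Markable> coreferent(String referentId) {
        List<Markable> result = new ArrayList<>();
        for (Utterance u : utterances)
            for (Markable m : u.getMarkables())
                if (m.getReferentId().equals(referentId)) result.add(m);
        return result;
    }
}

final class CorpusSketchDemo {
    /** Builds a two-utterance toy corpus with one co-reference chain, "r1". */
    static Corpus sample() {
        Corpus corpus = new Corpus();
        Utterance u1 = new Utterance("A", "take", "the", "red", "block");
        u1.mark(1, 4, "r1");
        Utterance u2 = new Utterance("B", "where", "is", "it");
        u2.mark(2, 3, "r1");
        corpus.add(u1);
        corpus.add(u2);
        return corpus;
    }

    /** Surface texts of all mentions in one co-reference chain. */
    static List<String> mentionTexts(Corpus corpus, String referentId) {
        List<String> texts = new ArrayList<>();
        for (Markable m : corpus.coreferent(referentId)) texts.add(m.getText());
        return texts;
    }

    public static void main(String[] args) {
        for (String t : mentionTexts(sample(), "r1")) System.out.println(t);
    }
}
```

In a sketch like this, a query such as "all mentions of one referent" is an ordinary method call over typed objects rather than a traversal of the underlying file format, which is the kind of access the abstract describes.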
Co-reference annotation and resources: a multilingual corpus of typologically diverse languages
(2002)
This article introduces a dialogue corpus containing data from two typologically different languages, Japanese and Kilivila. The corpus is annotated in accordance with language-specific annotation schemes for co-referential and similar relations. The article describes the corpus data, the properties of language-specific co-reference in the two languages, and a methodology for its annotation. Examples from the corpus show how this methodology is used in the annotation workflow.