
An API for discourse-level access to XML-encoded corpora

We describe a simple and efficient Java object model and application programming interface (API) for annotated, possibly multi-modal, natural language corpora. Corpora are represented in terms of elements such as Sentences, Turns, Utterances, Words, Gestures, and Markables. The API lets linguists access corpora through these discourse-level elements, i.e. at a conceptual level they are familiar with, while offering the flexibility of a general-purpose programming language. It also contributes to corpus standardization efforts, because it is based on a straightforward and easily extensible data model that can serve as a target for converting different corpus formats.
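A minimal Java sketch can make the idea of discourse-level access concrete. This is not the authors' API. The class names (`DiscourseApiSketch`, `Word`, `Markable`, `Corpus`), the `load` method, and the stand-off XML layout (`<word id=…>` elements plus `<markable type from to>` spans over word ids) are all hypothetical. The sketch shows the general pattern the abstract describes: an XML-encoded corpus is converted into Java objects, and the user then works with Words and Markables rather than raw XML nodes. Other element types such as Sentences, Turns, Utterances or Gestures could be modeled as further span types in the same way. It uses only the JDK's built-in DOM parser and assumes Java 16 or later (records, `Stream.toList`, text blocks).

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

// Hypothetical sketch of a discourse-level object model; not the paper's actual API.
public class DiscourseApiSketch {

    // Base-level element: one token with a corpus-wide id.
    public record Word(String id, String text) {}

    // A typed span over a contiguous range of Words (e.g. a noun phrase).
    public record Markable(String type, List<Word> words) {
        // Surface string of the span, tokens joined by single spaces.
        public String text() {
            return String.join(" ", words.stream().map(Word::text).toList());
        }
    }

    // All Words and Markables of one corpus.
    public record Corpus(List<Word> words, List<Markable> markables) {}

    // Converts the hypothetical stand-off XML format into the object model.
    public static Corpus load(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("malformed corpus XML", e);
        }

        // Collect the Words in document order.
        List<Word> words = new ArrayList<>();
        NodeList wordNodes = doc.getElementsByTagName("word");
        for (int i = 0; i < wordNodes.getLength(); i++) {
            Element w = (Element) wordNodes.item(i);
            words.add(new Word(w.getAttribute("id"), w.getTextContent()));
        }

        // Resolve each Markable's from/to word ids to a span of shared Word objects.
        List<Markable> markables = new ArrayList<>();
        NodeList markableNodes = doc.getElementsByTagName("markable");
        for (int i = 0; i < markableNodes.getLength(); i++) {
            Element m = (Element) markableNodes.item(i);
            int from = indexOf(words, m.getAttribute("from"));
            int to = indexOf(words, m.getAttribute("to"));
            markables.add(new Markable(m.getAttribute("type"),
                    List.copyOf(words.subList(from, to + 1))));
        }
        return new Corpus(List.copyOf(words), List.copyOf(markables));
    }

    // Position of the Word with the given id; fails loudly on dangling references.
    private static int indexOf(List<Word> words, String id) {
        for (int i = 0; i < words.size(); i++) {
            if (words.get(i).id().equals(id)) {
                return i;
            }
        }
        throw new IllegalArgumentException("unknown word id: " + id);
    }

    public static void main(String[] args) {
        String xml = """
            <corpus>
              <words>
                <word id="w1">the</word>
                <word id="w2">red</word>
                <word id="w3">button</word>
                <word id="w4">works</word>
              </words>
              <markables>
                <markable type="np" from="w1" to="w3"/>
              </markables>
            </corpus>
            """;
        Corpus corpus = load(xml);
        // Discourse-level access: iterate over Markables, not XML nodes.
        for (Markable m : corpus.markables()) {
            System.out.println(m.type() + ": " + m.text());
        }
    }
}
```

Running `main` prints `np: the red button`. Storing Markables as spans over shared Word objects, rather than nesting them in the XML tree, lets overlapping annotation layers refer to the same tokens; this is a common design choice for such models, assumed here rather than taken from the paper.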

Author: Mark-Christoph Müller, Michael Strube
Parent Title (English): Proceedings of the Third International Conference on Language Resources and Evaluation (LREC’02). May 29-31, 2002, Las Palmas, Canary Islands, Spain
Publisher: European Language Resources Association (ELRA)
Place of publication: Paris
Editor: Manuel González Rodríguez, Carmen Paz Suarez Araujo
Document Type: Conference Proceeding
Year of first Publication: 2002
Date of Publication (online): 2022/07/26
Publishing Institution: Leibniz-Institut für Deutsche Sprache (IDS)
Tag: XML; corpus exploitation; discourse processing; reusability; standardization
GND Keyword: API; Data model; Corpus <linguistics>; Natural language; Software reuse; Standardization; XML
First Page: 26
Last Page: 30
DDC classes: 400 Language / 400 Language, Linguistics
Open Access?: yes
Licence (English): Creative Commons - Attribution-NonCommercial-ShareAlike 3.0 Unported