An API for discourse-level access to XML-encoded corpora
We describe a simple and efficient Java object model and application programming interface (API) for (possibly multi-modal) annotated natural language corpora. Corpora are represented in terms of elements such as Sentences, Turns, Utterances, Words, Gestures and Markables. The API lets linguists access corpora through these discourse-level elements, i.e. at a conceptual level they are familiar with, while retaining the flexibility of a general-purpose programming language. It also contributes to corpus standardization efforts because it is based on a straightforward and easily extensible data model that can serve as a target for converting different corpus formats.
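To make "discourse-level access" more concrete, the following is a minimal sketch, not the authors' API. It assumes a hypothetical XML layout in which each token is a `<word id="...">` element, and it uses hypothetical classes `Word` and `Markable` plus helper methods `parseWords` and `span` to show how XML-encoded tokens might be lifted into objects that a linguist can query as spans. Only the XML parsing relies on real library code (the standard JDK DOM parser). Records require Java 16 or later.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class DiscourseSketch {
    // Hypothetical discourse-level element: one token with its XML id.
    record Word(String id, String text) {}

    // Hypothetical markable: a typed span over a contiguous run of words.
    record Markable(String id, String type, List<Word> words) {
        // Join the covered words with single spaces.
        String text() {
            StringBuilder sb = new StringBuilder();
            for (Word w : words) {
                if (sb.length() > 0) sb.append(' ');
                sb.append(w.text());
            }
            return sb.toString();
        }
    }

    // Parse every <word id="..."> element (assumed layout) into Word objects, in document order.
    static List<Word> parseWords(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        NodeList nodes = doc.getElementsByTagName("word");
        List<Word> words = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            Element e = (Element) nodes.item(i);
            words.add(new Word(e.getAttribute("id"), e.getTextContent()));
        }
        return words;
    }

    // Build a markable over the inclusive word-index range [from, to].
    static Markable span(String id, String type, List<Word> words, int from, int to) {
        return new Markable(id, type, List.copyOf(words.subList(from, to + 1)));
    }

    public static void main(String[] args) throws Exception {
        String xml = "<words>"
            + "<word id=\"w1\">The</word><word id=\"w2\">corpus</word>"
            + "<word id=\"w3\">is</word><word id=\"w4\">annotated</word>"
            + "</words>";
        List<Word> words = parseWords(xml);
        Markable np = span("m1", "np", words, 0, 1);
        System.out.println(np.type() + ": " + np.text()); // prints "np: The corpus"
    }
}
```

Keeping markables as references to shared `Word` objects, rather than copying text, is one way such a model could let several annotation layers point at the same base data.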
Author: | Mark-Christoph Müller, Michael Strube |
---|---|
URN: | urn:nbn:de:bsz:mh39-111602 |
URL: | http://www.lrec-conf.org/proceedings/lrec2002/pdf/296.pdf |
URL: | https://aclanthology.org/L02-1296/ |
Parent Title (English): | Proceedings of the Third International Conference on Language Resources and Evaluation (LREC’02). May 29-31, 2002, Las Palmas, Canary Islands, Spain |
Publisher: | European Language Resources Association (ELRA) |
Place of publication: | Paris |
Editor: | Manuel González Rodríguez, Carmen Paz Suarez Araujo |
Document Type: | Conference Proceeding |
Language: | English |
Year of first Publication: | 2002 |
Date of Publication (online): | 2022/07/26 |
Publishing Institution: | Leibniz-Institut für Deutsche Sprache (IDS) |
Publication state: | Published version |
Review state: | Peer review |
Tag: | XML; corpus exploitation; discourse processing; reusability; standardization |
GND Keyword: | API; Data model; Corpus <Linguistics>; Natural language; Software reuse; Standardization; XML |
First Page: | 26 |
Last Page: | 30 |
DDC classes: | 400 Language / 400 Language, linguistics |
Open Access?: | yes |
Linguistics-Classification: | Computational linguistics |