Refine
Document Type
Language
- English (4)
Has Fulltext
- yes (4)
Is part of the Bibliography
- no (4)
Keywords
- Datenmodell (4) (remove)
Publicationstate
- Veröffentlichungsversion (4) (remove)
Reviewstate
- Peer-Review (4)
We present an implemented XML data model and a new, simplified query language for multi-level annotated corpora. The new query language involves automatic conversion of queries into the underlying, more complicated MMAXQL query language. It supports queries for sequential and hierarchical, but also associative (e.g. coreferential) relations. The simplified query language has been designed with non-expert users in mind.
We describe a simple and efficient Java object model and application programming interface (API) for (possibly multi-modal) annotated natural language corpora. Corpora are represented as elements like Sentences, Turns, Utterances, Words, Gestures and Markables. The API allows linguists to access corpora in terms of these discourse-level elements, i.e. at a conceptual level they are familiar with, with the flexibility offered by a general purpose programming language. It is also a contribution to corpus standardization efforts because it is based on a straightforward and easily extensible data model which can serve as a target for conversion of different corpus formats.
Lexicon schemas and their use are discussed in this paper from the perspective of lexicographers and field linguists. A variety of lexicon schemas have been developed, with goals ranging from computational lexicography (DATR) through archiving (LIFT, TEI) to standardization (LMF, FSR). A number of requirements for lexicon schemas are given. The lexicon schemas are introduced and compared to each other in terms of conversion and usability for this particular user group, using a common lexicon entry and providing examples for each schema under consideration. The formats are assessed and the final recommendation is given for the potential users, namely to request standard compliance from the developers of the tools used. This paper should foster a discussion between authors of standards, lexicographers and field linguists.