SusTEInability of linguistic resources through feature structures

This article shows that the TEI tag set for feature structures can be adopted to represent a heterogeneous set of linguistic corpora. Most corpora are annotated using markup languages based on the Annotation Graph framework or on the upcoming Linguistic Annotation Format ISO standard, or using tag sets defined by or derived from the TEI guidelines. A unified representation separates the conceptually different annotation layers contained in the original corpus data (e.g. syntax, phonology, and semantics) into multiple XML files. These annotation layers are linked to each other implicitly by the identical textual content of all files. A suitable data structure for representing these annotations is a multi-rooted tree, which can in turn be represented by the TEI and ISO tag set for feature structures. The article discusses the mapping process and representational issues, as well as the advantages and drawbacks of using the TEI tag set for feature structures as a storage and exchange format for linguistically annotated data.

Metadata
Author:Andreas Witt, Georg Rehm, Erhard Hinrichs, Timm Lehmberg, Jens Stegmann
URN:urn:nbn:de:bsz:mh39-44901
ISSN:1477-4615
Parent Title (English):Literary and Linguistic Computing
Publisher:Oxford University Press
Place of publication:Oxford
Document Type:Article
Language:English
Year of first Publication:2009
Date of Publication (online):2015/12/16
Publication state:Postprint
Review state:(Publisher's) editorial review
GND Keyword:Annotation; Programming language; Text Encoding Initiative (TEI)
Volume:24
Issue:3
First Page:363
Last Page:372
Note:
This publication is freely accessible with the permission of the rights holder under an Alliance licence or a national licence, respectively (funded by the DFG, German Research Foundation).
Dewey Decimal Classification:400 Language / 410 Linguistics
Leibniz-Classification:Language, Linguistics
Linguistics-Classification:Computational linguistics
Open Access?:Yes
Licence (German):German copyright law (UrhG) applies