Feature-based encoding and querying language resources with character semantics

  • In this paper we discuss an explicit representation of the character features of written language resources, which we argue is critically necessary for the long-term archiving of language data. Much of the focus in creating and preserving language resources is at the level of the corpus itself; however, it is generally accepted that the long-term interpretation of these resources requires more than a best-practice data format. In particular, where language resources are created in linguistic fieldwork, and especially for minority languages, preserving not only the resource itself but also the additional metadata that allows it to be interpreted accurately in the future is becoming a research topic in its own right. In this paper we extend earlier work on semantically based character decomposition to represent character properties in a variety of models, together with a mechanism for exploiting these properties through queries.
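To make the idea of querying characters by their properties concrete, here is a minimal sketch, assuming a flat feature structure per character and a simple subsumption test: a query matches a character when every feature-value pair in the query also holds for that character. The inventory `CHARACTER_FEATURES`, the feature names (`type`, `voicing`, `place`, `manner`, ...) and the functions `subsumes`, `find_characters` and `find_in_text` are hypothetical illustrations based on common IPA phonetic descriptions. They are not the encoding or query mechanism defined in the paper, and they ignore the nested and typed feature structures that ISO 24610-1 allows.

```python
from typing import Dict, List

# Hypothetical character inventory: each character maps to a flat feature
# structure. Names and values are illustrative IPA-style descriptions only.
CHARACTER_FEATURES: Dict[str, Dict[str, str]] = {
    "p": {"type": "consonant", "voicing": "voiceless", "place": "bilabial", "manner": "plosive"},
    "b": {"type": "consonant", "voicing": "voiced", "place": "bilabial", "manner": "plosive"},
    "t": {"type": "consonant", "voicing": "voiceless", "place": "alveolar", "manner": "plosive"},
    "d": {"type": "consonant", "voicing": "voiced", "place": "alveolar", "manner": "plosive"},
    "m": {"type": "consonant", "voicing": "voiced", "place": "bilabial", "manner": "nasal"},
    "s": {"type": "consonant", "voicing": "voiceless", "place": "alveolar", "manner": "fricative"},
    "ʃ": {"type": "consonant", "voicing": "voiceless", "place": "postalveolar", "manner": "fricative"},
    "i": {"type": "vowel", "height": "close", "backness": "front", "rounding": "unrounded"},
    "u": {"type": "vowel", "height": "close", "backness": "back", "rounding": "rounded"},
    "a": {"type": "vowel", "height": "open", "backness": "front", "rounding": "unrounded"},
}


def subsumes(query: Dict[str, str], features: Dict[str, str]) -> bool:
    """Return True if every feature-value pair in `query` also holds in
    `features`, i.e. the query is at least as general as the description."""
    return all(features.get(name) == value for name, value in query.items())


def find_characters(query: Dict[str, str]) -> List[str]:
    """List the inventory characters whose feature structure the query subsumes."""
    return [ch for ch, feats in CHARACTER_FEATURES.items() if subsumes(query, feats)]


def find_in_text(text: str, query: Dict[str, str]) -> List[int]:
    """Return the positions in `text` whose character matches the query.
    Characters missing from the inventory never match."""
    return [
        i
        for i, ch in enumerate(text)
        if ch in CHARACTER_FEATURES and subsumes(query, CHARACTER_FEATURES[ch])
    ]


if __name__ == "__main__":
    print(find_characters({"manner": "plosive", "voicing": "voiced"}))  # ['b', 'd']
    print(find_in_text("pada", {"type": "vowel"}))  # [1, 3]
```

Because each query is a partial feature structure rather than a list of code points, the query still works if the character inventory changes or grows. That separation between character identity and character semantics is what makes the properties reusable for later interpretation.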

Author: Baden Hughes, Dafydd Gibbon, Thorsten Trippel
Parent Title (English): Proceedings of the fifth international conference on language resources and evaluation (LREC’06). 22 May - 28 May 2006, Genoa, Italy
Publisher: European Language Resources Association (ELRA)
Place of publication: Paris
Editor: Nicoletta Calzolari, Khalid Choukri, Aldo Gangemi, Bente Maegaard, Joseph Mariani, Jan Odijk, Daniel Tapias
Document Type: Conference Proceeding
Year of first Publication: 2006
Date of Publication (online): 2024/04/10
Publishing Institution: Leibniz-Institut für Deutsche Sprache (IDS)
Tag: ISO-DIS-24610-1; character semantics; feature structures; ontology; phonetics; securing interpretability
GND Keyword: Archiving; International Conference on Language Resources and Evaluation (5. : 2006 : Genoa); Metadata; Ontology <knowledge processing>; Phonetics; Language data; XML
First Page: 939
Last Page: 944
DDC classes: 400 Language / 400 Language, Linguistics
Open Access?: yes
Licence (English): Creative Commons - Attribution-NonCommercial-ShareAlike 3.0 Unported