Feature-based encoding and querying language resources with character semantics
In this paper we discuss the explicit representation of character features pertaining to written language resources, which we argue are critically necessary for the long-term archiving of language data. Much of the focus in creating language resources and preserving them is at the level of the corpus itself; however, it is generally accepted that the long-term interpretation of these resources requires more than a best-practice data format. In particular, where language resources are created in linguistic fieldwork, and especially for minority languages, the need to preserve not only the resource itself but also the additional metadata that allows it to be interpreted accurately in the future is becoming a research topic in its own right. In this paper we extend earlier work on semantically based character decomposition to include the representation of character properties in a variety of models, and a mechanism for exploiting these properties through queries.
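The abstract describes attaching properties to characters and exploiting them through queries. As a rough, hypothetical illustration of that general idea (not the authors' model, and not the ISO 24610-1 encoding), the Python sketch below gives a few IPA consonant symbols a flat feature structure and answers a query by checking whether a partial feature structure subsumes each character's description. The feature table, attribute names, and function names are all assumptions made for this example.

```python
from typing import Dict, List

# Hypothetical feature table: each IPA symbol is described by a flat
# feature structure (attribute -> value). The values follow standard IPA
# consonant descriptions; the attribute names are illustrative only.
CHARACTER_FEATURES: Dict[str, Dict[str, str]] = {
    "p": {"voicing": "voiceless", "place": "bilabial", "manner": "plosive"},
    "b": {"voicing": "voiced", "place": "bilabial", "manner": "plosive"},
    "t": {"voicing": "voiceless", "place": "alveolar", "manner": "plosive"},
    "d": {"voicing": "voiced", "place": "alveolar", "manner": "plosive"},
    "m": {"voicing": "voiced", "place": "bilabial", "manner": "nasal"},
    "s": {"voicing": "voiceless", "place": "alveolar", "manner": "fricative"},
}


def subsumes(query: Dict[str, str], features: Dict[str, str]) -> bool:
    """True if every attribute-value pair in `query` also occurs in `features`,
    i.e. the (more general) query structure subsumes the full description."""
    return all(features.get(attr) == value for attr, value in query.items())


def characters_matching(query: Dict[str, str]) -> List[str]:
    """Return, sorted, all characters whose feature structure satisfies the query."""
    return sorted(
        ch for ch, feats in CHARACTER_FEATURES.items() if subsumes(query, feats)
    )


def positions_matching(text: str, query: Dict[str, str]) -> List[int]:
    """Return the indices in `text` whose character satisfies the query.
    Characters without a feature description never match."""
    return [
        i
        for i, ch in enumerate(text)
        if ch in CHARACTER_FEATURES and subsumes(query, CHARACTER_FEATURES[ch])
    ]


if __name__ == "__main__":
    voiced_plosive = {"voicing": "voiced", "manner": "plosive"}
    print(characters_matching(voiced_plosive))  # ['b', 'd']
    print(positions_matching("bad map", voiced_plosive))  # [0, 2]
```

A flat dictionary keeps the sketch short; the feature structures standardised in ISO 24610-1 also allow nested and typed values, which this example does not model.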
Author: | Baden Hughes, Dafydd Gibbon, Thorsten Trippel |
---|---|
URN: | urn:nbn:de:bsz:mh39-126238 |
URL: | http://www.lrec-conf.org/proceedings/lrec2006/pdf/493_pdf.pdf |
URL: | https://aclanthology.org/L06-1296/ |
Parent Title (English): | Proceedings of the fifth international conference on language resources and evaluation (LREC’06). 22 May - 28 May 2006, Genoa, Italy |
Publisher: | European Language Resources Association (ELRA) |
Place of publication: | Paris |
Editor: | Nicoletta Calzolari, Khalid Choukri, Aldo Gangemi, Bente Maegaard, Joseph Mariani, Jan Odijk, Daniel Tapias |
Document Type: | Conference Proceeding |
Language: | English |
Year of first Publication: | 2006 |
Date of Publication (online): | 2024/04/10 |
Publishing Institution: | Leibniz-Institut für Deutsche Sprache (IDS) |
Publication state: | Published version |
Review state: | Peer-reviewed |
Tag: | ISO-DIS-24610-1; character semantics; feature structures; ontology; phonetics; securing interpretability |
GND Keyword: | Archiving; International Conference on Language Resources and Evaluation (5th : 2006 : Genoa); Metadata; Ontology <knowledge processing>; Phonetics; Language data; XML |
First Page: | 939 |
Last Page: | 944 |
DDC classes: | 400 Language / 400 Language, Linguistics |
Open Access?: | Yes |
Linguistics-Classification: | Computational linguistics |
Licence (English): | Creative Commons - Attribution-NonCommercial-ShareAlike 3.0 Unported |