Lightweight grammatical annotation in the TEI: new perspectives
- In mid-2017, as part of our activities within the TEI Special Interest Group for Linguists (LingSIG), we submitted to the TEI Technical Council a proposal for a new attribute class that would gather attributes facilitating simple token-level linguistic annotation. The proposal responded to community feedback about the lack of a specific tagset for lightweight linguistic annotation within the TEI. Until then, apart from @lemma and @lemmaRef, TEI encoders could only resort to the generic attribute @ana for inline linguistic annotation, or to the system of feature structures for robust linguistic annotation, which requires relatively complex processing even for the most basic types of linguistic features. As a result of the proposal, there now exists a small set of basic descriptive devices, made available at the cost of only very small changes to the TEI tagset. The merit of a predefined TEI tagset for lightweight linguistic annotation is homogeneous tagging and thus better interoperability of simple linguistic resources encoded in the TEI. The present paper introduces the new attributes, makes a case for one more addition, and presents the advantages of the new system over the legacy TEI solutions.
Author: | Piotr Bański, Susanne Haaf, Martin Mueller |
---|---|
URN: | urn:nbn:de:bsz:mh39-74879 |
URL: | http://www.lrec-conf.org/proceedings/lrec2018/summaries/422.html |
ISBN: | 979-10-95546-00-9 |
Parent Title (English): | Proceedings of the eleventh international conference on language resources and evaluation (LREC 2018), 7-12 May 2018, Miyazaki, Japan |
Publisher: | European language resources association (ELRA) |
Place of publication: | Paris, France |
Editor: | Nicoletta Calzolari, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Koiti Hasida, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis, Takenobu Tokunaga |
Document Type: | Part of a Book |
Language: | English |
Year of first Publication: | 2018 |
Date of Publication (online): | 2018/05/24 |
Publication state: | Published version |
Review state: | Peer review |
Tag: | TEI; TEI LingSIG; lightweight annotation; linguistic annotation |
GND Keyword: | Annotation; Text Encoding Initiative |
First Page: | 1795 |
Last Page: | 1802 |
DDC classes: | 400 Language / 430 German |
Open Access?: | yes |
Leibniz-Classification: | Language, Linguistics |
Linguistics-Classification: | Computational linguistics |
Program areas: | Digital Linguistics |