A new attribute class for annotating syntactic dependency relations
Language corpora, used as data for linguistic research and machine learning, are traditionally annotated with lemmas, part-of-speech tags, and, possibly, morphosyntactic categories. As a lightweight inline representation of such (mostly automatic) annotations, the TEI Guidelines provide the att.linguistic class with the @lemma, @lemmaRef, @pos, @msd, and @join attributes (cf. Bański et al., 2018). Increasingly, corpora are also annotated with syntactic parses in terms of dependency relations between tokens, as a basis for syntactic queries or inferences. A de facto standard for such annotations, supported by a wide array of tools, is the Universal Dependencies framework (UD; cf. de Marneffe et al., 2021) with its CoNLL-U format. The present contribution outlines a potential extension of the TEI Guidelines for annotating syntactic dependency relations, in the form of a new attribute class called att.linguistic.dependency, which extends att.linguistic with the attributes @head and @deprel and several conventions.
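As a rough illustration of the idea, the following Python sketch converts a toy CoNLL-U fragment into TEI-like `<w>` elements that carry @lemma, @pos, @msd, @head, and @deprel. The abstract does not spell out the proposed conventions, so several choices here are assumptions made only for this example: the function name `conllu_to_tei`, the `xml:id` scheme, the use of `#`-prefixed pointers in @head, the omission of @head on the root token, and the placement of UD features in @msd. It should not be read as the authors' encoding.

```python
import xml.etree.ElementTree as ET

# Clark-notation name for xml:id; ElementTree serializes it with the "xml" prefix.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Toy CoNLL-U fragment with the ten standard columns:
# ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC
CONLLU = """\
1\tDogs\tdog\tNOUN\t_\tNumber=Plur\t2\tnsubj\t_\t_
2\tbark\tbark\tVERB\t_\t_\t0\troot\t_\t_
"""


def conllu_to_tei(conllu: str, sent_id: str = "s1") -> ET.Element:
    """Build an <s> element whose <w> children carry dependency attributes.

    Hypothetical helper: the attribute conventions are assumptions,
    not the conventions proposed in the contribution.
    """
    s = ET.Element("s", {XML_ID: sent_id})
    for line in conllu.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and sentence-level comments
        cols = line.split("\t")
        tok_id, form, lemma, upos = cols[0], cols[1], cols[2], cols[3]
        feats, head, deprel = cols[5], cols[6], cols[7]
        if not tok_id.isdigit():
            continue  # skip multiword-token ranges (1-2) and empty nodes (1.1)
        w = ET.SubElement(
            s, "w", {XML_ID: f"{sent_id}.{tok_id}", "lemma": lemma, "pos": upos}
        )
        w.text = form
        if feats != "_":
            w.set("msd", feats)
        if head != "0":
            # Assumption: @head points at the xml:id of the governing token.
            w.set("head", f"#{sent_id}.{head}")
        # Assumption: the root token keeps its relation label but has no @head.
        w.set("deprel", deprel)
    return s


if __name__ == "__main__":
    print(ET.tostring(conllu_to_tei(CONLLU), encoding="unicode"))
```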
| Author: | Piotr Bański, Andreas Nolda, Harald Lüngen |
|---|---|
| URN: | urn:nbn:de:bsz:mh39-135297 |
| DOI: | https://doi.org/10.5281/zenodo.17312233 |
| ISBN: | 978-83-977695-0-2 |
| Parent Title (English): | "New Territories". Text Encoding Initiative Conference and Members' Meeting 2025. September 16–20, 2025. Kraków, Poland. Book of Abstracts |
| Publisher: | Zenodo |
| Place of publication: | Geneva |
| Editor: | Joanna Hałaczkiewicz |
| Document Type: | Part of a Book |
| Language: | English |
| Year of first Publication: | 2025 |
| Date of Publication (online): | 2025/10/15 |
| Publishing Institution: | Leibniz-Institut für Deutsche Sprache (IDS) |
| Publication state: | Published version |
| Review state: | Peer-reviewed |
| Tag: | Dependencies; Dependency annotation; Grammatical annotation; Lightweight; Universal |
| GND Keyword: | Annotation; Computerlinguistik; Grammatik; Korpus <Linguistik> |
| First Page: | 15 |
| Last Page: | 19 |
| DDC classes: | 400 Language / 400 Language, Linguistics |
| Open Access?: | yes |
| Linguistics-Classification: | Computational linguistics |
| Program areas: | Grammar |
| Program areas: | Digital Linguistics |
| Licence (English): | Creative Commons - Attribution 4.0 International |


