A new attribute class for annotating syntactic dependency relations

  • Language corpora, used as data for linguistic research and machine learning, are traditionally annotated with lemmas, part-of-speech tags, and, in some cases, morphosyntactic categories. As a lightweight inline representation of such (mostly automatic) annotations, the TEI Guidelines provide the att.linguistic class with the attributes @lemma, @lemmaRef, @pos, @msd, and @join (cf. Bański et al., 2018). Increasingly, corpora are also annotated with syntactic parses, expressed as dependency relations between tokens, which serve as a basis for syntactic queries or inferences. A de facto standard for this kind of annotation, supported by a wide array of tools, is the Universal Dependencies framework (UD; cf. de Marneffe et al., 2021) with its CoNLL-U format. The present contribution outlines a potential extension of the TEI Guidelines for annotating syntactic dependency relations: a new attribute class called att.linguistic.dependency, which extends att.linguistic with the attributes @head and @deprel and with several conventions.
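
To make the idea more concrete, the following is a minimal Python sketch of how CoNLL-U rows might be mapped to TEI <w> elements carrying @lemma, @pos, @msd, @head, and @deprel. The abstract does not spell out the proposed conventions, so several details here are assumptions for illustration only: that @head holds a pointer to the xml:id of the governing token, that the root token carries no @head, and that token identifiers follow an invented `s1_w1` pattern. The function name `conllu_to_tei` and the sample sentence are hypothetical as well.

```python
import xml.etree.ElementTree as ET

# Namespace of the xml:id attribute.
XML_NS = "http://www.w3.org/XML/1998/namespace"

# Two-token sample in CoNLL-U (10 tab-separated columns per token):
# ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC
CONLLU = """\
1\tDogs\tdog\tNOUN\t_\tNumber=Plur\t2\tnsubj\t_\t_
2\tbark\tbark\tVERB\t_\tNumber=Plur\t0\troot\t_\t_
"""


def conllu_to_tei(conllu, sent_id="s1"):
    """Build a TEI-like <s> element from one CoNLL-U sentence.

    Assumption (not taken from the proposal): @head points to the
    xml:id of the governing token, and the root token has no @head.
    """
    s = ET.Element("s", {f"{{{XML_NS}}}id": sent_id})
    for line in conllu.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and sentence-level comments
        cols = line.split("\t")
        if len(cols) != 10:
            raise ValueError(f"expected 10 columns, got {len(cols)}")
        tok_id, form, lemma, upos, _xpos, feats, head, deprel = cols[:8]
        if not tok_id.isdigit():
            continue  # skip multiword-token ranges and empty nodes
        w = ET.SubElement(s, "w", {f"{{{XML_NS}}}id": f"{sent_id}_w{tok_id}"})
        w.text = form
        if lemma != "_":
            w.set("lemma", lemma)
        if upos != "_":
            w.set("pos", upos)
        if feats != "_":
            w.set("msd", feats)
        if head not in ("_", "0"):
            # Hypothetical pointer convention for the governing token.
            w.set("head", f"#{sent_id}_w{head}")
        if deprel != "_":
            w.set("deprel", deprel)
    return s


if __name__ == "__main__":
    print(ET.tostring(conllu_to_tei(CONLLU), encoding="unicode"))
```

Keeping the dependency information on the token itself mirrors the inline style of att.linguistic; whether the actual proposal uses pointers, token indices, or another convention for @head should be checked against the full abstract.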

Metadata
Author: Piotr Bański, Andreas Nolda, Harald Lüngen
URN: urn:nbn:de:bsz:mh39-135297
DOI: https://doi.org/10.5281/zenodo.17312233
ISBN: 978-83-977695-0-2
Parent Title (English): "New Territories". Text Encoding Initiative Conference and Members' Meeting 2025. September 16–20, 2025. Kraków, Poland. Book of Abstracts
Publisher: Zenodo
Place of publication: Geneva
Editor: Joanna Hałaczkiewicz
Document Type: Part of a Book
Language: English
Year of first Publication: 2025
Date of Publication (online): 2025/10/15
Publishing Institution: Leibniz-Institut für Deutsche Sprache (IDS)
Publication state: Published version
Review state: Peer review
Tag: Dependencies; Dependency annotation; Grammatical annotation; Lightweight; Universal
GND Keyword: Annotation; Computational linguistics; Grammar; Corpus <Linguistics>
First Page: 15
Last Page: 19
DDC classes: 400 Language / 400 Language, Linguistics
Open Access?: yes
Linguistics-Classification: Computational linguistics
Program areas: Grammar
Program areas: Digital Linguistics
Licence (English): Creative Commons - Attribution 4.0 International