Refine
Year of publication
- 2017 (2) (remove)
Document Type
Language
- English (2) (remove)
Has Fulltext
- yes (2)
Is part of the Bibliography
- yes (2)
Keywords
- Annotation (1)
- Gesprochene Sprache (1)
- Korpus <Linguistik> (1)
- Parser (1)
- Standardisierung (1)
- Syntax (1)
- Transkription (1)
- Universalgrammatik (1)
Publicationstate
- Veröffentlichungsversion (2) (remove)
Reviewstate
- Peer-Review (2)
Publisher
- Linköping University Electronic Press (2) (remove)
Universal Dependency (UD) annotations, despite their usefulness for cross-lingual tasks and semantic applications, are not optimised for statistical parsing. In the paper, we ask what exactly causes the decrease in parsing accuracy when training a parser on UD-style annotations and whether the effect is similarly strong for all languages. We conduct a series of experiments where we systematically modify individual annotation decisions taken in the UD scheme and show that this results in an increased accuracy for most, but not for all languages. We show that the encoding in the UD scheme, in particular the decision to encode content words as heads, causes an increase in dependency length for nearly all treebanks and an increase in arc direction entropy for many languages, and evaluate the effect this has on parsing accuracy.
We present an approach to making existing CLARIN web services usable for spoken language transcriptions. Our approach is based on a new TEI-based ISO standard for such transcriptions. We show how existing tool formats can be transformed to this standard, how an encoder/decoder pair for the TCF format enables users to feed this type of data through a WebLicht tool chain, and why and how web services operating directly on the standard format would be useful.