OPUS 4 | Search

Refine

Has Fulltext

yes (4)

4 search hits

1 to 4

Sort by

POS tagset refinement for linguistic analysis and the impact on statistical parsing (2014)

The annotation of parts of speech (POS) in linguistically annotated corpora is a fundamental annotation layer which provides the basis for further syntactic analyses, and many NLP tools rely on POS information as input. However, most POS annotation schemes have been developed with written (newspaper) text in mind and thus do not carry over well to text from other domains and genres. Recent discussions have concentrated on the shortcomings of present POS annotation schemes with regard to their applicability to data from domains other than newspaper text.

POS error detection in automatically annotated corpora (2014)

Rehbein, Ines

Recent work on error detection has shown that the quality of manually annotated corpora can be substantially improved by applying consistency checks to the data and automatically identifying incorrectly labelled instances. These methods, however, can not be used for automatically annotated corpora where errors are systematic and cannot easily be identified by looking at the variance in the data. This paper targets the detection of POS errors in automatically annotated corpora, so-called silver standards, showing that by combining different measures sensitive to annotation quality we can identify a large part of the errors and obtain a substantial increase in accuracy.

From <tiger2/> to ISOTiger – community driven developments for syntax annotation in SynAF (2014)

Bosch, Sonja ; Eckart, Kerstin ; Faaß, Gertrud ; Heid, Ulrich ; Lee, Kiyong ; Pareja-Lora, Antonio ; Pretorius, Laurette ; Romary, Laurent ; Witt, Andreas ; Zeldes, Amir ; Zipser, Florian

In 2010, ISO published a standard for syntactic annotation, ISO 24615:2010 (SynAF). Back then, the document specified a comprehensive reference model for the representation of syntactic annotations, but no accompanying XML serialisation. ISO’s subcommittee on language resource management (ISO TC 37/SC 4) is working on making the SynAF serialisation ISOTiger an additional part of the standard. This contribution addresses the current state of development of ISOTiger, along with a number of open issues on which we are seeking community feedback in order to ensure that ISOTiger becomes a useful extension to the SynAF reference model.

An XML Annotation Schema for speech, thought and writing representation (2014)

Brunner, Annelen

This contribution presents an XML Schema for annotating a high level narratological category: speech, thought and writing representation (ST&WR). It focusses on two aspects: Firstly, the original Schema is presented as an example for the challenge to encode a narrative feature in a structured and flexible way and secondly, ways of adapting this Schema to TEI are considered, in Order to make it usable for other, TEI-based projects.

1 to 4

Open Access

Refine

Author

Year of publication

Document Type

Language

Has Fulltext

Is part of the Bibliography

Keywords

Publicationstate

Reviewstate

Publisher

4 search hits