POS tagset refinement for linguistic analysis and the impact on statistical parsing
The annotation of parts of speech (POS) in linguistically annotated corpora is a fundamental annotation layer that provides the basis for further syntactic analyses, and many NLP tools rely on POS information as input. However, most POS annotation schemes have been developed with written (newspaper) text in mind and thus do not carry over well to text from other domains and genres. Recent discussions have concentrated on the shortcomings of present POS annotation schemes with regard to their applicability to data from domains other than newspaper text.
Author: | Ines Rehbein, Hagen Hirschmann |
---|---|
URN: | urn:nbn:de:bsz:mh39-80368 |
URL: | http://tlt13.sfs.uni-tuebingen.de/tlt13-proceedings.pdf |
Handle: | http://hdl.handle.net/11022/0000-0000-2D81-C@ds1 |
ISBN: | 978-3-9809183-9-8 |
Parent Title (English): | Proceedings of the Thirteenth International Workshop on Treebanks and Linguistic Theories (TLT13). December 12-13, 2014, Tübingen, Germany |
Publisher: | University of Tübingen |
Place of publication: | Tübingen |
Editor: | Verena Henrich, Erhard Hinrichs, Daniël de Kok, Petya Osenova, Adam Przepiórkowski |
Document Type: | Conference Proceeding |
Language: | English |
Year of first Publication: | 2014 |
Date of Publication (online): | 2018/10/04 |
Publication state: | Published version |
Review state: | Peer-reviewed |
GND Keyword: | Annotation; Korpus <Linguistik>; Parts of speech; Syntaktische Analyse |
First Page: | 172 |
Last Page: | 183 |
DDC classes: | 400 Language / 430 German |
Open Access?: | yes |
Leibniz-Classification: | Language, Linguistics |
Linguistics-Classification: | Computational linguistics |