Detecting annotation noise in automatically labelled data

We introduce a method for error detection in automatically annotated text, aimed at supporting the creation of high-quality language resources at affordable cost. Our method combines an unsupervised generative model with human supervision from active learning (AL). We test our approach on in-domain and out-of-domain data in two languages, in AL simulations and in a real-world setting. For all settings, the results show that our method is able to detect annotation errors with high precision and high recall.

Metadata
Author:	Ines Rehbein, Josef Ruppenhofer
URN:	urn:nbn:de:bsz:mh39-80343
URL:	http://aclweb.org/anthology/P17-1107
DOI:	https://doi.org/10.18653/v1/P17-1107
ISBN:	978-1-945626-75-3
Parent Title (English):	Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (ACL 2017), vol. 1 (Long Papers). July 30 - August 4, 2017 Vancouver, Canada
Publisher:	The Association for Computational Linguistics
Place of publication:	Stroudsburg PA, USA
Document Type:	Part of a Book
Language:	English
Year of first Publication:	2017
Date of Publication (online):	2018/10/04
Publication state:	Published version
Review state:	Peer review
GND Keyword:	Annotation; Natural language processing; Computational linguistics; Error analysis
First Page:	1160
Last Page:	1170
DDC classes:	400 Language / 400 Language, Linguistics
Open Access?:	yes
Leibniz-Classification:	Language, Linguistics
Linguistics-Classification:	Computational linguistics
Program areas:	Pragmatics
Program areas:	Digital Linguistics
Licence (German):	Protected by copyright
