Evaluating the Impact of Coder Errors on Active Learning
- Active Learning (AL) has been proposed as a technique to reduce the amount of annotated data needed for supervised classification. While simulation studies on a number of NLP tasks have shown that AL works well on gold-standard data, there is some doubt whether the approach can succeed when applied to noisy, real-world data sets. This paper presents a thorough evaluation of the impact of annotation noise on AL and shows that systematic noise resulting from biased coder decisions can seriously harm the AL process. We present a method to filter out inconsistent annotations during AL and show that this makes AL far more robust when applied to noisy data.
Author: | Ines Rehbein, Josef Ruppenhofer |
---|---
URN: | urn:nbn:de:bsz:mh39-52929 |
URL: | http://dl.acm.org/citation.cfm?id=2002479&CFID=841147757&CFTOKEN=19861493 |
ISBN: | 978-1-932432-87-9 |
Parent Title (English): | HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies |
Publisher: | Association for Computational Linguistics |
Place of publication: | Stroudsburg |
Document Type: | Conference Proceeding |
Language: | English |
Year of first Publication: | 2011 |
Date of Publication (online): | 2016/09/22 |
Publication state: | Published version |
Review state: | Peer-reviewed |
Tag: | Active Learning; Machine learning; Natural language processing |
Issue: | 1 |
First Page: | 43 |
Last Page: | 51 |
DDC classes: | 400 Language / 410 Linguistics |
Open Access?: | yes |
Linguistics classification: | Computational linguistics |