Detecting annotation noise in automatically labelled data
- We introduce a method for detecting errors in automatically annotated text, aimed at supporting the creation of high-quality language resources at affordable cost. Our method combines an unsupervised generative model with human supervision from active learning (AL). We test our approach on in-domain and out-of-domain data in two languages, both in AL simulations and in a real-world setting. In all settings, the results show that our method detects annotation errors with high precision and high recall.
| Author: | Ines Rehbein, Josef Ruppenhofer |
|---|---|
| URN: | urn:nbn:de:bsz:mh39-80343 |
| URL: | http://aclweb.org/anthology/P17-1107 |
| DOI: | https://doi.org/10.18653/v1/P17-1107 |
| ISBN: | 978-1-945626-75-3 |
| Parent Title (English): | Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (ACL 2017), vol. 1 (Long Papers). July 30 - August 4, 2017, Vancouver, Canada |
| Publisher: | The Association for Computational Linguistics |
| Place of publication: | Stroudsburg PA, USA |
| Document Type: | Part of a Book |
| Language: | English |
| Year of first Publication: | 2017 |
| Date of Publication (online): | 2018/10/04 |
| Publication state: | Published version |
| Review state: | Peer review |
| GND Keyword: | Annotation; Natural language processing; Computational linguistics; Error analysis |
| First Page: | 1160 |
| Last Page: | 1170 |
| DDC classes: | 400 Language / 400 Language, Linguistics |
| Open Access?: | yes |
| Leibniz-Classification: | Language, Linguistics |
| Linguistics-Classification: | Computational linguistics |
| Program areas: | Pragmatics |
| Program areas: | Digital Linguistics |
| Licence (German): | Protected by copyright (Urheberrechtlich geschützt) |


