Cost-Sensitive Learning in Answer Extraction
- One problem of data-driven answer extraction in open-domain factoid question answering is that the class distribution of the labeled training data is heavily imbalanced: an ordinary training set contains far more incorrect answers than correct ones. The class imbalance is thus inherent to the classification task. It degrades the performance of classifiers trained with standard machine learning algorithms, which are usually heavily biased towards the majority class, i.e. the class that occurs most often in the training set. In this paper, we propose to tackle class imbalance with cost-sensitive learning, which is preferable to sampling. We present a simple but effective way of estimating the misclassification costs from the class distribution. This approach offers three benefits. First, it preserves the class distribution of the labeled training data. Second, this form of meta-learning can be applied to a wide range of common learning algorithms. Third, it can be implemented easily with state-of-the-art machine learning software.
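The abstract does not give the cost formula, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a hypothetical heuristic in which the cost of misclassifying a class is the size of the largest class divided by the size of that class, and it applies the standard minimum-expected-cost decision rule on top of a base classifier's probability estimate. The function names `estimate_costs` and `min_cost_decision` and the toy label counts are illustrative assumptions.

```python
from collections import Counter


def estimate_costs(labels):
    """Assumed heuristic (not taken from the paper): the cost of
    misclassifying a class is majority-class size / that class's size."""
    counts = Counter(labels)
    majority = max(counts.values())
    return {cls: majority / n for cls, n in counts.items()}


def min_cost_decision(p_correct, costs, pos="correct", neg="incorrect"):
    """Return the label with the lower expected misclassification cost.

    p_correct is the base classifier's estimated probability that the
    answer candidate is correct.
    """
    # Predicting `neg` is wrong when the candidate is actually correct.
    cost_if_predict_neg = p_correct * costs[pos]
    # Predicting `pos` is wrong when the candidate is actually incorrect.
    cost_if_predict_pos = (1.0 - p_correct) * costs[neg]
    return pos if cost_if_predict_pos < cost_if_predict_neg else neg


# Toy training labels: 5 correct answers versus 95 incorrect ones.
labels = ["correct"] * 5 + ["incorrect"] * 95
costs = estimate_costs(labels)

# The training data itself is left untouched; only the decision changes.
decision_high = min_cost_decision(0.10, costs)
decision_low = min_cost_decision(0.04, costs)
```

With these toy counts, a candidate is labeled correct once its estimated probability exceeds 0.05 rather than 0.5, which counteracts the bias towards the majority class without resampling the training set. Because the rule only wraps a probability estimate, it can sit on top of any base learner that produces one.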
Author: | Michael Wiegand, Jochen L. Leidner, Dietrich Klakow |
---|---|
URN: | urn:nbn:de:bsz:mh39-85373 |
URL: | https://aclanthology.info/papers/L08-1293/l08-1293 |
ISBN: | 2-9517408-4-0 |
Parent Title (English): | Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08), May 28-30, 2008, Marrakech, Morocco |
Publisher: | European Language Resources Association |
Place of publication: | Paris |
Editor: | Nicoletta Calzolari, Khalid Choukri, Bente Maegaard, Joseph Mariani, Jan Odijk, Stelios Piperidis, Daniel Tapias |
Document Type: | Conference Proceeding |
Language: | English |
Year of first Publication: | 2008 |
Date of Publication (online): | 2019/02/28 |
Publication state: | Secondary publication |
Review state: | Peer review |
Tag: | Acquisition; Machine Learning; Question Answering; Statistical methods |
GND Keyword: | Computational linguistics; Information extraction; Machine learning; Natural language |
First Page: | 711 |
Last Page: | 714 |
DDC classes: | 400 Language / 400 Language, Linguistics |
Open Access?: | yes |
Linguistics-Classification: | Computational linguistics |