Refine
Document Type
Language
- English (3)
Has Fulltext
- yes (3)
Is part of the Bibliography
- no (3)
Keywords
Publicationstate
Reviewstate
- Peer-Review (3)
Publisher
- European Language Resources Association (3) (remove)
One problem of data-driven answer extraction in open-domain factoid question answering is that the class distribution of labeled training data is fairly imbalanced. In an ordinary training set, there are far more incorrect answers than correct answers. The class-imbalance is, thus, inherent to the classification task. It has a deteriorating effect on the performance of classifiers trained by standard machine learning algorithms. They usually have a heavy bias towards the majority class, i.e. the class which occurs most often in the training set. In this paper, we propose a method to tackle class imbalance by applying some form of cost-sensitive learning which is preferable to sampling. We present a simple but effective way of estimating the misclassification costs on the basis of class distribution. This approach offers three benefits. Firstly, it maintains the distribution of the classes of the labeled training data. Secondly, this form of meta-learning can be applied to a wide range of common learning algorithms. Thirdly, this approach can be easily implemented with the help of state-of-the-art machine learning software.
In recent years, text classification in sentiment analysis has mostly focused on two types of classification, the distinction between objective and subjective text, i.e. subjectivity detection, and the distinction between positive and negative subjective text, i.e. polarity classification. So far, there has been little work examining the distinction between definite polar subjectivity and indefinite polar subjectivity. While the former are utterances which can be categorized as either positive or negative, the latter cannot be categorized as either of these two categories. This paper presents a small set of domain independent features to detect indefinite polar sentences. The features reflect the linguistic structure underlying these types of utterances. We give evidence for the effectiveness of these features by incorporating them into an unsupervised rule-based classifier for sentence-level analysis and compare its performance with supervised machine learning classifiers, i.e. Support Vector Machines (SVMs) and Nearest Neighbor Classifier (kNN). The data used for the experiments are web-reviews collected from three different domains.
We present a gold standard for semantic relation extraction in the food domain for German. The relation types that we address are motivated by scenarios for which IT applications present a commercial potential, such as virtual customer advice in which a virtual agent assists a customer in a supermarket in finding those products that satisfy their needs best. Moreover, we focus on those relation types that can be extracted from natural language text corpora, ideally content from the internet, such as web forums, that are easy to retrieve. A typical relation type that meets these requirements are pairs of food items that are usually consumed together. Such a relation type could be used by a virtual agent to suggest additional products available in a shop that would potentially complement the items a customer has already in their shopping cart. Our gold standard comprises structural data, i.e. relation tables, which encode relation instances. These tables are vital in order to evaluate natural language processing systems that extract those relations.