Refine
Year of publication
- 2013 (2) (remove)
Document Type
- Article (1)
- Part of a Book (1)
Language
- English (2)
Has Fulltext
- yes (2)
Is part of the Bibliography
- no (2)
Keywords
- Bootstrapping methods (1)
- Computer-Mediated Communication (1)
- Computerlinguistik (1)
- Corpus Linguistics (1)
- Decision Trees (1)
- Feature engineering (1)
- Genitive Classification (1)
- Grammar (1)
- Machine Leaming (1)
- Maschinelles Lernen (1)
Publicationstate
- Postprint (2) (remove)
Reviewstate
- (Verlags)-Lektorat (1)
- Peer-Review (1)
Publisher
- Springer (2) (remove)
In this article, we examine the effectiveness of bootstrapping supervised machine-learning polarity classifiers with the help of a domain-independent rule-based classifier that relies on a lexical resource, i.e., a polarity lexicon and a set of linguistic rules. The benefit of this method is that though no labeled training data are required, it allows a classifier to capture in-domain knowledge by training a supervised classifier with in-domain features, such as bag of words, on instances labeled by a rule-based classifier. Thus, this approach can be considered as a simple and effective method for domain adaptation. Among the list of components of this approach, we investigate how important the quality of the rule-based classifier is and what features are useful for the supervised classifier. In particular, the former addresses the issue in how far linguistic modeling is relevant for this task. We not only examine how this method performs under more difficult settings in which classes are not balanced and mixed reviews are included in the data set but also compare how this linguistically-driven method relates to state-of-the-art statistical domain adaptation.
Contemporary studies on the characteristics of natural language benefit enormously from the increasing amount of linguistic corpora. Aside from text and speech corpora, corpora of computer-mediated communication (CMC) Position themselves between orality and literacy, and beyond that provide in- sight into the impact of "new", mainly intemet-based media on language beha- viour. In this paper, we present an empirical attempt to work with annotated CMC corpora for the explanation of linguistic phenomena. In concrete terms, we implement machine leaming algorithms to produce decision trees that reveal rules and tendencies about the use of genitive markers in German.