NEALT Proceedings Series
Northern European Association for Language Technology
Refine
Document Type
Language
- English (3)
Has Fulltext
- yes (3)
Is part of the Bibliography
- no (3) (remove)
Keywords
- Computerlinguistik (2)
- Maschinelles Lernen (2)
- Natürliche Sprache (2)
- Sentimentanalyse (2)
- Text Mining (2)
- Annotation (1)
- Korpus <Linguistik> (1)
- Polarität (1)
- Subjektivität (1)
- Syntaktische Analyse (1)
Publicationstate
Reviewstate
- Peer-Review (2)
Publisher
- Northern European Association for Language Technology (3) (remove)
11
In this paper, we explore different linguistic structures encoded as convolution kernels for the detection of subjective expressions. The advantage of convolution kernels is that complex structures can be directly provided to a classifier without deriving explicit features. The feature design for the detection of subjective expressions is fairly difficult and there currently exists no commonly accepted feature set. We consider various structures, such as constituency parse structures, dependency parse structures, and predicate-argument structures. In order to generalize from lexical information, we additionally augment these structures with clustering information and the task-specific knowledge of subjective words. The convolution kernels will be compared with a standard vector kernel.
4
In opinion mining, there has been only very little work investigating semi-supervised machine learning on document-level polarity classification. We show that semi-supervised learning performs significantly better than supervised learning when only few labelled data are available. Semi-supervised polarity classifiers rely on a predictive feature set. (Semi-)Manually built polarity lexicons are one option but they are expensive to obtain and do not necessarily work in an unknown domain. We show that extracting frequently occurring adjectives & adverbs of an unlabeled set of in-domain documents is an inexpensive alternative which works equally well throughout different domains.
1
This paper is a contribution to the ongoing discussion on treebank annotation schemes and their impact on PCFG parsing results. We provide a thorough comparison of two German treebanks: the TIGER treebank and the TüBa-D/Z. We use simple statistics on sentence length and vocabulary size, and more refined methods such as perplexity and its correlation with PCFG parsing results, as well as a Principal Components Analysis. Finally we present a qualitative evaluation of a set of 100 sentences from the TüBa- D/Z, manually annotated in the TIGER as well as in the TüBa-D/Z annotation scheme, and show that even the existence of a parallel subcorpus does not support a straightforward and easy comparison of both annotation schemes.