Refine
Year of publication
Document Type
- Conference Proceeding (9)
- Part of a Book (1)
- Doctoral Thesis (1)
- Master's Thesis (1)
Language
- English (12)
Has Fulltext
- yes (12)
Keywords
- Computerlinguistik (10)
- Natürliche Sprache (10)
- Information Extraction (9)
- Maschinelles Lernen (8)
- Sentimentanalyse (4)
- Text Mining (4)
- Name (2)
- Polarität (2)
- Acquisition (1)
- Akademischer Grad (1)
Publicationstate
- Zweitveröffentlichung (12) (remove)
Reviewstate
Publisher
- European Language Resources Association (4)
- Asian Federation of Natural Language Processing (1)
- Association for Computational Linguistics (1)
- Fundacja Uniwersytetu im. Adama Mickiewicza (1)
- Incoma Ltd. (1)
- Institut für Informationswissenschaft und Sprachtechnologie, Universität Hildesheim (1)
- Universitätsverlag Hildesheim (1)
The sentiment polarity of an expression (whether it is perceived as positive, negative or neutral) can be influenced by a number of phenomena, foremost among them negation. Apart from closed-class negation words like no, not or without, negation can also be caused by so-called polarity shifters. These are content words, such as verbs, nouns or adjectives, that shift polarities in their opposite direction, e. g. abandoned in “abandoned hope” or alleviate in “alleviate pain”. Many polarity shifters can affect both positive and negative polar expressions, shifting them towards the opposing polarity. However, other shifters are restricted to a single shifting direction. Recoup shifts negative to positive in “recoup your losses”, but does not affect the positive polarity of fortune in “recoup a fortune”. Existing polarity shifter lexica only specify whether a word can, in general, cause shifting, but they do not specify when this is limited to one shifting direction. To address this issue we introduce a supervised classifier that determines the shifting direction of shifters. This classifier uses both resource-driven features, such as WordNet relations, and data-driven features like in-context polarity conflicts. Using this classifier we enhance the largest available polarity shifter lexicon.
Entity framing is the selection of aspects of an entity to promote a particular viewpoint towards that entity. We investigate entity framing of political figures through the use of names and titles in German online discourse, enhancing current research in entity framing through titling and naming that concentrates on English only. We collect tweets that mention prominent German politicians and annotate them for stance. We find that the formality of naming in these tweets correlates positively with their stance. This confirms sociolinguistic observations that naming and titling can have a status-indicating function and suggests that this function is dominant in German tweets mentioning political figures. We also find that this status-indicating function is much weaker in tweets from users that are politically left-leaning than in tweets by right leaning users. This is in line with observations from moral psychology that left-leaning and right-leaning users assign different importance to maintaining social hierarchies.
Sentiment Analysis is the task of extracting and classifying opinionated content in natural language texts. Common subtasks are the distinction between opinionated and factual texts, the classification of polarity in opinionated texts, and the extraction of the participating entities of an opinion(-event), i.e. the source from which an opinion emanates and the target towards which it is directed. With the emerging Web 2.0 which describes the shift towards a highly user-interactive communication medium, the amount of subjective content on the World Wide Web is steadily increasing. Thus, there is a growing need for automatically processing this type of content which is provided by sentiment analysis. Both natural language processing, which is the task of providing computational methods for the analysis and representation of natural language, and machine learning, which is the task of building task-specific classification models on the basis of empirical data, may be instrumental in mastering the challenges of the automatic sentiment analysis of written text. Many problems in sentiment analysis have been proposed to be solved with machine learning methods exclusively using a fairly low-level feature design, such as bag of words, containing little linguistic information. In this thesis, we examine the effectiveness of linguistic features in various subtasks of sentiment analysis. Thus, we heavily draw from the insights gained by natural language processing. The application of linguistic features can be applied on various classification methods, be it in rule-based classification, where the linguistic features are directly encoded as a classifier, in supervised machine learning, where these features complement basic low-level features, or in bootstrapping methods, where these features form a rule-based classifier generating a labeled training set from which a supervised classifier can be trained. In this thesis, we will in particular focus on scenarios where the combination of linguistic features and machine learning methods is effective. We will look at common text classification tasks, both coarse-grained and fine-grained, and extraction tasks.
While good results have been achieved for named entity recognition (NER) in supervised settings, it remains a problem that for low resource languages and less studied domains little or no labelled data is available. As NER is a crucial preprocessing step for many natural language processing tasks, finding a way to overcome this deficit in data remains of great interest. We propose a distant supervision approach to NER that is both language and domain independent where we automatically generate labelled training data using gazetteers that we previously extracted from Wikipedia. We test our approach on English, German and Estonian data sets and contribute further by introducing several successful methods to reduce the noise in the generated training data. The tested models beat baseline systems and our results show that distant supervision can be a promising approach for NER when no labelled data is available. For the English model we also show that the distant supervision model is better at generalizing within the same domain of news texts by comparing it against a supervised model on a different test set.
Opinion holder extraction is one of the most important tasks in sentiment analysis. We will briefly outline the importance of predicates for this task and categorize them according to part of speech and according to which semantic role they select for the opinion holder. For many languages there do not exist semantic resources from which such predicates can be easily extracted. Therefore, we present alternative corpus-based methods to gain such predicates automatically, including the usage of prototypical opinion holders, i.e. common nouns, denoting for example experts or analysts, which describe particular groups of people whose profession or occupation is to form and express opinions towards specific items.
In der natürlichen Sprachverarbeitung haben Frage-Antwort-Systeme in der letzten Dekade stark an Bedeutung gewonnen. Vor allem durch robuste Werkzeuge wie statistische Syntax-Parser und Eigennamenerkenner ist es möglich geworden, linguistisch strukturierte Informationen aus unannotierten Textkorpora zu gewinnen. Zusätzlich werden durch die Text REtrieval Conference (TREC) jährlich Maßstäbe für allgemeine domänen-unabhängige Frage-Antwort-Szenarien definiert. In der Regel funktionieren Frage-Antwort-Systeme nur gut, wenn sie robuste Verfahren für die unterschiedlichen Fragetypen, die in einer Fragemenge vorkommen, implementieren. Ein charakteristischer Fragetyp sind die sogenannten Ereignisfragen. Obwohl Ereignisse schon seit Mitte des vorigen Jahrhunderts in der theoretischen Linguistik, vor allem in der Satzsemantik, Gegenstand intensive Forschung sind, so blieben sie bislang im Bezug auf Frage-Antwort-Systeme weitgehend unerforscht. Deshalb widmet sich diese Diplomarbeit diesem Problem. Ziel dieser Arbeit ist zum Einen eine Charakterisierung von Ereignisstruktur in Frage-Antwort Systemen, die unter Berücksichtigung der theoretischen Linguistik sowie einer Analyse der TREC 2005 Fragemenge entstehen soll. Zum Anderen soll ein Ereignis-basiertes Antwort-Extraktionsverfahren entworfen und implementiert werden, das sich auf den Ergebnissen dieser Analyse stützt. Informationen von diversen linguistischen Ebenen sollen daten-getrieben in einem uniformen Modell integriert werden. Spezielle linguistische Ressourcen, wie z.B. WordNet und Subkategorisierungslexika werden dabei eine zentrale Rolle einnehmen. Ferner soll eine Ereignisstruktur vorgestellt werden, die das Abpassen von Ereignissen unabhängig davon, ob sie von Vollverben oder Nominalisierungen evoziert werden, erlaubt. Mit der Implementierung eines Ereignis-basierten Antwort-Extraktionsmoduls soll letztendlich auch die Frage beantwortet werden, ob eine explizite Ereignismodellierung die Performanz eines Frage-Antwort-Systems verbessern kann.
We explore the feasibility of contextual healthiness classification of food items. We present a detailed analysis of the linguistic phenomena that need to be taken into consideration for this task based on a specially annotated corpus extracted from web forum entries. For automatic classification, we compare a supervised classifier and rule-based classification. Beyond linguistically motivated features that include sentiment information we also consider the prior healthiness of food items.
One problem of data-driven answer extraction in open-domain factoid question answering is that the class distribution of labeled training data is fairly imbalanced. In an ordinary training set, there are far more incorrect answers than correct answers. The class-imbalance is, thus, inherent to the classification task. It has a deteriorating effect on the performance of classifiers trained by standard machine learning algorithms. They usually have a heavy bias towards the majority class, i.e. the class which occurs most often in the training set. In this paper, we propose a method to tackle class imbalance by applying some form of cost-sensitive learning which is preferable to sampling. We present a simple but effective way of estimating the misclassification costs on the basis of class distribution. This approach offers three benefits. Firstly, it maintains the distribution of the classes of the labeled training data. Secondly, this form of meta-learning can be applied to a wide range of common learning algorithms. Thirdly, this approach can be easily implemented with the help of state-of-the-art machine learning software.
In recent years, text classification in sentiment analysis has mostly focused on two types of classification, the distinction between objective and subjective text, i.e. subjectivity detection, and the distinction between positive and negative subjective text, i.e. polarity classification. So far, there has been little work examining the distinction between definite polar subjectivity and indefinite polar subjectivity. While the former are utterances which can be categorized as either positive or negative, the latter cannot be categorized as either of these two categories. This paper presents a small set of domain independent features to detect indefinite polar sentences. The features reflect the linguistic structure underlying these types of utterances. We give evidence for the effectiveness of these features by incorporating them into an unsupervised rule-based classifier for sentence-level analysis and compare its performance with supervised machine learning classifiers, i.e. Support Vector Machines (SVMs) and Nearest Neighbor Classifier (kNN). The data used for the experiments are web-reviews collected from three different domains.
In this paper, we investigate the role of predicates in opinion holder extraction. We will examine the shape of these predicates, investigate what relationship they bear towards opinion holders, determine what resources are potentially useful for acquiring them, and point out limitations of an opinion holder extraction system based on these predicates. For this study, we will carry out an evaluation on a corpus annotated with opinion holders. Our insights are, in particular, important for situations in which no labelled training data are available and only rule-based methods can be applied.
In this paper, we compare three different generalization methods for in-domain and cross-domain opinion holder extraction being simple unsupervised word clustering, an induction method inspired by distant supervision and the usage of lexical resources. The generalization methods are incorporated into diverse classifiers. We show that generalization causes significant improvements and that the impact of improvement depends on the type of classifier and on how much training and test data differ from each other. We also address the less common case of opinion holders being realized in patient position and suggest approaches including a novel (linguistically-informed) extraction method how to detect those opinion holders without labeled training data as standard datasets contain too few instances of this type.
We present the German Sentiment Analysis Shared Task (GESTALT) which consists of two main tasks: Source, Subjective Expression and Target Extraction from Political Speeches (STEPS) and Subjective Phrase and Aspect Extraction from Product Reviews (StAR). Both tasks focused on fine-grained sentiment analysis, extracting aspects and targets with their associated subjective expressions in the German language. STEPS focused on political discussions from a corpus of speeches in the Swiss parliament. StAR fostered the analysis of product reviews as they are available from the website Amazon.de. Each shared task led to one participating submission, providing baselines for future editions of this task and highlighting specific challenges. The shared task homepage can be found at https://sites.google.com/site/iggsasharedtask/.