Refine
Year of publication
Document Type
- Conference Proceeding (10)
- Article (2)
- Doctoral Thesis (1)
Keywords
- Korpus <Linguistik> (13) (remove)
Publicationstate
- Veröffentlichungsversion (13) (remove)
Reviewstate
In this paper, we describe MLSA, a publicly available multi-layered reference corpus for German-language sentiment analysis. The construction of the corpus is based on the manual annotation of 270 German-language sentences considering three different layers of granularity. The sentence-layer annotation, as the most coarse-grained annotation, focuses on aspects of objectivity, subjectivity and the overall polarity of the respective sentences. Layer 2 is concerned with polarity on the word- and phrase-level, annotating both subjective and factual language. The annotations on Layer 3 focus on the expression-level, denoting frames of private states such as objective and direct speech events. These three layers and their respective annotations are intended to be fully independent of each other. At the same time, exploring for and discovering interactions that may exist between different layers should also be possible. The reliability of the respective annotations was assessed using the average pairwise agreement and Fleiss’ multi-rater measures. We believe that MLSA is a beneficial resource for sentiment analysis research, algorithms and applications that focus on the German language.
Auf dem Weg zu einer Kartographie: automatische und manuelle Analysen am Beispiel des Korpus ISW
(2021)
Recent work suggests that concreteness and imageability play an important role in the meanings of figurative expressions. We investigate this idea in several ways. First, we try to define more precisely the context within which a figurative expression may occur, by parsing a corpus annotated for metaphor. Next, we add both concreteness and imageability as “features” to the parsed metaphor corpus, by marking up words in this corpus using a psycholinguistic database of scores for concreteness and imageability. Finally, we carry out detailed statistical analyses of the augmented version of the original metaphor corpus, cross-matching the features of concreteness and imageability with others in the corpus such as parts of speech and dependency relations, in order to investigate in detail the use of such features in predicting whether a given expression is metaphorical or not.
Catching the common cause: extraction and annotation of causal relations and their participants
(2017)
In this paper, we present a simple, yet effective method for the automatic identification and extraction of causal relations from text, based on a large English-German parallel corpus. The goal of this effort is to create a lexical resource for German causal relations. The resource will consist of a lexicon that describes constructions that trigger causality as well as the participants of the causal event, and will be augmented by a corpus with annotated instances for each entry, that can be used as training data to develop a system for automatic classification of causal relations. Focusing on verbs, our method harvested a set of 100 different lexical triggers of causality, including support verb constructions. At the moment, our corpus includes over 1,000 annotated instances. The lexicon and the annotated data will be made available to the research community.
Who is we? Disambiguating the referents of first person plural pronouns in parliamentary debates
(2021)
This paper investigates the use of first person plural pronouns as a rhetorical device in political speeches. We present an annotation schema for disambiguating pronoun references and use our schema to create an annotated corpus of debates from the German Bundestag. We then use our corpus to learn to automatically resolve pronoun referents in parliamentary debates. We explore the use of data augmentation with weak supervision to further expand our corpus and report preliminary results.
This paper presents Release 2.0 of the SALSA corpus, a German resource for lexical semantics. The new corpus release provides new annotations for German nouns, complementing the existing annotations of German verbs in Release 1.0. The corpus now includes around 24,000 sentences with more than 36,000 annotated instances. It was designed with an eye towards NLP applications such as semantic role labeling but will also be a useful resource for linguistic studies in lexical semantics.
We present a testsuite for POS tagging German web data. Our testsuite provides the original raw text as well as the gold tokenisations and is annotated for parts-of-speech. The testsuite includes a new dataset for German tweets, with a current size of 3,940 tokens. To increase the size of the data, we harmonised the annotations in already existing web corpora, based on the Stuttgart-Tübingen Tag Set. The current version of the corpus has an overall size of 48,344 tokens of web data, around half of it from Twitter. We also present experiments, showing how different experimental setups (training set size, additional out-of-domain training data, self-training) influence the accuracy of the taggers. All resources and models will be made publicly available to the research community.
This dissertation investigates discourse-pragmatic differences between variably linked arguments appearing in alternating argument structure constructions in the sense of Goldberg (1995) and Kay (manuscript). The properties that are studied include givenness, pragmatic relation (topic/focus), salience of referents, animacy, and others. They derive from the literature on sentence-type constructions such as topicalization and from research on the referential properties of NP form types.
The research carried out here has multiple uses. At the most basic level, it serves as an empirical check on existing characterizations of the pragmatic properties of the relevant arguments that are the result of syntactic and semantic analysis based on introspection alone. For instance, for the epistemic raising alternation involving verbs like seem, the predicted topicality difference between the subjects of the raised and unraised constructions (Langacker 1995) could not be confirmed.
This dissertation also addresses the question what kinds of pragmatic factors, if any, are relevant to argument structure constructions. Based on the evidence of the dative alternation, it does not seem to be the case that the kind of pragmatic influences on argument structure constructions are different or limited compared to the ones found to be relevant to sentence-type constructions.
The kind of research undertaken here can also inform the syntactic and semantic analysis of constructions. In the case of the dative alternation, the discourse-pragmatic characteristics of the variably linked arguments provide evidence that Basilico’s (1998) analysis of the difference between the alternates in terms of VP-shells and a difference between thetic and categorical ‘inner’ predication, on the one hand does not account for all the data and on the other can be re-stated in pragmatic terms other than the thetic-categorical distinction.
In addition to studies of valence alternations, this dissertation also discusses various null instantiation phenomena, which provide further evidence for the need to specify discourse-pragmatic properties as part of argument structure constructions and lexical entries.
Finally, it is suggested that the use of randomly sampled corpus data and statistical modelling throughout this dissertation improves both empirical and analytical coverage.
Accurate opinion mining requires the exact identification of the source and target of an opinion. To evaluate diverse tools, the research community relies on the existence of a gold standard corpus covering this need. Since such a corpus is currently not available for German, the Interest Group on German Sentiment Analysis decided to create such a resource and make it available to the research community in the context of a shared task. In this paper, we describe the selection of textual sources, development of annotation guidelines, and first evaluation results in the creation of a gold standard corpus for the German language.
We compare several different corpus- based and lexicon-based methods for the scalar ordering of adjectives. Among them, we examine for the first time a low- resource approach based on distinctive- collexeme analysis that just requires a small predefined set of adverbial modifiers. While previous work on adjective intensity mostly assumes one single scale for all adjectives, we group adjectives into different scales which is more faithful to human perception. We also apply the methods to both polar and non-polar adjectives, showing that not all methods are equally suitable for both types of adjectives.
Opinion Holder and Target Extraction for Verb-based Opinion Predicates – The Problem is Not Solved
(2015)
We offer a critical review of the current state of opinion role extraction involving opinion verbs. We argue that neither the currently available lexical resources nor the manually annotated text corpora are sufficient to appropriately study this task. We introduce a new corpus focusing on opinion roles of opinion verbs from the Subjectivity Lexicon and show potential benefits of this corpus. We also demonstrate that state-of-the-art classifiers perform rather poorly on this new dataset compared to the standard dataset for the task showing that there still remains significant research to be done.