The classification of verbs in Levin's (1993) English Verb Classes and Alternations: A Preliminary Investigation, on the basis of both intuitive semantic grouping and their participation in valence alternations, is often used by the NLP community as evidence of the semantic similarity of verbs (Jing & McKeown 1998; Lapata & Brew 1999; Kohl et al. 1998). In this paper, we compare the Levin classification with the work of the FrameNet project (Fillmore & Baker 2001), where words (not just verbs) are grouped according to the conceptual structures (frames) that underlie them and their combinatorial patterns are inductively derived from corpus evidence. This means that verbs grouped together in FrameNet (FN) might be semantically similar but have different (or no) alternations, and that verbs which share the same alternation might be represented in two different semantic frames.
Machine learning methods offer great potential for automatically investigating large amounts of data in the humanities. Our contribution to the workshop reports on ongoing work in the BMBF project KobRA (http://www.kobra.tu-dortmund.de), where we apply machine learning methods to the analysis of big corpora in language-focused research on computer-mediated communication (CMC). At the workshop, we will discuss first results from training a Support Vector Machine (SVM) for the classification of selected linguistic features in talk pages of the German Wikipedia corpus in DeReKo provided by the IDS Mannheim. We will investigate different representations of the data to integrate complex syntactic and semantic information for the SVM. The results are intended to foster both corpus-based research on CMC and the annotation of linguistic features in CMC corpora.
In this paper, we report on an effort to develop a gold standard for the intensity ordering of subjective adjectives. Rather than pursue a complete order as produced by paying attention to the mean scores of human ratings only, we take into account to what extent assessors consistently rate pairs of adjectives relative to each other. We show that different available automatic methods for producing polar intensity scores produce results that correlate well with our gold standard, and discuss some conceptual questions surrounding the notion of polar intensity.
In this paper, we describe MLSA, a publicly available multi-layered reference corpus for German-language sentiment analysis. The construction of the corpus is based on the manual annotation of 270 German-language sentences considering three different layers of granularity. The sentence-layer annotation, as the most coarse-grained annotation, focuses on aspects of objectivity, subjectivity and the overall polarity of the respective sentences. Layer 2 is concerned with polarity on the word- and phrase-level, annotating both subjective and factual language. The annotations on Layer 3 focus on the expression-level, denoting frames of private states such as objective and direct speech events. These three layers and their respective annotations are intended to be fully independent of each other. At the same time, it should also be possible to explore and discover interactions that may exist between different layers. The reliability of the respective annotations was assessed using the average pairwise agreement and Fleiss’ multi-rater measures. We believe that MLSA is a beneficial resource for sentiment analysis research, algorithms and applications that focus on the German language.
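The two reliability measures named in the abstract above can be illustrated with a minimal sketch. It assumes complete data (every item rated by the same number of raters) and uses the standard textbook definition of Fleiss' kappa; the function names and the count-matrix input format are illustrative assumptions, not taken from the MLSA paper.

```python
def observed_agreement(counts):
    """Average pairwise agreement.

    counts[i][j] is the number of raters who put item i into category j.
    Assumes every item was rated by the same number of raters (>= 2).
    """
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("each item needs the same number (>= 2) of ratings")
    # For each item: share of rater pairs that chose the same category.
    per_item = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    return sum(per_item) / len(per_item)


def fleiss_kappa(counts):
    """Fleiss' kappa: observed agreement corrected for chance agreement."""
    n_items = len(counts)
    n_raters = sum(counts[0])
    n_cats = len(counts[0])
    p_bar = observed_agreement(counts)
    # Share of all ratings that fell into each category.
    p_cat = [
        sum(row[j] for row in counts) / (n_items * n_raters)
        for j in range(n_cats)
    ]
    # Agreement expected if raters chose categories at random with p_cat.
    p_exp = sum(p * p for p in p_cat)
    if p_exp == 1.0:
        return float("nan")  # undefined when only one category is ever used
    return (p_bar - p_exp) / (1.0 - p_exp)
```

For example, `fleiss_kappa([[3, 0], [0, 3]])` describes three raters who agree fully on two items that fall into different categories, so both measures reach their maximum.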
Lexical-semantic theories often suffer from the imprecision of the concepts they employ in their representations. This leads to a considerable decrease in empirical strength by inviting circular argumentation. A demonstration of how to go about overcoming such shortcomings will be carried out, using the lexical semantic concept of "punctuality" as an example. Firstly, I will argue that the distinction between punctuality and durativity plays a crucial role for the explanation of a wide range of syntactic and semantic phenomena. Secondly, I will discuss methodological issues involved in arriving at a more precise definition of punctuality and, finally, the notion of "punctuality" will be given an interpretation on the basis of extensive consultation of research on cognitive time concepts.
Towards a Cartography: Automatic and Manual Analyses Using the ISW Corpus as an Example
(2021)
Recent work suggests that concreteness and imageability play an important role in the meanings of figurative expressions. We investigate this idea in several ways. First, we try to define more precisely the context within which a figurative expression may occur, by parsing a corpus annotated for metaphor. Next, we add both concreteness and imageability as “features” to the parsed metaphor corpus, by marking up words in this corpus using a psycholinguistic database of scores for concreteness and imageability. Finally, we carry out detailed statistical analyses of the augmented version of the original metaphor corpus, cross-matching the features of concreteness and imageability with others in the corpus such as parts of speech and dependency relations, in order to investigate in detail the use of such features in predicting whether a given expression is metaphorical or not.
This paper addresses the task of finding antecedents for locally uninstantiated arguments. To resolve such null instantiations, we develop a weakly supervised approach that investigates and combines a number of linguistically motivated strategies that are inspired by work on semantic role labeling and coreference resolution. The performance of the system is competitive with the current state-of-the-art supervised system.
We discovered several recurring errors in the current version of the Europarl Corpus originating both from the web site of the European Parliament and the corpus compilation based thereon. The most frequent error was incompletely extracted metadata leaving non-textual fragments within the textual parts of the corpus files. This is, on average, the case for every second speaker change. We not only cleaned the Europarl Corpus by correcting several kinds of errors, but also aligned the speakers’ contributions of all available languages and compiled everything into a new XML-structured corpus. This facilitates a more sophisticated selection of data, e.g. querying the corpus for speeches by speakers of a particular political group or in particular language combinations.
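The kind of selection described in the abstract above might look like the following sketch. The XML layout is entirely hypothetical: the element names `speech` and `text`, the attributes `speaker`, `group` and `lang`, and the sample content are assumptions made for illustration and do not reflect the actual schema of the cleaned corpus.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature corpus; the real corpus schema may differ entirely.
SAMPLE = """<corpus>
  <speech speaker="A" group="PPE">
    <text lang="en">Hello</text>
    <text lang="de">Hallo</text>
  </speech>
  <speech speaker="B" group="PSE">
    <text lang="en">Thanks</text>
  </speech>
  <speech speaker="C" group="PPE">
    <text lang="en">Agreed</text>
  </speech>
</corpus>"""


def speeches_by_group(root, group, langs):
    """Return (speaker, {lang: text}) for speeches by members of `group`
    that are available in every language listed in `langs`."""
    results = []
    for speech in root.iter("speech"):
        if speech.get("group") != group:
            continue
        # Map each language code to the aligned text of this contribution.
        texts = {t.get("lang"): (t.text or "") for t in speech.findall("text")}
        if all(lang in texts for lang in langs):
            results.append(
                (speech.get("speaker"), {lang: texts[lang] for lang in langs})
            )
    return results


root = ET.fromstring(SAMPLE)
```

Calling `speeches_by_group(root, "PPE", ["en", "de"])` on this sample keeps only the contribution that exists in both requested languages, which mirrors a query by political group and language combination.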