We present a descriptive analysis of the two datasets from the shared task on Source, Subjective Expression and Target Extraction from Political Speeches (STEPS), the only existing German dataset for opinion role extraction of its size. Our analysis discusses the individual properties of the three components (subjective expressions, sources and targets) and their relations to each other. Our observations should help practitioners and researchers when building a system to extract opinion roles from German data.
We present a testsuite for POS tagging German web data. Our testsuite provides the original raw text as well as the gold tokenisations and is annotated for parts-of-speech. The testsuite includes a new dataset for German tweets, with a current size of 3,940 tokens. To increase the size of the data, we harmonised the annotations in already existing web corpora, based on the Stuttgart-Tübingen Tag Set. The current version of the corpus has an overall size of 48,344 tokens of web data, around half of it from Twitter. We also present experiments, showing how different experimental setups (training set size, additional out-of-domain training data, self-training) influence the accuracy of the taggers. All resources and models will be made publicly available to the research community.
We present a new resource for German causal language, with annotations in context for verbs, nouns and adpositions. Our dataset includes 4,390 annotated instances for more than 150 different triggers. The annotation scheme distinguishes three different types of causal events (CONSEQUENCE, MOTIVATION, PURPOSE). We also provide annotations for semantic roles, i.e. of the cause and effect for the causal event as well as the actor and affected party, if present. In the paper, we present inter-annotator agreement scores for our dataset and discuss problems in annotating causal language. Finally, we present experiments where we frame causal annotation as a sequence labelling problem and report baseline results for the prediction of causal arguments and for predicting different types of causation.
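To illustrate what framing causal annotation as sequence labelling can look like, here is a minimal sketch that converts span annotations into per-token BIO tags. The function name `spans_to_bio`, the label names `CAUSE` and `EFFECT`, the BIO scheme itself and the example sentence are assumptions made for illustration; the abstract does not state which encoding or label inventory the authors used.

```python
from typing import List, Tuple

# (start token index, end token index exclusive, role label); hypothetical format
Span = Tuple[int, int, str]


def spans_to_bio(n_tokens: int, spans: List[Span]) -> List[str]:
    """Encode non-overlapping argument spans as one BIO tag per token."""
    tags = ["O"] * n_tokens
    for start, end, label in spans:
        if not (0 <= start < end <= n_tokens):
            raise ValueError(f"span ({start}, {end}) is out of range")
        if any(tag != "O" for tag in tags[start:end]):
            raise ValueError("overlapping spans are not supported in this sketch")
        tags[start] = f"B-{label}"          # first token of the span
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"          # remaining tokens of the span
    return tags


# Invented example: "The rain caused the flood."
tokens = ["Der", "Regen", "verursachte", "die", "Überschwemmung"]
spans = [(0, 2, "CAUSE"), (3, 5, "EFFECT")]
tags = spans_to_bio(len(tokens), spans)

for token, tag in zip(tokens, tags):
    print(f"{token}\t{tag}")
```

In this sketch the trigger ("verursachte") is left untagged; a sequence labeller trained on such tag sequences would predict one tag per token, and the causal type could be handled as a separate classification step.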
A Supervised learning approach for the extraction of opinion sources and targets from German text
(2019)
We present the first systematic supervised learning approach for the extraction of opinion sources and targets from German language data. A wide choice of different features is presented, particularly syntactic features and generalization features. We point out specific differences between opinion sources and targets. Moreover, we explain why implicit sources can be extracted even with fairly generic features. To ensure comparability, our classifier is trained and tested on the dataset of the STEPS shared task.
This paper presents Release 2.0 of the SALSA corpus, a German resource for lexical semantics. The new corpus release provides new annotations for German nouns, complementing the existing annotations of German verbs in Release 1.0. The corpus now includes around 24,000 sentences with more than 36,000 annotated instances. It was designed with an eye towards NLP applications such as semantic role labeling but will also be a useful resource for linguistic studies in lexical semantics.
Auf dem Weg zu einer Kartographie: automatische und manuelle Analysen am Beispiel des Korpus ISW
(2021)
Alternations play a central role in most current theories of verbal argument structure, which are devised primarily to model the syntactic flexibility of verbs. Accordingly, these frameworks take verbs, and their projection properties, to be the sole contributors of thematic content to the clause. Approached from this perspective, the German applicative (or be-prefix) construction has puzzling properties. First, while many applicative verbs have transparent base forms, many, including those coined from nouns, do not. Second, applicative verbs are bound by interpretive and argument-realization conditions which cannot be traced to their base forms, if any. These facts suggest that applicative formation is not appropriately modeled as a lexical rule.
Using corpus data from a diverse array of genres, Michaelis and Ruppenhofer propose a unified solution to these two puzzles within the framework of Construction Grammar. Central to this account is the concept of valence augmentation: argument-structure constructions denote event types, and therefore license valence sets which may properly include those of their lexical fillers. As per Panini's Law, resolution of valence mismatch favors the construction over the verb. Like verbs of transfer and location, the applicative construction has a prototype-based event-structure representation: diverse implications of applicative predications--including iteration, transfer, affectedness, intensity and saturation--are shown to derive via regular patterns of semantic extension from the topological concept of coverage.
German is a language with complex morphological processes. Its long and often ambiguous word forms present a bottleneck problem in natural language processing. As a step towards morphological analyses of high quality, this paper introduces a morphological treebank for German. It is derived from the linguistic database CELEX, which is a standard resource for German morphology. We build on its refurbished, modernized and partially revised version. The derivation of the morphological trees is not trivial, especially for cases of conversion that are morpho-semantically opaque and merely of diachronic interest. We develop solutions and present exemplary analyses. The resulting database comprises about 40,000 morphological trees of a German base vocabulary whose format and level of detail can be chosen according to the requirements of the applications. The Perl scripts for the generation of the treebank are publicly available on GitHub. In our discussion, we show some future directions for morphological treebanks. In particular, we aim at the combination with other reliable lexical resources such as GermaNet.
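As a rough illustration of how a morphological tree can be read out at different levels of detail, the following sketch stores a compound as a nested structure and cuts it at a chosen depth. The class `MorphNode`, the function `segments` and the example word are hypothetical and not taken from the treebank; the actual resource is generated by Perl scripts, and its format may differ.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class MorphNode:
    """One node of a morphological tree; leaves have no children."""
    form: str
    children: List["MorphNode"] = field(default_factory=list)


def segments(node: MorphNode, max_depth: int) -> List[str]:
    """Return the segmentation obtained by cutting the tree at max_depth."""
    if not node.children or max_depth == 0:
        return [node.form]
    result: List[str] = []
    for child in node.children:
        result.extend(segments(child, max_depth - 1))
    return result


# Invented example analysis: Haustürschlüssel = (Haus + Tür) + Schlüssel
tree = MorphNode(
    "Haustürschlüssel",
    [
        MorphNode("Haustür", [MorphNode("Haus"), MorphNode("Tür")]),
        MorphNode("Schlüssel"),
    ],
)

for depth in range(3):
    print(depth, segments(tree, depth))
```

Cutting at depth 1 gives the immediate constituents, while deeper cuts expose the full segmentation; an application can pick whichever granularity it needs.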