With the advent of mobile devices, mediatized political discourse became more dynamic. I assume that the microblog Twitter can be considered a medium for spatial coordination during protests. Therefore, the case of neo-Nazi demonstrations and counter-protests in the city of Dresden that occurred in February 2012 is analysed. The data consist of microposts published during the event. A quantitative analysis of hashtag and retweet frequencies was performed, as well as a qualitative speech act pattern analysis and a tempo-spatial discourse analysis on selected subsets of microposts. Results show that verbal georeferencing is a common linguistic practice, through which space is constructed. Empirical analysis indicates a strong relation between communicational online space and physical offline place: protest participants permanently reconfigure spatial context discursively, and thus the contested protest area becomes a temporarily meaningful place.
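The quantitative step mentioned in the abstract above (hashtag and retweet frequencies) could be sketched roughly as follows. This is a minimal illustration, not the study's actual procedure: the function names, the regular expressions, the "RT @user" retweet convention, and the sample posts are all assumptions made for this example.

```python
import re
from collections import Counter

# Assumed patterns: a hashtag is "#" plus word characters; a retweet
# is a post that starts with "RT @username" (older Twitter convention).
HASHTAG = re.compile(r"#(\w+)")
RETWEET = re.compile(r"^RT @(\w+)")


def hashtag_counts(posts):
    """Count case-insensitive hashtag occurrences across all posts."""
    counts = Counter()
    for text in posts:
        counts.update(tag.lower() for tag in HASHTAG.findall(text))
    return counts


def retweet_counts(posts):
    """Count how often each account is retweeted (by leading 'RT @user')."""
    counts = Counter()
    for text in posts:
        match = RETWEET.match(text)
        if match:
            counts[match.group(1).lower()] += 1
    return counts


# Invented sample posts for illustration only; not taken from the study's data.
sample_posts = [
    "Blockade at the main station #protest #Dresden",
    "RT @alice: Blockade at the main station #protest",
    "Police moving towards the old town #Protest",
]

tags = hashtag_counts(sample_posts)
retweets = retweet_counts(sample_posts)
```

Lower-casing the tags merges spelling variants such as `#protest` and `#Protest`; whether that is desirable depends on the research question.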
A "polyglottal" speech synthesis - modifications for a replica of Kempelen's speaking machine
(2019)
This paper describes a new approach to improve the analysis and categorization of web documents using statistical methods for template-based clustering as well as semantic analysis based on terminological ontologies. A domain-specific environment serves as a proof of concept. In order to demonstrate the widespread practical benefit of our approach, we outline a combined mathematical and semantic framework for information retrieval on internet resources.
We present a new resource for German causal language, with annotations in context for verbs, nouns and adpositions. Our dataset includes 4,390 annotated instances for more than 150 different triggers. The annotation scheme distinguishes three different types of causal events (CONSEQUENCE, MOTIVATION, PURPOSE). We also provide annotations for semantic roles, i.e. of the cause and effect for the causal event as well as the actor and affected party, if present. In the paper, we present inter-annotator agreement scores for our dataset and discuss problems for annotating causal language. Finally, we present experiments where we frame causal annotation as a sequence labelling problem and report baseline results for the prediction of causal arguments and for predicting different types of causation.
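To illustrate what framing causal annotation as a sequence labelling problem can look like, the sketch below converts span annotations into per-token BIO tags. It is only an assumed encoding: the abstract does not state which tagging scheme was used, and the function name, the label names CAUSE and EFFECT, and the English example sentence are hypothetical.

```python
def spans_to_bio(tokens, spans):
    """Convert (start, end, label) token spans into BIO tags.

    `end` is exclusive. Tokens outside every span are tagged "O".
    Spans are assumed not to overlap.
    """
    tags = ["O"] * len(tokens)
    for start, end, label in spans:
        tags[start] = f"B-{label}"          # first token of the span
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"          # remaining tokens of the span
    return tags


# Invented example sentence; the actual dataset is German.
tokens = ["The", "storm", "caused", "severe", "flooding"]
spans = [(0, 2, "CAUSE"), (3, 5, "EFFECT")]  # hypothetical role labels
bio_tags = spans_to_bio(tokens, spans)
```

In this sketch the trigger ("caused") is left untagged; a sequence labeller would then be trained to predict one tag per token.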
Accentuation, Uncertainty and Exhaustivity - Towards a Model of Pragmatic Focus Interpretation
(2010)
This paper presents a model of pragmatic focus interpretation that is assumed to be part of a complete language comprehension model and that is inspired by Levelt's language processing model. The model is derived from our empirical data on the role of accentuation, prosodic indicators of uncertainty and context for pragmatic focus interpretation. In its present state, the model is restricted to these data, but nevertheless generates predictions.
This paper describes a rule-based approach to detecting direct speech without the help of any quotation markers. Fictional and non-fictional texts were used as datasets. Our evaluation shows that the results appear stable across different datasets in the fictional domain and are comparable to the results achieved in related work.
Beyond Citations: Corpus-based Methods for Detecting the Impact of Research Outcomes on Society
(2020)
This paper proposes, implements and evaluates a novel, corpus-based approach for identifying categories indicative of the impact of research via a deductive (top-down, from theory to data) and an inductive (bottom-up, from data to theory) approach. The resulting categorization schemes differ in substance. Research outcomes are typically assessed by using bibliometric methods, such as citation counts and patterns, or alternative metrics, such as references to research in the media. Shortcomings of these methods are their inability to identify the impact of research beyond academia (bibliometrics) and to consider text-based impact indicators beyond those that capture attention (altmetrics). We address these limitations by leveraging a mixed-methods approach for eliciting impact categories from experts, project personnel (deductive) and texts (inductive). Using these categories, we label a corpus of project reports per category schema, and apply supervised machine learning to infer these categories from project reports. The classification results show that we can predict deductively and inductively derived impact categories with 76.39% and 78.81% accuracy (F1-score), respectively. Our approach can complement solutions from bibliometrics and scientometrics for assessing the impact of research and studying the scope and types of advancements transferred from academia to society.
Beyond the stars: exploiting free-text user reviews to improve the accuracy of movie recommendations
(2009)
In this paper we show that the extraction of opinions from free-text reviews can improve the accuracy of movie recommendations. We present three approaches to extract movie aspects as opinion targets and use them as features for collaborative filtering. Each of these approaches requires a different amount of manual interaction. We collected a data set of reviews with corresponding ordinal (star) ratings of several thousand movies to evaluate the different features for collaborative filtering. We employ a state-of-the-art collaborative filtering engine for the recommendations during our evaluation and compare the performance with and without using the features representing user preferences mined from the free-text reviews provided by the users. The opinion-mining-based features perform significantly better than the baseline, which is based on star ratings and genre information only.