410 Linguistics
Corpora with high-quality linguistic annotations are an essential component in many NLP applications and a valuable resource for linguistic research. Obtaining these annotations requires a large amount of manual effort, making the creation of these resources time-consuming and costly. One attempt to speed up the annotation process is to use supervised machine-learning systems to automatically assign (possibly erroneous) labels to the data and ask human annotators to correct them where necessary. However, it is not clear to what extent these automatic pre-annotations are successful in reducing human annotation effort, and what impact they have on the quality of the resulting resource. In this article, we present the results of an experiment in which we assess the usefulness of partial semi-automatic annotation for frame labeling. We investigate the impact of automatic pre-annotation of differing quality on annotation time, consistency and accuracy. While we found no conclusive evidence that automatic pre-annotation speeds up human annotation, we found that it does increase the overall quality of the resulting annotations.
In this paper, we describe MLSA, a publicly available multi-layered reference corpus for German-language sentiment analysis. The construction of the corpus is based on the manual annotation of 270 German-language sentences considering three different layers of granularity. The sentence-layer annotation, as the most coarse-grained annotation, focuses on aspects of objectivity, subjectivity and the overall polarity of the respective sentences. Layer 2 is concerned with polarity on the word and phrase level, annotating both subjective and factual language. The annotations on Layer 3 focus on the expression level, denoting frames of private states such as objective and direct speech events. These three layers and their respective annotations are intended to be fully independent of each other. At the same time, it should also be possible to explore and discover interactions that may exist between different layers. The reliability of the respective annotations was assessed using the average pairwise agreement and Fleiss’ multi-rater measures. We believe that MLSA is a beneficial resource for sentiment analysis research, algorithms and applications that focus on the German language.
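The two reliability measures named in the abstract above can be illustrated with a minimal sketch. This is not the MLSA authors' code: the function names, the input layout (one list of labels per sentence, one label per annotator) and the toy labels are assumptions made for illustration, and Fleiss' kappa is taken here as the multi-rater measure meant.

```python
from itertools import combinations


def average_pairwise_agreement(ratings):
    """Share of annotator pairs that assign the same label, averaged over items.

    `ratings` is a list of items; each item is a list with one label per
    annotator (hypothetical layout, every item rated by the same annotators).
    """
    n_raters = len(ratings[0])
    pairs = list(combinations(range(n_raters), 2))
    agreeing = sum(item[i] == item[j] for item in ratings for i, j in pairs)
    return agreeing / (len(ratings) * len(pairs))


def fleiss_kappa(ratings):
    """Fleiss' kappa: observed agreement corrected for chance agreement."""
    n_items = len(ratings)
    n_raters = len(ratings[0])
    categories = sorted({label for item in ratings for label in item})
    # counts[i][k] = number of annotators who gave item i the category k
    counts = [[item.count(c) for c in categories] for item in ratings]
    # Observed agreement: mean over items of the share of agreeing pairs
    p_items = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_observed = sum(p_items) / n_items
    # Expected agreement from the overall category proportions
    p_categories = [
        sum(row[k] for row in counts) / (n_items * n_raters)
        for k in range(len(categories))
    ]
    p_expected = sum(p * p for p in p_categories)
    # Undefined (division by zero) if every rating falls into one category
    return (p_observed - p_expected) / (1 - p_expected)


# Toy data: three sentences, three annotators, made-up polarity labels
toy = [["pos", "pos", "neg"], ["neg", "neg", "neg"], ["pos", "neg", "neg"]]
print(average_pairwise_agreement(toy))
print(fleiss_kappa(toy))
```

The observed-agreement term inside Fleiss' kappa is the same quantity as the average pairwise agreement, so the two measures differ only in the correction for chance.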
This paper presents Release 2.0 of the SALSA corpus, a German resource for lexical semantics. The new corpus release provides new annotations for German nouns, complementing the existing annotations of German verbs in Release 1.0. The corpus now includes around 24,000 sentences with more than 36,000 annotated instances. It was designed with an eye towards NLP applications such as semantic role labeling but will also be a useful resource for linguistic studies in lexical semantics.
Current work on sentiment analysis is characterized by approaches with a pragmatic focus, which use shallow techniques in the interest of robustness but often rely on ad-hoc creation of data sets and methods. We argue that progress towards deep analysis depends on a) enriching shallow representations with linguistically motivated, rich information, and b) focussing different branches of research and combining resources to create synergies with related work in NLP. In the paper, we propose SentiFrameNet, an extension to FrameNet, as a novel representation for sentiment analysis that is tailored to these aims.
We report an ethnographic and field-experiment-based study of time intervals in Amondawa, a Tupi language and culture of Amazonia. We analyse two Amondawa time interval systems based on natural environmental events (seasons and days), as well as the Amondawa system for categorising lifespan time (“age”). Amondawa time intervals are exclusively event-based, as opposed to time-based (i.e. they are based on event-duration, rather than measured abstract time units). Amondawa has no lexicalised abstract concept of time and no practices of time reckoning, as conventionally understood in the anthropological literature. Our findings indicate that not only are time interval systems and categories linguistically and culturally specific, but that they do not depend upon a universal “concept of time”. We conclude that the abstract conceptual domain of time is not a human cognitive universal, but a cultural historical construction, semiotically mediated by symbolic and cultural-cognitive artefacts for time reckoning.
This paper presents an annotation scheme for English modal verbs together with sense-annotated data from the news domain. We describe our annotation scheme and discuss problematic cases for modality annotation based on the inter-annotator agreement during the annotation. Furthermore, we present experiments on automatic sense tagging, showing that our annotations do provide a valuable training resource for NLP systems.
Introduction
(2012)