Refine
Document Type
- Article (4) (remove)
Language
- English (4) (remove)
Has Fulltext
- yes (4)
Keywords
- Annotation (2)
- Jugendsprache (2)
- Korpus <Linguistik> (2)
- Angewandte Linguistik (1)
- Annotation guidelines (1)
- Automatische Sprachverarbeitung (1)
- Covariation (1)
- Datenbanksystem (1)
- Directive particles (1)
- Frame semantics (1)
Publicationstate
Reviewstate
- Peer-Review (2)
- (Verlags)-Lektorat (1)
This article presents a discussion on the main linguistic phenomena which cause difficulties in the analysis of user-generated texts found on the web and in social media, and proposes a set of annotation guidelines for their treatment within the Universal Dependencies (UD) framework of syntactic analysis. Given on the one hand the increasing number of treebanks featuring user-generated content, and its somewhat inconsistent treatment in these resources on the other, the aim of this article is twofold: (1) to provide a condensed, though comprehensive, overview of such treebanks—based on available literature—along with their main features and a comparative analysis of their annotation criteria, and (2) to propose a set of tentative UD-based annotation guidelines, to promote consistent treatment of the particular phenomena found in these types of texts. The overarching goal of this article is to provide a common framework for researchers interested in developing similar resources in UD, thus promoting cross-linguistic consistency, which is a principle that has always been central to the spirit of UD.
This paper investigates evidence for linguistic coherence in new urban dialects that evolved in multiethnic and multilingual urban neighbourhoods. We propose a view of coherence as an interpretation of empirical observations rather than something that would be ‘‘out there in the data’’, and argue that this interpretation should be based on evidence of systematic links between linguistic phenomena, as established by patterns of covariation between phenomena that can be shown to be related at linguistic levels. In a case study, we present results from qualitative and quantitative analyses for a set of phenomena that have been described for Kiezdeutsch, a new dialect from multilingual urban Germany. Qualitative analyses point to linguistic relationships between different phenomena and between pragmatic and linguistic levels. Quantitative analyses, based on corpus data from KiDKo (www.kiezdeutschkorpus.de), point to systematic advantages for the Kiezdeutsch data from a multiethnic and multilingual context provided by the main corpus (KiDKo/Mu), compared to complementary corpus data from a mostly monoethnic and monolingual (German) context (KiDKo/Mo). Taken together, this indicates patterns of covariation that support an interpretation of coherence for this new dialect: our findings point to an interconnected linguistic system, rather than to a mere accumulation of individual features. In addition to this internal coherence, the data also points to external coherence: Kiezdeutsch is not disconnected on the outside either, but fully integrated within the general domain of German, an integration that defies a distinction of ‘‘autochthonous’’ and ‘‘allochthonous’’ German, not only at the level of speakers, but also at the level of linguistic systems.
Corpora with high-quality linguistic annotations are an essential component in many NLP applications and a valuable resource for linguistic research. For obtaining these annotations, a large amount of manual effort is needed, making the creation of these resources time-consuming and costly. One attempt to speed up the annotation process is to use supervised machine-learning systems to automatically assign (possibly erroneous) labels to the data and ask human annotators to correct them where necessary. However, it is not clear to what extent these automatic pre-annotations are successful in reducing human annotation effort, and what impact they have on the quality of the resulting resource. In this article, we present the results of an experiment in which we assess the usefulness of partial semi-automatic annotation for frame labeling. We investigate the impact of automatic pre-annotation of differing quality on annotation time, consistency and accuracy. While we found no conclusive evidence that it can speed up human annotation, we found that automatic pre-annotation does increase its overall quality.