In 1959, Lucien Tesnière's main work, Éléments de syntaxe structurale, was published. While its impact on theoretical linguistics was not very strong at first, 50 years later a variety of linguistic theories build on Tesnière's work. In computational linguistics, as in theoretical linguistics, dependency grammar was not very influential at first. The last 10–15 years, however, have brought a noticeable change, and dependency grammar has found its way into computational linguistics. Syntactically annotated corpora based on dependency representations are available for a variety of languages, as are statistical parsers that analyse running text and describe the underlying dependency relations between its word tokens. This article gives an overview of relevant areas of computational linguistics which have been influenced by dependency grammar. It discusses the pros and cons of different types of syntactic representation used in natural language processing and their suitability as representations of meaning. Finally, an attempt is made to give an outlook on the future impact of dependency grammar on computational linguistics.
Corpora with high-quality linguistic annotations are an essential component in many NLP applications and a valuable resource for linguistic research. Obtaining these annotations requires a large amount of manual effort, making the creation of these resources time-consuming and costly. One attempt to speed up the annotation process is to use supervised machine-learning systems to automatically assign (possibly erroneous) labels to the data and ask human annotators to correct them where necessary. However, it is not clear to what extent these automatic pre-annotations are successful in reducing human annotation effort, and what impact they have on the quality of the resulting resource. In this article, we present the results of an experiment in which we assess the usefulness of partial semi-automatic annotation for frame labeling. We investigate the impact of automatic pre-annotation of differing quality on annotation time, consistency and accuracy. While we found no conclusive evidence that automatic pre-annotation speeds up human annotation, we found that it does increase the overall quality of the annotation.
This paper investigates evidence for linguistic coherence in new urban dialects that evolved in multiethnic and multilingual urban neighbourhoods. We propose a view of coherence as an interpretation of empirical observations rather than something that would be “out there in the data”, and argue that this interpretation should be based on evidence of systematic links between linguistic phenomena, as established by patterns of covariation between phenomena that can be shown to be related at linguistic levels. In a case study, we present results from qualitative and quantitative analyses for a set of phenomena that have been described for Kiezdeutsch, a new dialect from multilingual urban Germany. Qualitative analyses point to linguistic relationships between different phenomena and between pragmatic and linguistic levels. Quantitative analyses, based on corpus data from KiDKo (www.kiezdeutschkorpus.de), point to systematic advantages for the Kiezdeutsch data from a multiethnic and multilingual context provided by the main corpus (KiDKo/Mu), compared to complementary corpus data from a mostly monoethnic and monolingual (German) context (KiDKo/Mo). Taken together, this indicates patterns of covariation that support an interpretation of coherence for this new dialect: our findings point to an interconnected linguistic system, rather than to a mere accumulation of individual features. In addition to this internal coherence, the data also points to external coherence: Kiezdeutsch is not disconnected on the outside either, but fully integrated within the general domain of German, an integration that defies a distinction of “autochthonous” and “allochthonous” German, not only at the level of speakers, but also at the level of linguistic systems.
This article discusses the main linguistic phenomena that cause difficulties in the analysis of user-generated texts found on the web and in social media, and proposes a set of annotation guidelines for their treatment within the Universal Dependencies (UD) framework of syntactic analysis. Given on the one hand the increasing number of treebanks featuring user-generated content, and its somewhat inconsistent treatment in these resources on the other, the aim of this article is twofold: (1) to provide a condensed, though comprehensive, overview of such treebanks, based on available literature, along with their main features and a comparative analysis of their annotation criteria, and (2) to propose a set of tentative UD-based annotation guidelines, to promote consistent treatment of the particular phenomena found in these types of texts. The overarching goal of this article is to provide a common framework for researchers interested in developing similar resources in UD, thus promoting cross-linguistic consistency, a principle that has always been central to the spirit of UD.