In 2010, ISO published a standard for syntactic annotation, ISO 24615:2010 (SynAF). At the time, the standard specified a comprehensive reference model for the representation of syntactic annotations, but no accompanying XML serialisation. ISO's subcommittee on language resource management (ISO TC 37/SC 4) is now working to make the SynAF serialisation ISOTiger an additional part of the standard. This contribution describes the current state of development of ISOTiger and raises a number of open issues on which we seek community feedback, so that ISOTiger becomes a useful extension to the SynAF reference model.
This article discusses the main linguistic phenomena that cause difficulties in the analysis of user-generated texts found on the web and in social media, and proposes a set of annotation guidelines for their treatment within the Universal Dependencies (UD) framework of syntactic analysis. Given the increasing number of treebanks featuring user-generated content on the one hand, and its somewhat inconsistent treatment in these resources on the other, the aim of this article is twofold: (1) to provide a condensed, though comprehensive, overview of such treebanks, based on the available literature, along with their main features and a comparative analysis of their annotation criteria, and (2) to propose a set of tentative UD-based annotation guidelines that promote consistent treatment of the particular phenomena found in these types of texts. The overarching goal is to provide a common framework for researchers interested in developing similar resources in UD, thus promoting cross-linguistic consistency, a principle that has always been central to the spirit of UD.
This paper presents the application of the <tiger2/> format to various linguistic scenarios, with the aim of making it the standard serialisation for the ISO 24615 [1] (SynAF) standard. After outlining the main characteristics of both the SynAF metamodel and the <tiger2/> format, as extended from the initial Tiger XML format [2], we show, using examples from a range of language families, how <tiger2/> covers a variety of constituency-based and dependency-based analyses.