Treebanking User-Generated Content: A Proposal for a Unified Representation in Universal Dependencies
- The paper presents a discussion on the main linguistic phenomena of user-generated texts found in web and social media, and proposes a set of annotation guidelines for their treatment within the Universal Dependencies (UD) framework. Given on the one hand the increasing number of treebanks featuring user-generated content, and its somewhat inconsistent treatment in these resources on the other, the aim of this paper is twofold: (1) to provide a short, though comprehensive, overview of such treebanks - based on available literature - along with their main features and a comparative analysis of their annotation criteria, and (2) to propose a set of tentative UD-based annotation guidelines, to promote consistent treatment of the particular phenomena found in these types of texts. The main goal of this paper is to provide a common framework for those teams interested in developing similar resources in UD, thus enabling cross-linguistic consistency, which is a principle that has always been in the spirit of UD.
Author: | Manuela Sanguinetti, Cristina Bosco, Lauren Cassidy, Özlem Çetinoğlu, Alessandra Teresa Cignarella, Teresa Lynn, Ines Rehbein, Josef Ruppenhofer, Djamé Seddah, Amir Zeldes |
---|---|
URN: | urn:nbn:de:bsz:mh39-98686 |
URL: | http://www.lrec-conf.org/proceedings/lrec2020/index.html#5240 |
ISBN: | 979-10-95546-34-4 |
Parent Title (English): | Proceedings of the 12th International Conference on Language Resources and Evaluation (LREC), May 11-16, 2020, Palais du Pharo, Marseille, France |
Publisher: | European Language Resources Association |
Place of publication: | Paris |
Editor: | Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis |
Document Type: | Conference Proceeding |
Language: | English |
Year of first Publication: | 2020 |
Date of Publication (online): | 2020/06/01 |
Publication state: | Secondary publication |
Review state: | Peer review |
Tag: | UGC; Universal Dependencies; Web; annotation guidelines; treebanks |
GND Keyword: | Annotation; Natürliche Sprache; Social Media; Strukturbaum; User Generated Content |
First Page: | 5240 |
Last Page: | 5250 |
DDC classes: | 400 Language / 400 Language, Linguistics |
Open Access?: | yes |
Leibniz-Classification: | Language, Linguistics |
Linguistics-Classification: | Computational linguistics |
Linguistics-Classification: | Corpus linguistics |
Program areas: | P2: Oral Corpora |