Treebanking User-Generated Content: A Proposal for a Unified Representation in Universal Dependencies

The paper discusses the main linguistic phenomena of user-generated texts found on the web and in social media, and proposes a set of annotation guidelines for their treatment within the Universal Dependencies (UD) framework. Given the increasing number of treebanks featuring user-generated content on the one hand, and its somewhat inconsistent treatment in these resources on the other, the aim of this paper is twofold: (1) to provide a short, though comprehensive, overview of such treebanks, based on the available literature, along with their main features and a comparative analysis of their annotation criteria, and (2) to propose a set of tentative UD-based annotation guidelines that promote consistent treatment of the particular phenomena found in these types of texts. The main goal is to provide a common framework for teams interested in developing similar resources in UD, thus enabling cross-linguistic consistency, a principle that has always been in the spirit of UD.

Metadata
Author: Manuela Sanguinetti, Cristina Bosco, Lauren Cassidy, Özlem Çetinoğlu, Alessandra Teresa Cignarella, Teresa Lynn, Ines Rehbein, Josef Ruppenhofer, Djamé Seddah, Amir Zeldes
URN: urn:nbn:de:bsz:mh39-98686
URL: http://www.lrec-conf.org/proceedings/lrec2020/index.html#5240
ISBN: 979-10-95546-34-4
Parent Title (English): Proceedings of the 12th International Conference on Language Resources and Evaluation (LREC), May 11-16, 2020, Palais du Pharo, Marseille, France
Publisher: European Language Resources Association
Place of publication: Paris
Editor: Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Document Type: Conference Proceeding
Language: English
Year of first Publication: 2020
Date of Publication (online): 2020/06/01
Publication state: Secondary publication
Review state: Peer review
Tag: UGC; Universal Dependencies; Web; annotation guidelines; treebanks
GND Keyword: Annotation; Natürliche Sprache; Social Media; Strukturbaum; User Generated Content
First Page: 5240
Last Page: 5250
DDC classes: 400 Language / 400 Language, Linguistics
Open Access?: yes
Leibniz-Classification: Language, Linguistics
Linguistics-Classification: Computational linguistics
Linguistics-Classification: Corpus linguistics
Licence (English): Creative Commons - Attribution-NonCommercial 4.0 International