Discussing best practices for the annotation of Twitter microtext

  • This paper contributes to the discussion on best practices for the syntactic analysis of non-canonical language, focusing on Twitter microtext. We present an annotation experiment in which we test whether an existing POS tagset, the Stuttgart-Tübingen Tagset (STTS), can be applied to new text from social media, in particular from Twitter microblogs. We discuss different tagset extensions proposed in the literature and test our extended tagset on a set of 506 tweets (7,418 tokens), on which two human annotators achieve an inter-annotator agreement in the range of 92.7 to 94.4 (κ). Our error analysis shows that Twitter-specific phenomena such as hashtags and at-mentions in particular cause disagreements between the human annotators. Following up on this, we discuss the different uses of the @- and #-markers in Twitter and argue against analysing either of them at the POS level by means of an at-mention or hashtag label. Instead, we sketch a syntactic analysis that describes these phenomena by means of syntactic categories and grammatical functions.
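The agreement figure above is a kappa score. As a rough illustration of how such a score is computed, here is a minimal sketch assuming Cohen's kappa for two annotators. The abstract does not name the exact kappa variant, and it reports the values scaled to 0 to 100. The function name `cohens_kappa` and the toy tags are hypothetical and do not come from the paper's data. `ADDR` and `HASH` are made-up stand-ins for at-mention and hashtag labels, not the paper's tagset. The toy disagreement on the hashtag token mirrors the kind of disagreement the error analysis reports.

```python
from collections import Counter


def cohens_kappa(tags_a, tags_b):
    """Cohen's kappa for two annotators who tagged the same token sequence."""
    if len(tags_a) != len(tags_b) or not tags_a:
        raise ValueError("need two non-empty tag sequences of equal length")
    n = len(tags_a)
    # Observed agreement: share of tokens that both annotators tagged identically.
    p_o = sum(a == b for a, b in zip(tags_a, tags_b)) / n
    # Expected chance agreement, from each annotator's own tag distribution.
    freq_a, freq_b = Counter(tags_a), Counter(tags_b)
    p_e = sum(freq_a[t] * freq_b[t] for t in freq_a) / (n * n)
    if p_e == 1:
        # Both annotators used one identical tag throughout: perfect agreement.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Hypothetical toy tweet: "@anna ich freu mich #feierabend".
# ADDR and HASH are invented labels, used for illustration only.
tags_a = ["ADDR", "PPER", "VVFIN", "PRF", "HASH"]
tags_b = ["ADDR", "PPER", "VVFIN", "PRF", "NN"]

print(round(cohens_kappa(tags_a, tags_b), 3))  # 0.762
```

Raw agreement in this toy case is 4/5 = 0.8. Kappa corrects that figure for the agreement the two annotators would reach by chance given their tag distributions, (0.8 − 0.16) / (1 − 0.16) ≈ 0.762. This correction is why kappa, rather than plain percentage agreement, is the usual measure for annotation experiments like this one.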

Author: Ines Rehbein, Emiel Visser, Nadine Lestmann
Parent Title (English): Proceedings of The Third Workshop on Annotation of Corpora for Research in the Humanities (ACRH-3). 12 December 2013. Sofia, Bulgaria
Publisher: Bulgarian Academy of Sciences
Place of publication: Sofia
Editor: Francesco Mambrini, Marco Passarotti, Caroline Sporleder
Document Type: Conference Proceeding
Year of first Publication: 2013
Date of Publication (online): 2016/11/21
GND Keyword: Annotation; Syntactic analysis; Twitter (software platform)
First Page: 73
Last Page: 84
DDC classes: 400 Language / 400 Language, Linguistics
Open Access?: yes
License: Protected by copyright