Why is it so difficult to compare treebanks? TIGER and TüBa-D/Z revisited

  • This paper contributes to the ongoing discussion of treebank annotation schemes and their impact on PCFG parsing results. We provide a thorough comparison of two German treebanks: the TIGER treebank and the TüBa-D/Z. We use simple statistics on sentence length and vocabulary size, more refined measures such as perplexity and its correlation with PCFG parsing results, and a Principal Components Analysis. Finally, we present a qualitative evaluation of a set of 100 sentences from the TüBa-D/Z, manually annotated in both the TIGER and the TüBa-D/Z annotation schemes, and show that even a parallel subcorpus does not allow a straightforward comparison of the two annotation schemes.

Metadata
Author:Ines Rehbein, Josef van Genabith
URN:urn:nbn:de:bsz:mh39-57822
URL:http://doras.dcu.ie/15264/
ISSN:1736-6305
Parent Title (English):The Sixth International Workshop on Treebanks and Linguistic Theories (TLT ’07). Bergen, Norway. December 7–8, 2007
Series (Serial Number):NEALT Proceedings Series (1)
Publisher:Northern European Association for Language Technology
Place of publication:Tartu
Editor:Koenraad De Smedt, Jan Hajič, Sandra Kübler
Document Type:Conference Proceeding
Language:English
Year of first Publication:2007
Date of Publication (online):2017/01/13
Publication state:Published version
Tag:treebanks
GND Keyword:Annotation; Korpus <Linguistik>; Syntaktische Analyse
First Page:115
Last Page:126
Dewey Decimal Classification:400 Language / 400 Language, Linguistics
BDSL-Classification:Grammar
Linguistics-Classification:Computational linguistics
Linguistics-Classification:Corpus linguistics
Open Access?:Yes
Licence (German):Creative Commons - Attribution-NonCommercial-NoDerivs 3.0 Germany