TY  - CHAP
U1  - Conference publication
A1  - Rehbein, Ines
A1  - van Genabith, Josef
ED  - De Smedt, Koenraad
ED  - Hajič, Jan
ED  - Kübler, Sandra
T1  - Why is it so difficult to compare treebanks? TIGER and TüBa-D/Z revisited
T2  - The Sixth International Workshop on Treebanks and Linguistic Theories (TLT ’07). Bergen, Norway. December 7–8, 2007
N2  - This paper is a contribution to the ongoing discussion on treebank annotation schemes and their impact on PCFG parsing results. We provide a thorough comparison of two German treebanks: the TIGER treebank and the TüBa-D/Z. We use simple statistics on sentence length and vocabulary size, and more refined methods such as perplexity and its correlation with PCFG parsing results, as well as a Principal Components Analysis. Finally, we present a qualitative evaluation of a set of 100 sentences from the TüBa-D/Z, manually annotated in the TIGER as well as in the TüBa-D/Z annotation scheme, and show that even the existence of a parallel subcorpus does not support a straightforward and easy comparison of both annotation schemes.
T3  - NEALT Proceedings Series - 1
KW  - Corpus
KW  - Syntactic analysis
KW  - Annotation
KW  - treebanks
Y1  - 2007
U6  - https://nbn-resolving.org/urn:nbn:de:bsz:mh39-57822
UN  - https://nbn-resolving.org/urn:nbn:de:bsz:mh39-57822
UR  - http://doras.dcu.ie/15264/
SN  - 1736-6305
SS  - 1736-6305
SP  - 115
EP  - 126
PB  - Northern European Association for Language Technology
CY  - Tartu
ER  - 