Evaluating Evaluation Measures

This paper presents a thorough examination of the validity of three evaluation measures applied to parser output. We assess the performance of an unlexicalised probabilistic parser trained on two German treebanks with different annotation schemes and evaluate the parsing results using the PARSEVAL metric, the Leaf-Ancestor metric and a dependency-based evaluation. We reject the claim that the TüBa-D/Z annotation scheme is more adequate than the TIGER scheme for PCFG parsing and show that PARSEVAL should not be used to compare the performance of parsers trained on treebanks with different annotation schemes. An analysis of specific error types indicates that, of the three measures, the dependency-based evaluation most appropriately reflects parse quality.
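For readers unfamiliar with PARSEVAL, the following is a minimal sketch of labelled bracketing precision, recall and F1 for a single sentence. It is not taken from the paper: the function name `parseval`, the `(label, start, end)` span representation and the toy trees are assumptions made for illustration only, and standard scoring tools apply further conventions (for example, handling of punctuation) that are omitted here.

```python
from collections import Counter


def parseval(gold, predicted):
    """Labelled bracketing precision, recall and F1 for one sentence.

    Each tree is a list of (label, start, end) constituent spans.
    This representation is an illustrative assumption.
    """
    gold_counts = Counter(gold)
    pred_counts = Counter(predicted)
    # Multiset intersection: a bracket matches only if label and span agree.
    matched = sum((gold_counts & pred_counts).values())
    precision = matched / len(predicted) if predicted else 0.0
    recall = matched / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if matched else 0.0
    return precision, recall, f1


# Hypothetical five-token sentence; the parser mislabels one constituent.
gold = [("S", 0, 5), ("NP", 0, 2), ("VP", 2, 5), ("NP", 3, 5)]
predicted = [("S", 0, 5), ("NP", 0, 2), ("VP", 2, 5), ("PP", 3, 5)]

precision, recall, f1 = parseval(gold, predicted)
print(f"P={precision:.2f} R={recall:.2f} F1={f1:.2f}")
```

Because the score is a ratio over brackets, its value depends on how many brackets the annotation scheme produces for a sentence, which is one reason to be cautious when comparing scores across treebanks.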

Metadata
Author:	Ines Rehbein, Josef van Genabith
URN:	urn:nbn:de:bsz:mh39-57543
ISBN:	978-9985-4-0513-0
Parent Title (English):	Proceedings of the 16th Nordic Conference of Computational Linguistics (NODALIDA-2007). University of Tartu, Tartu. May 24-26, 2007
Publisher:	University of Tartu
Place of publication:	Tartu
Editor:	Joakim Nivre, Heiki-Jaan Kaalep, Kadri Muischnek, Mare Koit
Document Type:	Conference Proceeding
Language:	English
Year of first Publication:	2007
Date of Publication (online):	2017/01/09
Publication state:	Published version
Review state:	(Publisher's) editorial review
GND Keyword:	German; Corpus <linguistics>; Syntactic analysis
First Page:	372
Last Page:	379
DDC classes:	400 Language / 400 Language, Linguistics
Open Access?:	yes
Linguistics-Classification:	Computational linguistics
Linguistics-Classification:	Corpus linguistics
Licence (German):	Protected by copyright
