Refine
Document Type
Language
- English (2)
Has Fulltext
- yes (2)
Is part of the Bibliography
- no (2)
Keywords
- Deutsch (2) (remove)
Publicationstate
Reviewstate
- Peer-Review (2)
We investigate whether non-configurational languages, which display more word order variation than configurational ones, require more training data for a phenomenon to be parsed successfully. We perform a tightly controlled study comparing the dative alternation for English (a configurational language), German, and Russian (both non-configurational). More specifically, we compare the performance of a dependency parser when only canonical word order is present with its performance on data sets when all word orders are present. Our results show that for all languages, canonical data not only is easier to parse, but there exists no direct correspondence between the size of training sets containing free(er) word order variation and performance.
We investigate how the granularity of POS tags influences POS tagging, and furthermore, how POS tagging performance relates to parsing results. For this, we use the standard “pipeline” approach, in which a parser builds its output on previously tagged input. The experiments are performed on two German treebanks, using three POS tagsets of different granularity, and six different POS taggers, together with the Berkeley parser. Our findings show that less granularity of the POS tagset leads to better tagging results. However, both too coarse-grained and too fine-grained distinctions on POS level decrease parsing performance.