Do Free Word Order Languages Need More Treebank Data? Investigating Dative Alternation in German, English, and Russian
- We investigate whether non-configurational languages, which display more word order variation than configurational ones, require more training data for a phenomenon to be parsed successfully. We perform a tightly controlled study comparing the dative alternation for English (a configurational language), German, and Russian (both non-configurational). More specifically, we compare the performance of a dependency parser when only canonical word order is present with its performance on data sets in which all word orders are present. Our results show that, for all languages, canonical data is not only easier to parse, but there is also no direct correspondence between the size of training sets containing free(er) word order variation and performance.
Author: | Daniel Dakota, Timur Gilmanov, Wen Li, Christopher Kuzma, Evgeny Kim, Noor Abo Mokh, Sandra Kübler |
---|---|
URN: | urn:nbn:de:bsz:mh39-61847 |
URL: | http://www.spmrl.org/accepted2015.html |
Parent Title (English): | Proceedings of the 6th Workshop on Statistical Parsing of Morphologically Rich Languages (SPMRL 2015). July 23rd in Bilbao, Basque Country, Spain |
Editor: | Marie Candito, Jinho Choi, Yannick Versley |
Document Type: | Conference Proceeding |
Language: | English |
Year of first Publication: | 2015 |
Date of Publication (online): | 2017/05/23 |
Publication state: | Published version |
Reviewstate: | Peer-Review |
Tag: | Treebanks |
GND Keyword: | Dative; German; English; Russian; Syntactic analysis; Word order |
First Page: | 14 |
Last Page: | 20 |
DDC classes: | 400 Language / 400 Language, Linguistics |
Open Access?: | yes |
Linguistics-Classification: | Computational linguistics |
Licence (German): | Protected by copyright (Urheberrechtlich geschützt) |