Brown clustering for unlexicalized parsing
- Brown clustering has been used to help increase parsing performance for morphologically rich languages. However, much of this work has focused on using clustering techniques to replace terminal nodes or as a feature for parsing. Instead, we examine how effective Brown clustering is for unlexicalized parsing by creating data-driven POS tagsets, which are then used with the Berkeley parser. We investigate different cluster sizes, as well as which information to cluster on (e.g. words vs. lemmas), to determine which setting yields the best parser performance. Our results approach the current state-of-the-art results for the German TüBa-D/Z treebank when using parser-internal tagging.
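The idea of a data-driven tagset can be illustrated with a toy sketch. It assumes the standard Brown clustering objective: start with one cluster per word type and greedily merge the pair of clusters that keeps the average mutual information of adjacent class bigrams highest, stopping at `k` clusters. The function names `brown_clusters` and `cluster_tags` and the `C<id>` tag format are hypothetical illustrations, not the paper's implementation; the record does not state how cluster IDs are mapped onto treebank tags for the Berkeley parser. The naive search is only usable on tiny inputs.

```python
import math
from collections import Counter
from itertools import combinations


def class_bigram_mi(tokens, assign):
    """Average mutual information of adjacent class bigrams under `assign`."""
    bigrams = Counter((assign[a], assign[b]) for a, b in zip(tokens, tokens[1:]))
    total = sum(bigrams.values())
    left, right = Counter(), Counter()
    for (c1, c2), n in bigrams.items():
        left[c1] += n
        right[c2] += n
    mi = 0.0
    for (c1, c2), n in bigrams.items():
        p = n / total
        mi += p * math.log(p / ((left[c1] / total) * (right[c2] / total)))
    return mi


def brown_clusters(tokens, k):
    """Greedy bottom-up merging until k clusters remain (naive, toy-sized only)."""
    # Each word type starts in its own cluster.
    assign = {w: i for i, w in enumerate(sorted(set(tokens)))}
    while len(set(assign.values())) > k:
        best = None
        for c1, c2 in combinations(sorted(set(assign.values())), 2):
            # Tentatively merge cluster c2 into c1 and score the result.
            trial = {w: (c1 if c == c2 else c) for w, c in assign.items()}
            score = class_bigram_mi(tokens, trial)
            if best is None or score > best[0]:
                best = (score, trial)
        assign = best[1]
    # Renumber the surviving clusters as 0..k-1.
    ids = {c: i for i, c in enumerate(sorted(set(assign.values())))}
    return {w: ids[c] for w, c in assign.items()}


def cluster_tags(sentence, clusters, prefix="C"):
    """Hypothetical tag format: replace each word's POS tag by its cluster ID."""
    return [f"{prefix}{clusters[w]}" for w in sentence]


if __name__ == "__main__":
    corpus = "der hund bellt . die katze schläft . der mann liest .".split()
    clusters = brown_clusters(corpus, 4)
    print(cluster_tags(["der", "hund", "bellt", "."], clusters))
```

Varying `k` corresponds to the cluster-size dimension mentioned above; clustering on lemmas instead of words would simply mean passing a lemmatized token sequence as `tokens`.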
Author: | Daniel Dakota |
---|---|
URN: | urn:nbn:de:bsz:mh39-61818 |
URL: | https://www.linguistics.rub.de/bla/ |
ISSN: | 2190-0949 |
Parent Title (English): | Proceedings of the 13th Conference on Natural Language Processing (KONVENS) Bochum, Germany September 19–21, 2016 |
Series (Serial Number): | Bochumer Linguistische Arbeitsberichte (16) |
Publisher: | Ruhr-Universität Bochum |
Place of publication: | Bochum |
Translator: | Stefanie Dipper, Friedrich Neubarth, Heike Zinsmeister |
Document Type: | Part of a Book |
Language: | English |
Year of first Publication: | 2016 |
Date of Publication (online): | 2017/05/23 |
Publication state: | Published version |
Review state: | Peer-Review |
Tag: | Brown clustering |
GND Keyword: | Automatic language analysis; Cluster (data analysis); German |
First Page: | 68 |
Last Page: | 77 |
DDC classes: | 400 Language / 400 Language, Linguistics |
Open Access?: | yes |
Linguistics-Classification: | Computational linguistics |
Linguistics-Classification: | Corpus linguistics |
Licence (German): | Protected by copyright |