
What do we need to know about an unknown word when parsing German

  • We propose a new type of subword embedding designed to provide more information about unknown compounds, a major source of OOV words in German. We present an extrinsic evaluation in which we use the compound embeddings as input to a neural dependency parser and compare the results with those obtained with other types of embeddings. Our evaluation shows that adding compound embeddings yields a significant improvement of 2% LAS over using word embeddings when no POS information is available. When POS embeddings are added to the input, however, the effect levels out. This suggests that the problems in parsing German are caused not by missing information about the semantics of unknown words, but by the lack of morphological information for them. To augment our evaluation, we also test the new embeddings in a language modelling task that requires both syntactic and semantic information.
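
The abstract reports parser results in LAS (labeled attachment score). This is the standard dependency-parsing metric: the percentage of tokens for which the parser predicts both the correct head and the correct dependency label. The Python sketch below shows how the metric is computed. It is not the evaluation script used in the paper. Several details are illustrative assumptions:

- the function names `las` and `uas`;
- the (head, label) tuple representation;
- the Universal Dependencies-style labels in the example, since the paper's treebank may use a different label set.

Evaluation setups also differ in details such as whether punctuation tokens are scored. This sketch scores every token.

```python
from typing import List, Tuple

# One (head, label) pair per token, in sentence order; head 0 marks the root.
# Label strings are illustrative only (UD-style), not the paper's label set.
Arc = Tuple[int, str]


def _check(gold: List[Arc], pred: List[Arc]) -> None:
    # Gold and predicted analyses must be aligned token by token.
    if len(gold) != len(pred):
        raise ValueError("gold and predicted analyses must have the same length")


def las(gold: List[Arc], pred: List[Arc]) -> float:
    """Labeled attachment score in percent: head AND label must both match."""
    _check(gold, pred)
    if not gold:
        return 0.0
    correct = sum(1 for g, p in zip(gold, pred) if g == p)
    return 100.0 * correct / len(gold)


def uas(gold: List[Arc], pred: List[Arc]) -> float:
    """Unlabeled attachment score in percent: only the head must match."""
    _check(gold, pred)
    if not gold:
        return 0.0
    correct = sum(1 for g, p in zip(gold, pred) if g[0] == p[0])
    return 100.0 * correct / len(gold)


if __name__ == "__main__":
    # "Der Hund schläft ." ("The dog sleeps."), tokens numbered from 1.
    gold = [(2, "det"), (3, "nsubj"), (0, "root"), (3, "punct")]
    # The parser attaches "Hund" to the right head but with the wrong label.
    pred = [(2, "det"), (3, "obj"), (0, "root"), (3, "punct")]
    print(las(gold, pred), uas(gold, pred))  # 75.0 100.0
```

In this example the subject gets the correct head but the wrong label. UAS, which checks heads only, therefore stays at 100, while LAS drops to 75. LAS thus penalizes both attachment and labeling errors.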

Metadata
Author: Bich-Ngoc Do, Ines Rehbein, Anette Frank
URN: urn:nbn:de:bsz:mh39-80244
URL: http://www.aclweb.org/anthology/W17-4117
URL: https://aclanthology.info/papers/W17-4100/w17-4100
ISBN: 978-1-945626-91-3
Parent Title (English): Proceedings of the First Workshop on Subword and Character Level Models in NLP (EMNLP 2017). September 7, 2017, Copenhagen, Denmark
Publisher: The Association for Computational Linguistics
Place of publication: Stroudsburg PA, USA
Document Type: Conference Proceeding
Language: English
Year of first Publication: 2017
Date of Publication (online): 2018/10/02
Publication state: Published version
Review state: Peer review
Tag: part-of-speech (POS)
GND Keyword: Automatic speech recognition; German; Compound
First Page: 117
Last Page: 123
DDC classes: 400 Language / 400 Language, Linguistics
Open Access?: Yes
Leibniz-Classification: Language, Linguistics
Linguistics-Classification: Computational linguistics
Program areas: Digital Linguistics
Licence (German): Creative Commons - Attribution 4.0 International