Discovering Subtle Word Relations in Large German Corpora
With an increasing amount of text data available, it is possible to automatically extract a variety of information about language. One way to obtain knowledge about subtle relations and analogies between words is to observe words which are used in the same context. Recently, Mikolov et al. proposed a method to efficiently compute Euclidean word representations which seem to capture subtle relations and analogies between words in the English language. We demonstrate that this method also captures analogies in the German language. Furthermore, we show that we can transfer information extracted from large non-annotated corpora into small annotated corpora, which are then, in turn, used for training NLP systems.
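The analogy property mentioned in the abstract is usually illustrated with vector arithmetic: the offset between two related words is added to a third word, and the nearest remaining word is taken as the answer. The following is a minimal sketch of that idea, not the authors' implementation. The three-dimensional vectors, the vocabulary, and the helper names `cosine` and `analogy` are hypothetical and hand-picked for illustration; real representations are learned from a large corpus and have many more dimensions.

```python
import math

# Hypothetical toy vectors, hand-picked for illustration only.
# Real word representations are learned from a corpus, not written by hand.
VECTORS = {
    "Mann":    [1.0, 0.0, 0.0],
    "Frau":    [0.0, 1.0, 0.0],
    "König":   [1.0, 0.0, 1.0],
    "Königin": [0.0, 1.0, 1.0],
    "Apfel":   [0.2, 0.2, -1.0],
}


def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def analogy(a, b, c, vectors=VECTORS):
    """Return the word d that best completes 'a is to b as c is to d'.

    The target point is vec(b) - vec(a) + vec(c); the three query words
    are excluded from the candidates.
    """
    target = [vb - va + vc
              for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(target, vectors[w]))


print(analogy("Mann", "König", "Frau"))  # prints "Königin"
```

With these toy vectors the target point coincides exactly with the vector for "Königin"; with learned representations the match is only approximate, which is why the nearest neighbour by cosine similarity is used.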
Author: | Sebastian Buschjäger, Lukas Pfahler, Katharina Morik |
---|---|
URN: | urn:nbn:de:bsz:mh39-38317 |
Parent Title (English): | Proceedings of the 3rd Workshop on Challenges in the Management of Large Corpora (CMLC-3), Lancaster, 20 July 2015 |
Publisher: | Institut für Deutsche Sprache |
Place of publication: | Mannheim |
Editor: | Piotr Bański, Hanno Biber, Evelyn Breiteneder, Marc Kupietz, Harald Lüngen, Andreas Witt |
Document Type: | Conference Proceeding |
Language: | English |
Year of first Publication: | 2015 |
Date of Publication (online): | 2015/07/02 |
Publication state: | Published version |
Review state: | Peer review |
Tag: | Corpus annotation; Corpus linguistics; Corpus management; Corpus technology; Large corpora; National corpus |
GND Keyword: | Annotation; Datenbanksystem; Korpus <Linguistik> |
First Page: | 11 |
Last Page: | 14 |
DDC classes: | 400 Language / 410 Linguistics |
Open Access?: | yes |
Linguistics-Classification: | Corpus linguistics |
Conferences, Workshops: | CMLC-3 / 3rd Workshop on Challenges in the Management of Large Corpora |
Licence: | Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Germany (CC BY-NC-ND 3.0 DE) |