Analyzing domain specific word embeddings for a large corpus of contemporary German. International Corpus Linguistics Conference, Cardiff, Wales, UK, July 22-26, 2019
- Distributional models of word use are an indispensable tool in corpus-based lexicological research for discovering paradigmatic relations and syntagmatic patterns (Belica et al. 2010). Recently, word embeddings (Mikolov et al. 2013) have revived the field by making it possible to construct and analyze distributional models on very large corpora. They achieve this by reducing the very high dimensionality of word co-occurrence contexts, which equals the size of the vocabulary, to a small number of dimensions, such as 100 to 200. However, word use and meaning can vary widely along dimensions such as domain, register, and time, and word embeddings tend to represent only the most prevalent meaning. In this paper we therefore construct domain-specific word embeddings that allow for a systematic analysis of variation in word use. Moreover, we demonstrate how to reconstruct domain-specific co-occurrence contexts from the dense word embeddings.
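The abstract does not specify how co-occurrence contexts are recovered from dense embeddings. The following is a minimal sketch of one common approach, not the authors' method. It assumes skip-gram training with negative sampling (SGNS), which learns separate word (input) and context (output) vectors. In that case, σ(w · c) is the model's estimate that (w, c) is an observed co-occurrence rather than a negative sample. Ranking all context vectors by this score yields a reconstructed context profile for a word. The function `reconstruct_contexts` and all vectors are hypothetical toy values. Comparing profiles from separately trained, domain-specific models can reveal different senses, for example German "Bank" (bank vs. bench).

```python
import math

def sigmoid(x: float) -> float:
    """Logistic function, as used in SGNS to score (word, context) pairs."""
    return 1.0 / (1.0 + math.exp(-x))

def dot(u, v) -> float:
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def reconstruct_contexts(target, word_vecs, context_vecs, k=3):
    """Return the k context words with the highest sigma(w . c) for `target`.

    Assumption (hypothetical helper): the model was trained with SGNS and its
    context (output) vectors were kept, so sigma(w . c) estimates how likely
    (target, c) is an observed co-occurrence.
    """
    w = word_vecs[target]
    scored = [(c, sigmoid(dot(w, vec))) for c, vec in context_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Toy 2-dimensional context vectors (hypothetical). To keep the example small,
# both domains share them; real domain-specific models would each have their own.
ctx = {
    "Geld": [2.0, 0.0],
    "Konto": [1.5, 0.5],
    "Park": [0.0, 1.5],
    "sitzen": [-0.2, 1.8],
}

# Hypothetical vectors for "Bank" in two separately trained domain models.
economy = {"Bank": [1.0, 0.2]}
leisure = {"Bank": [0.1, 1.0]}

if __name__ == "__main__":
    print("economy:", [c for c, _ in reconstruct_contexts("Bank", economy, ctx, k=2)])
    print("leisure:", [c for c, _ in reconstruct_contexts("Bank", leisure, ctx, k=2)])
```

With these toy vectors, the economy model ranks "Geld" and "Konto" highest, while the leisure model ranks "sitzen" and "Park" highest. This illustrates how domain-specific models can separate senses that a single general model might merge.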
Author: | Peter Fankhauser, Marc Kupietz |
---|---|
URN: | urn:nbn:de:bsz:mh39-91174 |
DOI: | https://doi.org/10.14618/ids-pub-9117 |
Publisher: | Leibniz-Institut für Deutsche Sprache (IDS) |
Place of publication: | Mannheim |
Document Type: | Conference Proceeding |
Language: | English |
Year of first Publication: | 2019 |
Date of Publication (online): | 2019/08/06 |
Publication state: | Postprint |
Review state: | Peer-reviewed |
GND Keyword: | Automatic language analysis; German; Corpus <Linguistics>; Phrase <Syntagma> |
Page Number: | 6 |
Note: | Conference paper: International Corpus Linguistics Conference, Cardiff, Wales, UK, July 22-26, 2019 |
DDC classes: | 400 Language / 400 Language, Linguistics |
Open Access?: | yes |
Leibniz-Classification: | Language, Linguistics |
Linguistics-Classification: | Corpus linguistics |
Program areas: | Digital linguistics |
Licence: | Creative Commons - CC BY - Attribution 4.0 International |