Transparent, efficient, and robust word embedding access with WOMBAT

We present WOMBAT, a Python tool which supports NLP practitioners in accessing word embeddings from code. WOMBAT addresses common research problems, including unified access, scaling, and robust and reproducible preprocessing. Code that uses WOMBAT for accessing word embeddings is not only cleaner, more readable, and easier to reuse, but also much more efficient than code using standard in-memory methods: a Python script using WOMBAT for evaluating seven large word embedding collections (8.7M embedding vectors in total) on a simple SemEval sentence similarity task involving 250 raw sentence pairs completes in under ten seconds end-to-end on a standard notebook computer.

Author:Mark-Christoph Müller, Michael Strube
Parent Title (English):Proceedings of the 27th International Conference on Computational Linguistics: System Demonstrations. August 20-26, 2018, Santa Fe, New Mexico, USA
Publisher:Association for Computational Linguistics
Place of publication:Stroudsburg, Pennsylvania
Editor:Dongyan Zhao
Document Type:Conference Proceeding
Year of first Publication:2018
Date of Publication (online):2022/06/14
Publishing Institution:Leibniz-Institut für Deutsche Sprache (IDS)
Tag:WOrd eMBedding dATabase (WOMBAT); word embedding
GND Keyword:Automatic language analysis; Code; Computational linguistics; Python <programming language>
First Page:53
Last Page:57
DDC classes:400 Language / 400 Language, Linguistics
Open Access?:yes
Licence (English):Creative Commons - Attribution 4.0 International