A word embedding approach to onomasiological search in multilingual loanword lexicography
In this paper we present an experimental semantic search function, based on word embeddings, for an integrated online information system on German lexical borrowings into other languages, the Lehnwortportal Deutsch (LWPD). The LWPD synthesizes an increasing number of lexicographical resources and provides basic cross-resource search options. Onomasiological access to the lexical units of the portal is a highly desirable feature for many research questions, such as the likelihood of borrowing lexical units with a given meaning (Haspelmath & Tadmor, 2009; Zeller, 2015). The search technology is based on multilingual pre-trained word embeddings, and individual word senses in the portal are associated with word vectors. Users may select one or more among a very large number of search terms, and the database returns lexical items with word sense vectors similar to these terms. We give a preliminary assessment of the feasibility, usability and efficacy of our approach, in particular in comparison to search options based on semantic domains or fields.
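
The retrieval step described in the abstract (search terms mapped to vectors, word senses ranked by vector similarity) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the three-dimensional toy vectors, the identifiers such as `loanword_A/sense_1`, the use of cosine similarity as the similarity measure, and the averaging of several search terms into one query vector. The abstract does not specify these details, and this is not the LWPD implementation.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def mean_vector(vectors):
    """Component-wise mean; an assumed way of combining several search terms."""
    n = len(vectors)
    return [sum(component) / n for component in zip(*vectors)]


def semantic_search(query_terms, term_vectors, sense_vectors, top_k=3):
    """Rank word senses by similarity of their vectors to the query vector."""
    query = mean_vector([term_vectors[t] for t in query_terms])
    scored = [(sense_id, cosine(query, vec)) for sense_id, vec in sense_vectors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical toy data; real pre-trained embeddings have hundreds of dimensions.
TERM_VECTORS = {
    "drink": [0.9, 0.1, 0.0],
    "tool": [0.0, 0.2, 0.9],
}
SENSE_VECTORS = {
    "loanword_A/sense_1": [0.8, 0.2, 0.1],
    "loanword_B/sense_1": [0.1, 0.1, 0.9],
    "loanword_C/sense_1": [0.5, 0.5, 0.5],
}

results = semantic_search(["drink"], TERM_VECTORS, SENSE_VECTORS, top_k=2)
for sense_id, score in results:
    print(f"{sense_id}\t{score:.3f}")
```

With these toy vectors, the sense closest to the query term is ranked first. In a multilingual setting the term vectors and sense vectors would have to live in a shared embedding space for such a comparison to be meaningful.
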
Author: | Peter Meyer, Ngoc Duyen Tanja Tu |
---|---|
URN: | urn:nbn:de:bsz:mh39-106840 |
URL: | https://elex.link/elex2021/wp-content/uploads/eLex_2021-proceedings_compressed.pdf |
ISSN: | 2533-5626 |
Parent Title (English): | Electronic lexicography in the 21st century: post-editing lexicography. Proceedings of the eLex 2021 conference. 5–7 July 2021, virtual. |
Publisher: | Lexical Computing CZ s.r.o. |
Place of publication: | Brno |
Editor: | Iztok Kosem, Michal Cukr, Miloš Jakubíček, Jelena Kallas, Simon Krek, Carole Tiberius |
Document Type: | Conference Proceeding |
Language: | English |
Year of first Publication: | 2021 |
Date of Publication (online): | 2021/09/23 |
Publishing Institution: | Leibniz-Institut für Deutsche Sprache (IDS) |
Publication state: | Published version |
Review state: | Peer-reviewed |
Tag: | Lehnwortportal Deutsch (LWPD) lexical borrowings; multilingual lexicography; onomasiological search; word embeddings |
GND Keyword: | Computerunterstützte Lexikografie; Datenbank; Lehnwort; Lexikografie; Mehrsprachigkeit; Onomasiologie; Semantik |
First Page: | 78 |
Last Page: | 91 |
DDC classes: | 400 Language / 400 Language, Linguistics |
Open Access?: | yes |
BDSL-Classification: | Lexicography, dictionaries |
Leibniz-Classification: | Language, linguistics |
Linguistics-Classification: | Lexicography |
Program areas: | L1: Lexikographie und Sprachdokumentation |
Program areas: | L3: Lexik empirisch und digital |
Licence (English): | Creative Commons - Attribution-ShareAlike 4.0 International |