OPUS 4 | 400 Language, Linguistics

400 Language (135)
401 Philosophy of language, theory of language (2)
402 Miscellany
403 Dictionaries, encyclopedias
404 Special topics (1)
405 Serial publications
406 Organizations, management
407 Education, research, related topics (1)
408 Treatment by groups of people
409 Geographic and person-related treatment

7 search hits for the keyword "word embeddings" (all 7 with full text)

A distributional comparison between FOLK and DeReKo (2023)

Kupietz, Marc ; Fankhauser, Peter ; Ruppenhofer, Josef

Semantische Suche mit Word Embeddings für ein mehrsprachiges Wörterbuchportal [Semantic search with word embeddings for a multilingual dictionary portal] (2022)

Tu, Ngoc Duyen Tanja ; Meyer, Peter

The Lehnwortportal Deutsch (LWPD) is an online information system on words borrowed from German into other languages. It is based on a growing number of lexicographical resources on various languages and offers a simple cross-resource search function. The poster presents an onomasiological search function for the LWPD that is currently under development.

Semantic author name disambiguation with word embeddings (2017)

Müller, Mark-Christoph

We present a supervised machine learning system for author name disambiguation (AND) which tackles semantic similarity between publication titles by means of word embeddings. Word embeddings are integrated as external components, which keeps the model small and efficient, while allowing for easy extensibility and domain adaptation. Initial experiments show that word embeddings can improve the recall and F-score of the binary classification sub-task of AND. Results for the clustering sub-task are less clear, but also promising, and overall show the feasibility of the approach.
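The idea of comparing publication titles through word embeddings can be illustrated with a minimal sketch. It is not the system described in the abstract: the toy vectors, the averaging of word vectors, and the function names are all assumptions made for illustration, and a real setup would load pre-trained embeddings and feed the resulting score into a classifier as one feature among others.

```python
import math

# Toy 3-dimensional vectors, invented for illustration only; a real system
# would load pre-trained word embeddings instead.
TOY_EMBEDDINGS = {
    "neural": [0.9, 0.1, 0.0],
    "network": [0.8, 0.2, 0.1],
    "deep": [0.85, 0.15, 0.05],
    "learning": [0.7, 0.3, 0.1],
    "medieval": [0.0, 0.9, 0.3],
    "poetry": [0.1, 0.8, 0.4],
}


def title_vector(title, embeddings):
    """Average the vectors of all title words found in the embedding table."""
    vectors = [embeddings[w] for w in title.lower().split() if w in embeddings]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def title_similarity(title_a, title_b, embeddings=TOY_EMBEDDINGS):
    """Similarity of two titles; 0.0 if either title has no known words."""
    vec_a = title_vector(title_a, embeddings)
    vec_b = title_vector(title_b, embeddings)
    if vec_a is None or vec_b is None:
        return 0.0
    return cosine(vec_a, vec_b)


if __name__ == "__main__":
    print(title_similarity("Deep learning", "Neural network"))
    print(title_similarity("Deep learning", "Medieval poetry"))
```

With these invented vectors, two titles from the same topic area score higher than two titles from unrelated areas, which is the kind of signal a disambiguation classifier could use when two records share an ambiguous author name.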

Off-the-shelf semantic author name disambiguation for bibliographic data bases (2019)

Müller, Mark-Christoph ; Bannister, Adam ; Reitz, Florian

The demo presents a minimalist, off-the-shelf author name disambiguation (AND) tool which provides a fundamental AND operation, the comparison of two publications with ambiguous authors, as an easily accessible HTTP interface. The tool implements this operation using standard AND functionality, but puts particular emphasis on advanced methods from natural language processing (NLP) for comparing publication title semantics.

On the contribution of word-level semantics to practical author name disambiguation (2018)

Müller, Mark-Christoph

We demonstrate the utility of word embedding-based semantic similarity methods for Author Name Disambiguation.

Count-based and predictive language models for exploring DeReKo (2022)

Fankhauser, Peter ; Kupietz, Marc

We present the use of count-based and predictive language models for exploring language use in the German Reference Corpus DeReKo. For collocation analysis along the syntagmatic axis we employ traditional association measures based on co-occurrence counts as well as predictive association measures derived from the output weights of skip-gram word embeddings. For inspecting the semantic neighbourhood of words along the paradigmatic axis we visualize the high-dimensional word embeddings in two dimensions using t-distributed stochastic neighbour embedding (t-SNE). Together, these visualizations provide a complementary, explorative approach to analysing very large corpora in addition to corpus querying. Moreover, we discuss count-based and predictive models with respect to scalability and maintainability in very large corpora.
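As a rough illustration of a count-based association measure, the sketch below computes pointwise mutual information (PMI) from co-occurrence counts in a small window. PMI is chosen here only as a familiar example; the abstract does not say which measures the authors use, and the function name, the window size, and the toy corpus are assumptions.

```python
import math
from collections import Counter


def pmi_scores(sentences, window=2):
    """Return PMI (base 2) for every word pair co-occurring within `window`.

    `sentences` is a list of token lists. Pairs are stored as alphabetically
    sorted tuples, so (a, b) and (b, a) are counted together.
    """
    word_counts = Counter()
    pair_counts = Counter()
    for tokens in sentences:
        for i, word in enumerate(tokens):
            word_counts[word] += 1
            # Pair the word with the next `window` tokens to its right.
            for context in tokens[i + 1 : i + 1 + window]:
                pair_counts[tuple(sorted((word, context)))] += 1

    total_words = sum(word_counts.values())
    total_pairs = sum(pair_counts.values())
    scores = {}
    for (a, b), n in pair_counts.items():
        p_ab = n / total_pairs
        p_a = word_counts[a] / total_words
        p_b = word_counts[b] / total_words
        scores[(a, b)] = math.log2(p_ab / (p_a * p_b))
    return scores


if __name__ == "__main__":
    # Tiny invented corpus, for illustration only.
    corpus = [
        ["strong", "tea"],
        ["strong", "tea"],
        ["strong", "coffee"],
        ["green", "tea"],
        ["powerful", "computer"],
    ]
    for pair, score in sorted(pmi_scores(corpus).items(), key=lambda kv: -kv[1]):
        print(pair, round(score, 3))
```

On such a tiny corpus PMI favours rare pairs that only ever occur together, a known bias of the measure, which is one reason corpus tools usually offer several association measures side by side.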

A word embedding approach to onomasiological search in multilingual loanword lexicography (2021)

Meyer, Peter ; Tu, Ngoc Duyen Tanja

In this paper we present an experimental semantic search function, based on word embeddings, for an integrated online information system on German lexical borrowings into other languages, the Lehnwortportal Deutsch (LWPD). The LWPD synthesizes an increasing number of lexicographical resources and provides basic cross-resource search options. Onomasiological access to the lexical units of the portal is a highly desirable feature for many research questions, such as the likelihood of borrowing lexical units with a given meaning (Haspelmath & Tadmor, 2009; Zeller, 2015). The search technology is based on multilingual pre-trained word embeddings, and individual word senses in the portal are associated with word vectors. Users may select one or more among a very large number of search terms, and the database returns lexical items with word sense vectors similar to these terms. We give a preliminary assessment of the feasibility, usability and efficacy of our approach, in particular in comparison to search options based on semantic domains or fields.
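The search principle described above, returning lexical items whose word sense vectors are similar to the selected search terms, can be sketched as a nearest-neighbour ranking. This is only an illustration under stated assumptions: the two-dimensional vectors, the sense identifiers, the averaging of several search terms, and the `search` function are all hypothetical and not taken from the LWPD.

```python
import math

# Hypothetical vectors for search terms and for word senses in the portal.
# Real multilingual embeddings would have hundreds of dimensions.
TERM_VECTORS = {"drink": [0.9, 0.1], "food": [0.7, 0.3], "tool": [0.1, 0.9]}
SENSE_VECTORS = {
    "sense_beverage": [0.95, 0.05],
    "sense_meal": [0.6, 0.4],
    "sense_hammer": [0.05, 0.95],
}


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(terms, k=2, term_vectors=TERM_VECTORS, sense_vectors=SENSE_VECTORS):
    """Rank word senses by similarity to the mean vector of the search terms."""
    known = [term_vectors[t] for t in terms if t in term_vectors]
    if not known:
        raise ValueError("none of the search terms has a vector")
    dim = len(known[0])
    query = [sum(v[i] for v in known) / len(known) for i in range(dim)]
    ranked = sorted(
        ((sense, cosine(query, vec)) for sense, vec in sense_vectors.items()),
        key=lambda item: item[1],
        reverse=True,
    )
    return ranked[:k]


if __name__ == "__main__":
    for sense, score in search(["drink", "food"]):
        print(sense, round(score, 3))
```

Averaging the term vectors is one simple way to combine several search terms; ranking each sense by its best match to any single term would be an equally plausible design.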
