Semantic author name disambiguation with word embeddings
- We present a supervised machine learning system for author name disambiguation (AND) that models semantic similarity between publication titles by means of word embeddings. Word embeddings are integrated as external components, which keeps the model small and efficient while allowing for easy extensibility and domain adaptation. Initial experiments show that word embeddings can improve the recall and F-score of the binary classification sub-task of AND. Results for the clustering sub-task are less clear, but they are also promising and, overall, show the feasibility of the approach.
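
As a rough illustration of what "semantic similarity between publication titles by means of word embeddings" can mean in practice, the sketch below averages the word vectors of each title and compares the averages by cosine similarity. This is a minimal sketch under stated assumptions, not the system described in the paper: the averaging scheme, the function names, and the tiny three-dimensional `TOY_EMBEDDINGS` table are all invented for illustration. A real setup would load pretrained embeddings from an external resource.

```python
import math

# Hypothetical toy embeddings, invented for illustration only.
# Real word embeddings have hundreds of dimensions and come from an external resource.
TOY_EMBEDDINGS = {
    "neural": [0.9, 0.1, 0.0],
    "networks": [0.8, 0.2, 0.1],
    "deep": [0.85, 0.15, 0.05],
    "learning": [0.7, 0.3, 0.1],
    "medieval": [0.0, 0.1, 0.9],
    "poetry": [0.1, 0.0, 0.95],
}


def title_vector(title, embeddings):
    """Average the vectors of all in-vocabulary tokens; return None if none are known."""
    vectors = [embeddings[tok] for tok in title.lower().split() if tok in embeddings]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(vec[i] for vec in vectors) / len(vectors) for i in range(dim)]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def title_similarity(title_a, title_b, embeddings=TOY_EMBEDDINGS):
    """Similarity of two titles; 0.0 when either title has no known tokens."""
    vec_a = title_vector(title_a, embeddings)
    vec_b = title_vector(title_b, embeddings)
    if vec_a is None or vec_b is None:
        return 0.0
    return cosine(vec_a, vec_b)


if __name__ == "__main__":
    print(title_similarity("Deep learning", "Neural networks"))
    print(title_similarity("Deep learning", "Medieval poetry"))
```

In a supervised AND setting, a score like this could serve as one feature of a pairwise classifier that decides whether two publications belong to the same author. Because the embedding table is passed in as a parameter, it can be swapped for a domain-specific one without touching the rest of the code.
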
Author: | Mark-Christoph Müller |
---|---|
URN: | urn:nbn:de:bsz:mh39-111355 |
DOI: | https://doi.org/10.1007/978-3-319-67008-9_24 |
ISBN: | 978-3-319-67008-9 |
ISSN: | 1611-3349 |
Parent Title (English): | Research and Advanced Technology for Digital Libraries. 21st International Conference on Theory and Practice of Digital Libraries, TPDL 2017, Thessaloniki, Greece, September 18-21, 2017, Proceedings |
Series (Serial Number): | Lecture Notes in Computer Science (10450) |
Publisher: | Springer |
Place of publication: | Cham |
Editor: | Jaap Kamps, Giannis Tsakonas, Yannis Manolopoulos, Lazaros Iliadis, Ioannis Karydis |
Document Type: | Conference Proceeding |
Language: | English |
Year of first Publication: | 2017 |
Date of Publication (online): | 2022/07/18 |
Publishing Institution: | Leibniz-Institut für Deutsche Sprache (IDS) [secondary publication] |
Publication state: | Secondary publication |
Publication state: | Postprint |
Review state: | Peer review |
Tag: | author name disambiguation; classification; clustering; deep learning; machine learning; semantic similarity; word embeddings |
GND Keyword: | Computerlinguistik; Deep learning; Maschinelles Lernen; Semantik; Veröffentlichung |
First Page: | 300 |
Last Page: | 311 |
DDC classes: | 400 Language / 400 Language, Linguistics |
Open Access?: | yes |
Linguistics-Classification: | Computational linguistics |