Data sets for author name disambiguation: an empirical analysis and a new resource

  • Data sets of publication metadata with manually disambiguated author names play an important role in current author name disambiguation (AND) research. We review the most important data sets used so far, and compare their respective advantages and shortcomings. From the results of this review, we derive a set of general requirements for future AND data sets. These include both trivial requirements, like absence of errors and preservation of author order, and more substantial ones, like full disambiguation and adequate representation of publications with a small number of authors and highly variable author names. On the basis of these requirements, we create and make publicly available a new AND data set, SCAD-zbMATH. Both the quantitative analysis of this data set and the results of our initial AND experiments with a naive baseline algorithm show the SCAD-zbMATH data set to be considerably different from existing ones. We consider it a useful new resource that will challenge the state of the art in AND and benefit the AND research community.

Metadata
Author:Mark-Christoph Müller, Florian Reitz, Nicolas Roy
URN:urn:nbn:de:bsz:mh39-110871
DOI:https://doi.org/10.1007/s11192-017-2363-5
ISSN:1588-2861
Parent Title (English):Scientometrics
Publisher:Springer Nature
Place of publication:Berlin
Document Type:Article
Language:English
Year of first Publication:2017
Date of Publication (online):2022/06/14
Publishing Institution:Leibniz-Institut für Deutsche Sprache (IDS)
Publication state:Published version
Review state:Peer review
Tag:SCAD-zbMATH; author name disambiguation; author name homography; author name variability; data sets; digital libraries
GND Keyword:Author; Data set; Digital library; Empirical research; Homography; Metadata; Quantitative analysis; Publication
Volume:111
Issue:3
First Page:1467
Last Page:1500
DDC classes:400 Language / 400 Language, Linguistics
DDC classes:000 General works, Computer science, Information science / 020 Library and information science
Open Access?:yes
Licence (English):Creative Commons - Attribution 4.0 International