Word sense alignment and disambiguation for historical encyclopedias

  • This paper addresses the challenge of creating a knowledge graph from a corpus of historical encyclopedias, with a special focus on word sense alignment (WSA) and word sense disambiguation (WSD). More precisely, we examine WSA and WSD approaches based on article similarity to link messy historical data, using Wikipedia as a ground-truth component. Wikipedia takes this role because the lack of a critical overlap in content, paired with the amount of variation between and within the encyclopedias, does not allow for choosing a "baseline" encyclopedia to align the others to. Additionally, we compare the disambiguation performance of conservative methods such as the Lesk algorithm with more recent approaches, i.e. using language models to disambiguate senses.
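
As an illustration of the kind of conservative baseline the abstract mentions, the following is a minimal sketch of a simplified Lesk-style disambiguation: it picks the candidate sense whose gloss shares the most words with an article's text. It is not the authors' implementation. The function names (`tokenize`, `simplified_lesk`), the stopword list, and the toy glosses and article text are all assumptions made up for this example.

```python
import re

# Small illustrative stopword list (an assumption, not taken from the paper).
STOPWORDS = {"a", "an", "the", "of", "in", "on", "at", "to", "and",
             "or", "is", "for", "that", "by", "with"}


def tokenize(text):
    """Lowercase the text, keep alphabetic tokens, and drop stopwords."""
    return {t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS}


def simplified_lesk(context, candidates):
    """Score each candidate gloss by its word overlap with the context.

    Returns the id of the best-scoring candidate and the full score table.
    """
    context_tokens = tokenize(context)
    scores = {
        sense_id: len(context_tokens & tokenize(gloss))
        for sense_id, gloss in candidates.items()
    }
    return max(scores, key=scores.get), scores


# Hypothetical candidate senses, standing in for reference articles.
candidates = {
    "bank_finance": "An institution that accepts deposits of money and grants loans.",
    "bank_river": "The sloping land at the edge of a river or lake.",
}

# Hypothetical encyclopedia article text for the headword "Bank".
article = "Bank: a raised edge of land along a river, often sloping toward the water."

best, scores = simplified_lesk(article, candidates)
print(best, scores)
```

In this toy setup the article is linked to the river sense, because its text overlaps with that gloss and not with the financial one. A language-model-based approach would replace the word-overlap score with a similarity computed on learned representations of the article and the candidate texts.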

Metadata
Author: Thora Hagen, Fotis Jannidis, Andreas Witt
URN: urn:nbn:de:bsz:mh39-109834
URL: https://graphentechnologien.hypotheses.org/files/2022/01/Word_Sense_Alignment_and_Disambiguation_for_Historical_etc-Hagen_Jannidis_Witt.pdf
Parent Title (English): Graphs and Networks in the Humanities 2022. Technologies, Models, Analyses, and Visualizations. 6th International Conference, 3. – 4. February 2022, Online.
Publisher: Graphen & Netzwerke; AG des Verbandes Digital Humanities im deutschsprachigen Raum e.V.
Place of publication: Gießen
Editor: Tara Andrews, Franziska Diehr, Thomas Efer, Andreas Kuczera, Joris van Zundert
Document Type: Part of a Book
Language: English
Year of first Publication: 2021
Date of Publication (online): 2022/03/24
Publishing Institution: Leibniz-Institut für Deutsche Sprache (IDS)
Publication state: Published version
Review state: Peer-reviewed
Tag: disambiguation; historical encyclopedias; word sense alignment
GND Keyword: Computerlinguistik; Enzyklopädie; Korpus <Linguistik>; Semasiologie; Wikipedia; Wissensgraph
Page Number: 7
DDC classes: 400 Language / 400 Language, Linguistics
Open Access?: yes
Leibniz-Classification: Language, Linguistics
Linguistics-Classification: Computational linguistics
Program areas: S2: Research Coordination and Infrastructures
Licence (English): Creative Commons - Attribution 4.0 International