Refine
Year of publication
- 2021 (2) (remove)
Document Type
- Part of a Book (2) (remove)
Language
- English (2) (remove)
Has Fulltext
- yes (2)
Is part of the Bibliography
- yes (2)
Keywords
- Computerlinguistik (1)
- Datenbank (1)
- Digital Humanities (1)
- Enzyklopädie (1)
- Forschungsdaten (1)
- Forschungseinrichtung (1)
- IT infrastructure (1)
- Infrastruktur (1)
- Korpus <Linguistik> (1)
- Semasiologie (1)
Publicationstate
Reviewstate
- (Verlags)-Lektorat (1)
- Peer-Review (1)
This paper will address the challenge of creating a knowledge graph from a corpus of historical encyclopedias with a special focus on word sense alignment (WSA) and disambiguation (WSD). More precisely, we examine WSA and WSD approaches based on article similarity to link messy historical data, utilizing Wikipedia as aground-truth component – as the lack of a critical overlap in content paired with the amount of variation between and within the encyclopedias does not allow for choosing a ”baseline” encyclopedia to align the others to. Additionally, we are comparing the disambiguation performance of conservative methods like the Lesk algorithm to more recent approaches, i.e. using language models to disambiguate senses.
Digital research infrastructures can be divided into four categories: large equipment, IT infrastructure, social infrastructure, and information infrastructure. Modern research institutions often employ both IT infrastructure and information infrastructure, such as databases or large-scale research data. In addition, information infrastructure depends to some extent on IT infrastructure. In this paper, we discuss the IT, information, and legal infrastructure issues that research institutions face.