OPUS 4 | Search

5 search hits

1 to 5

Sort by

Year
Year
Title
Title
Author
Author

Projektvorstellung – Redewiedergabe. Eine literatur- und sprachwissenschaftliche Korpusanalyse (2018)

Brunner, Annelen ; Engelberg, Stefan ; Jannidis, Fotis ; Tu, Ngoc Duyen Tanja ; Weimer, Lukas

Das laufende DFG-Projekt „Redewiedergabe“ stellt einen Anwendungsfall quantitativer Sprach-und Literaturwissenschaft dar und beschäftigt sich mit dem Phänomen „Redewiedergabe“ auf der Grundlage großer Datenmengen. Zu diesem Zweck wird zum einen ein Korpus manuell mit Redewiedergabeformen annotiert, zum anderen werden Verfahren zur automatischen Erkennung des Phänomens entwickelt. Ziel ist es, Forschungsfragen nach der Entwicklung von Redewiedergabe vor allem im 19. Jahrhundert zu beantworten.

Deep learning for free indirect representation (2019)

Brunner, Annelen ; Tu, Ngoc Duyen Tanja ; Weimer, Lukas ; Jannidis, Fotis

In this paper, we present our work-inprogress to automatically identify free indirect representation (FI), a type of thought representation used in literary texts. With a deep learning approach using contextual string embeddings, we achieve f1 scores between 0.45 and 0.5 (sentence-based evaluation for the FI category) on two very different German corpora, a clear improvement on earlier attempts for this task. We show how consistently marked direct speech can help in this task. In our evaluation, we also consider human inter-annotator scores and thus address measures of certainty for this difficult phenomenon.

Annotationsrichtlinien des Projekts "Redewiedergabe. Eine literatur- und sprachwissenschaftliche Korpusanalyse" (2020)

Brunner, Annelen ; Weimer, Lukas ; Engelberg, Stefan ; Jannidis, Fotis ; Tu, Ngoc Duyen Tanja

Word sense alignment and disambiguation for historical encyclopedias (2021)

Hagen, Thora ; Jannidis, Fotis ; Witt, Andreas

This paper will address the challenge of creating a knowledge graph from a corpus of historical encyclopedias with a special focus on word sense alignment (WSA) and disambiguation (WSD). More precisely, we examine WSA and WSD approaches based on article similarity to link messy historical data, utilizing Wikipedia as aground-truth component – as the lack of a critical overlap in content paired with the amount of variation between and within the encyclopedias does not allow for choosing a ”baseline” encyclopedia to align the others to. Additionally, we are comparing the disambiguation performance of conservative methods like the Lesk algorithm to more recent approaches, i.e. using language models to disambiguate senses.

Twenty-two historical encyclopedias encoded in TEI: a new resource for the Digital Humanities (2020)

Hagen, Thora ; Ketzan, Erik ; Jannidis, Fotis ; Witt, Andreas

This paper accompanies the corpus publication of EncycNet, a novel XML/TEI annotated corpus of 22 historical German encyclopedias from the early 18th to early 20th century. We describe the creation and annotation of the corpus, including the rationale for its development, suggested methodology for TEI annotation, possible use cases and future work. While many well-developed annotation standards for lexical resources exist, none can adequately model the encyclopedias at hand, and we therefore suggest how the TEI Lex-0 standard may be modified with additional guidelines for the annotation of historical encyclopedias. As the digitization and annotation of historical encyclopedias are settling on TEI as the de facto standard, our methodology may inform similar projects.

1 to 5

Open Access

Refine

Author

Year of publication

Document Type

Language

Has Fulltext

Is part of the Bibliography

Keywords

Publicationstate

Reviewstate

Publisher

5 search hits