OPUS 4 | Search

99 search hits

1 to 10

Sort by

XML in der Praxis: Dokument parsen, validieren und verarbeiten (2003)

Köhler, Werner ; Witt, Andreas ; Sasaki, Felix ; Milde, Jan-Torsten ; Pönninghaus, Jens

WorldViews: Access to international textbooks for digital humanities researchers (2017)

Hennicke, Steffen ; Stahn, Lena-Luise ; De Luca, Ernesto William ; Schwedes, Kerstin ; Witt, Andreas

This paper introduces the field of international textbook research and discusses how the WorldViews project is working towards enhanced access to textbook resources for digital humanities research.

Word sense alignment and disambiguation for historical encyclopedias (2021)

Hagen, Thora ; Jannidis, Fotis ; Witt, Andreas

This paper will address the challenge of creating a knowledge graph from a corpus of historical encyclopedias with a special focus on word sense alignment (WSA) and disambiguation (WSD). More precisely, we examine WSA and WSD approaches based on article similarity to link messy historical data, utilizing Wikipedia as aground-truth component – as the lack of a critical overlap in content paired with the amount of variation between and within the encyclopedias does not allow for choosing a ”baseline” encyclopedia to align the others to. Additionally, we are comparing the disambiguation performance of conservative methods like the Lesk algorithm to more recent approaches, i.e. using language models to disambiguate senses.

Wohin damit? Storing and reusing my language data: Minute Madness der Datenzentren (2023)

Präsentiert beim Workshop "Wohin damit? Storing and reusing my language data" am 22. Juni 2023 in Mannheim. Die Präsentation wurde im Kontext der Arbeit des Vereins Nationale Forschungsdateninfrastruktur (NFDI) e.V. gehalten.

Verknüpfung heterogener texttechnologischer Ressourcen (2005)

Goecke, Daniela ; Metzing, Dieter ; Witt, Andreas

Gegenstand des Workshop-Beitrags ist die Verknüpfung heterogener linguistischer Ressourcen. Eine bedeutende Teilmenge von Ressourcen in der gegenwärtigen linguistischen Forschung und Anwendung besteht zum einen aus XML-annotierten Textdokumenten und zum anderen aus externen Ressourcen wie Grammatiken, Lexika oder Ontologien. Es wird eine Architektur vorgestellt, die eine Integration heterogener Ressourcen erlaubt, wobei die Methoden zur Integration unabhängig von der jeweiligen Anwendung sind und somit verschiedene Verknüpfungen ermöglichen. Eine exemplarische Anwendung der Methodologie ist die Analyse anaphorischer Beziehungen.

Verbundprojekt: TextTransfer (Pilot) - Korpusgestützte Erkennung von Verwertungsmustern in wissenschaftlichen Texten. Abschlussbericht Gesamtprojekt nach Nr. 3.2. BNBest-BMBF 98 (2020)

Witt, Andreas

Die zentrale Aufgabenstellung des Verbundprojektes TextTransfer (Pilot) war eine Machbarkeitsprüfung für die Entwicklung eines Text-Mining-Verfahrens, mit dem Forschungsergebnisse automatisiert auf Hinweise zu Transfer- und Impactpotenzialen untersucht werden können. Das vom Projektkoordinator IDS verantwortete Teilprojekt konzentrierte sich dabei auf die Entwicklung der methodischen Grundlagen, während der Projektpartner TIB vornehmlich für die Bereitstellung eines geeigneten Datensatzes verantwortlich war. Solchen automatisierten Verfahren liegen zumeist textbasierte Daten als physisches Manifest wissenschaftlicher Erkenntnisse zugrunde, die im Falle von TextTransfer (Pilot) als empirische Grundlage herangezogen wurden. Das im Verbund zur Anwendung gebrachte maschinelle Lernverfahren stützte sich ausschließlich auf deutschsprachige Projektendberichte öffentlich geförderter Forschung. Diese Textgattung eignet sich insbesondere hinsichtlich ihrer öffentlichen Verfügbarkeit bei zuständigen Gedächtnisorganisationen und aufgrund ihrer im Vergleich zu anderen Formaten wissenschaftlicher Publikation relativen strukturellen wie sprachlichen Homogenität. TextTransfer (Pilot) ging daher grundsätzlich von der Annahme struktureller bzw. sprachlicher Ähnlichkeit in Berichtstexten aus, bei denen der Nachweis tatsächlich erfolgten Transfers zu erbringen war. Im Folgenden wird in diesen Fällen von Texten bzw. textgebundenen Forschungsergebnissen mit Transfer- und Impactpotenzial gesprochen werden. Es wurde ferner postuliert, dass sich diese Indizien von sprachlichen Eigenschaften in Texten zu Projekten ohne nachzuweisenden bzw. ggf. auch niemals erfolgtem, aber potenziell möglichem Transfer oder Impact unterscheiden lassen. Mit einer Verifizierung dieser Annahmen war es möglich, Transfer- oder Impactwahrscheinlichkeiten in großen Mengen von Berichtsdaten ohne eingehende Lektüre zu prognostizieren.

Unification of XML Documents with Concurrent Markup (2004)

Witt, Andreas ; Lüngen, Harald ; Sasaki, Felix ; Goecke, Daniela

Twenty-two historical encyclopedias encoded in TEI: a new resource for the Digital Humanities (2020)

Hagen, Thora ; Ketzan, Erik ; Jannidis, Fotis ; Witt, Andreas

This paper accompanies the corpus publication of EncycNet, a novel XML/TEI annotated corpus of 22 historical German encyclopedias from the early 18th to early 20th century. We describe the creation and annotation of the corpus, including the rationale for its development, suggested methodology for TEI annotation, possible use cases and future work. While many well-developed annotation standards for lexical resources exist, none can adequately model the encyclopedias at hand, and we therefore suggest how the TEI Lex-0 standard may be modified with additional guidelines for the annotation of historical encyclopedias. As the digitization and annotation of historical encyclopedias are settling on TEI as the de facto standard, our methodology may inform similar projects.

Towards validation of concurrent markup (2006)

Schonefeld, Oliver ; Witt, Andreas

XCONCUR allows for the annotations of multiple concurrent hierarchies, but lacks cross-layer validation. This paper explores the requirements for a constraint-based approach for such a validation process.

The New IDS Corpus Analysis Platform: Challenges and Prospects (2012)

Bański, Piotr ; Fischer, Peter M. ; Frick, Elena ; Ketzan, Erik ; Kupietz, Marc ; Schnober, Carsten ; Schonefeld, Oliver ; Witt, Andreas

The present article describes the first stage of the KorAP project, launched recently at the Institut für Deutsche Sprache (IDS) in Mannheim, Germany. The aim of this project is to develop an innovative corpus analysis platform to tackle the increasing demands of modern linguistic research. The platform will facilitate new linguistic findings by making it possible to manage and analyse primary data and annotations in the petabyte range, while at the same time allowing an undistorted view of the primary linguistic data, and thus fully satisfying the demands of a scientific tool. An additional important aim of the project is to make corpus data as openly accessible as possible in light of unavoidable legal restrictions, for instance through support for distributed virtual corpora, user-defined annotations and adaptable user interfaces, as well as interfaces and sandboxes for user-supplied analysis applications. We discuss our motivation for undertaking this endeavour and the challenges that face it. Next, we outline our software implementation plan and describe development to-date.

1 to 10

Open Access

Refine

Author

Year of publication

Document Type

Language

Has Fulltext

Is part of the Bibliography

Keywords

Publicationstate

Reviewstate

Publisher

99 search hits