
Making great work even better. Appraisal and digital curation of widely dispersed electronic textual resources (c. 15th-19th centuries) in CLARIN-D

  • Numerous high-quality primary textual resources (in the context of this paper, full-text transcriptions and corresponding image scans of German texts from the 15th to the 19th century) are scattered across the web or stored remotely on institutional or private servers. They are often kept on degrading storage media and encoded in outdated or inflexible formats. Textual resources are also frequently accompanied by scarce, insufficient or inaccurate bibliographic information, which is one more reason why valuable resources remain undiscovered even when they are available on the web. In addition, idiosyncratic, project-specific markup conventions often hinder further use and analysis of the data. Because of these and other problems, many of these transcriptions of historical sources can hardly be found, let alone accessed by third parties, and are of little use to the wider research community. This situation is unsatisfactory not only from the perspective of a (corpus-)linguistic project like the one described here, but also from that of any text-based research in the humanities and social sciences. Integrating as many of these ‘dispersed’ high-quality primary textual resources as possible into an encompassing repository, such as the sustainable, web- and centre-based research infrastructure of CLARIN-D, is an important step and at least a necessary prerequisite for solving this problem. This paper summarizes the work of an 18-month project funded by the German Federal Ministry of Education and Research (BMBF) that dealt with the curation and integration of historical text resources from the 15th to the 19th century into the CLARIN-D infrastructure.



Authors: Christian Thomas, Frank Wiegand
Parent Title (English): Historical corpora. Challenges and perspectives
Series (Serial Number): Korpuslinguistik und interdisziplinäre Perspektiven auf Sprache | Corpus Linguistics and Interdisciplinary Perspectives on Language | CLIP (5)
Place of publication: Tübingen
Editors: Jost Gippert, Ralf Gehrke
Document Type: Part of a Book
Year of first Publication: 2015
Date of Publication (online): 2024/04/03
Publishing Institution: Leibniz-Institut für Deutsche Sprache (IDS)
GND Keywords: Deutsch (German); Historische Sprachwissenschaft (historical linguistics); Korpus <Linguistik> (corpus, linguistics)
First Page: 181
Last Page: 196
DDC classes: 400 Language / 400 Language, Linguistics
Open Access?: yes
Leibniz-Classification: Language, Linguistics
Licence: Protected by copyright (German: Urheberrechtlich geschützt)