Rescuing Legacy Data

This paper discusses issues that arise in the transformation of electronic language data from outdated to modern, sustainable formats. We first describe the problem and then present four different cases in which corpora of spoken language were converted from legacy formats to an XML-based representation. For each of the four cases, we describe the conversion workflow and discuss the difficulties that we had to overcome. Based on this experience, we formulate some more general observations about transforming legacy data and conclude with a set of best practice recommendations for a more sustainable handling of language corpora.

Metadata
Author:Thomas Schmidt, Jasmine Bennöhr
URN:urn:nbn:de:bsz:mh39-23160
URL:http://scholarspace.manoa.hawaii.edu/bitstream/handle/10125/1803/schmidtsmall.pdf;jsessionid=F7F234349A08D0D8F323455955EAAE6B?sequence=12
ISSN:1934-5275
Parent Title (English):Language Documentation and Conservation
Publisher:University of Hawaii Press
Place of publication:Honolulu
Document Type:Article
Language:English
Year of first Publication:2008
GND Keyword:Datenformat; Datenkonvertierung; Gesprochene Sprache; Korpus <Linguistik>
Volume:2
Issue:1
First Page:109
Last Page:120
DDC classes:400 Language / 410 Linguistics
Open Access?:yes
Linguistics-Classification:Computational linguistics
Licence:Creative Commons Attribution-NonCommercial 3.0 Germany