Standardising language data through the conversion pipeline TEIWorLD

The conversion of data into a standard format is a crucial step in many research workflows. Standardisation enables data exchange, reuse, and analysis, which are essential for advancing knowledge in various fields. In this publication, we describe the conversion pipeline TEIWorLD (TEI Workflow for Language Data), which transforms written and spoken language data into standardised formats: I5/TEI P5 XML for written data and ISO/TEI Transcriptions of Spoken Language for spoken data. The pipeline leverages existing tools to convert specific source formats into these standards, with an additional transformation step that brings written data into the archival I5 (short for IDS TEI P5) format used at the Leibniz Institute for the German Language (IDS). We also present two use cases that demonstrate the practical application of TEIWorLD in language data management on a corpus that combines more than one format, enabling researchers to analyse and share their data efficiently.

Metadata
Author: Jennifer Ecker
URN: urn:nbn:de:bsz:mh39-136778
DOI: https://doi.org/10.21248/idsopen.15.2026.54
ISBN: 978-3-948831-78-3
ISSN: 2749-9855
Series (Serial Number): IDSopen: Online-only Publikationen des Leibniz-Instituts für Deutsche Sprache (15)
Publisher: IDS-Verlag
Place of publication: Mannheim
Editor: Norman Fiedler, Katrin Hein-Antonioli, Siegwalt Lindenfelser, Beata Trawiński
Document Type: Book
Language: German
Year of first Publication: 2026
Date of Publication (online): 2026/03/03
Publishing Institution: Leibniz-Institut für Deutsche Sprache (IDS)
Publication state: Published version
Review state: (Publisher's) editorial review
Tag: Data conversion; IDS TEI P5; Keyword conversion; TEIWorLD
GND Keyword: Datenkonvertierung; Gesprochene Sprache; Leibniz-Institut für Deutsche Sprache (IDS); Pipeline-Verarbeitung; Schriftsprache
Page Number: 17
DDC classes: 400 Language / 400 Language, Linguistics
Open Access?: yes
Program areas: Digital Linguistics
Licence (German): Creative Commons - Attribution-ShareAlike 3.0 Germany