FOLK-Gold ― A gold standard for part-of-speech-tagging of spoken German

  • In this paper, we present a GOLD standard of part-of-speech tagged transcripts of spoken German. The GOLD standard data consists of four annotation layers – transcription (modified orthography), normalization (standard orthography), lemmatization and POS tags – all of which have undergone careful manual quality control. It comes with guidelines for the manual POS annotation of transcripts of German spoken data and an extended version of the STTS (Stuttgart Tübingen Tagset) which accounts for phenomena typically found in spontaneous spoken German. The GOLD standard was developed on the basis of the Research and Teaching Corpus of Spoken German, FOLK, and is, to our knowledge, the first such dataset based on a wide variety of spontaneous and authentic interaction types. It can be used as a basis for further development of language technology and corpus linguistic applications for German spoken language.


Author:Swantje Westpfahl, Thomas Schmidt
Parent Title (English):Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016), Portorož, Slovenia
Publisher:European Language Resources Association (ELRA)
Place of publication:Paris
Editor:Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Sara Goggi, Marko Grobelnik, Bente Maegaard, Joseph Mariani, Helene Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Document Type:Conference Proceeding
Year of first Publication:2016
Date of Publication (online):2016/07/19
Tag:GOLD standard; German spoken language; Part-of-Speech-Tagging = POS
GND Keyword:Deutsch; Gesprochene Sprache; Korpus <Linguistik>
First Page:1493
Last Page:1499
DDC classes:400 Language / 430 German
Open Access?:yes
BDSL-Classification:Language in the 20th century. Contemporary language
Leibniz-Classification:Language, Linguistics
Licence (English):Creative Commons - Attribution-NonCommercial 4.0 International