Construction and dissemination of a corpus of spoken interaction - tools and workflows in the FOLK project

  • This paper describes the workflow for constructing and disseminating FOLK (Forschungs- und Lehrkorpus Gesprochenes Deutsch, the Research and Teaching Corpus of Spoken German), a large corpus of authentic spoken interaction data recorded on audio and video. Section 2 describes in detail the tools used in the individual steps of transcription, anonymization, orthographic normalization, lemmatization and POS tagging of the data, as well as some utilities used for corpus management. Section 3 deals with the DGD (Datenbank für Gesprochenes Deutsch, the Database of Spoken German) as a tool for distributing completed data sets and making them available for qualitative and quantitative analysis. Section 4 sketches some plans for further development.

Metadata
Author: Thomas Schmidt
URN: urn:nbn:de:bsz:mh39-62156
URL: http://www.jlcl.org/2016_Heft1/jlcl-2016-1-7Schmidt.pdf
ISSN: 2190-6858
Parent Title (English): Journal for language technology and computational linguistics (JLCL)
Place of publication: Berlin
Editor: Marc Kupietz, Alexander Geyken
Document Type: Article
Language: English
Year of first Publication: 2016
Date of Publication (online): 2017/06/22
Contributing Corporation: Gesellschaft für Sprachtechnologie und Computerlinguistik (GSCL)
Publication state: Published version
Review state: Peer-Review
GND Keyword: Database; German; Spoken language; Corpus <Linguistics>
Volume: 31
Issue: 1
Number of pages: 28
First Page: 127
Last Page: 154
Dewey Decimal Classification: 400 Language / 430 German
Leibniz-Classification: Language, Linguistics
Linguistics-Classification: Corpus linguistics
Open Access?: Yes
Licence: German copyright law (UrhG) applies