Construction and dissemination of a corpus of spoken interaction - tools and workflows in the FOLK project
This paper describes the workflow for the construction and dissemination of FOLK (Forschungs- und Lehrkorpus Gesprochenes Deutsch – Research and Teaching Corpus of Spoken German), a large corpus of authentic spoken interaction data, recorded on audio and video. Section 2 describes in detail the tools used in the individual steps of transcription, anonymization, orthographic normalization, lemmatization and POS tagging of the data, as well as some utilities used for corpus management. Section 3 deals with the DGD (Datenbank für Gesprochenes Deutsch – Database of Spoken German) as a tool for distributing completed data sets and making them available for qualitative and quantitative analysis. Section 4 sketches some plans for further development.
Author: | Thomas Schmidt |
---|---|
URN: | urn:nbn:de:bsz:mh39-62156 |
URL: | http://www.jlcl.org/2016_Heft1/jlcl-2016-1-7Schmidt.pdf |
ISSN: | 2190-6858 |
Parent Title (English): | Journal for language technology and computational linguistics (JLCL) |
Place of publication: | Berlin |
Editor: | Marc Kupietz, Alexander Geyken |
Document Type: | Article |
Language: | English |
Year of first Publication: | 2016 |
Date of Publication (online): | 2017/06/22 |
Contributing Corporation: | Gesellschaft für Sprachtechnologie und Computerlinguistik (GSCL) |
Publication state: | Published version |
Review state: | Peer review |
GND Keyword: | Datenbank; Deutsch; Gesprochene Sprache; Korpus <Linguistik> |
Volume: | 31 |
Issue: | 1 |
Page Number: | 28 |
First Page: | 127 |
Last Page: | 154 |
DDC classes: | 400 Language / 430 German |
Open Access?: | yes |
Leibniz-Classification: | Language, Linguistics |
Linguistics-Classification: | Corpus linguistics |