Technological and methodological challenges in creating, annotating and sharing a learner corpus of spoken German
This article discusses questions concerning the creation, annotation and sharing of spoken language corpora. We use the Hamburg Map Task Corpus (HAMATAC), a small corpus in which advanced learners of German were recorded solving a map task, as an example to illustrate our main points. We first give an overview of the corpus creation and annotation process, including recording, metadata documentation, transcription and semi-automatic annotation of the data. We then discuss the manual annotation of disfluencies as an example case that reveals many of the typical and challenging problems for data reuse, in particular the reliability of interpretative annotations.
Author: | Hanna Hedeland, Thomas Schmidt |
---|---|
URN: | urn:nbn:de:bsz:mh39-97229 |
DOI: | https://doi.org/10.1075/hsm.14.04hed |
ISBN: | 9789027219343 |
Parent Title (English): | Multilingual Corpora and Multilingual Corpus Analysis |
Series (Serial Number): | Hamburg Studies on Multilingualism (14) |
Publisher: | Benjamins |
Place of publication: | Amsterdam |
Editor: | Thomas Schmidt, Kai Wörner |
Document Type: | Part of a Book |
Language: | English |
Year of first Publication: | 2012 |
Date of Publication (online): | 2020/03/22 |
Publication state: | Postprint |
Review state: | Publisher's editorial review |
GND Keyword: | Annotation; Spoken language; Corpus <Linguistics>; Transcription |
First Page: | 25 |
Last Page: | 46 |
Note: | This is a postprint of an article that was published in the book "Multilingual Corpora and Multilingual Corpus Analysis", ed. by Thomas Schmidt and Kai Wörner. Amsterdam; Philadelphia: Benjamins, 2012. XIII, 406 pp., ill., graphs (Hamburg Studies on Multilingualism; 14). DOI: https://doi.org/10.1075/hsm.14.04hed. The published article is under copyright of Benjamins. The publisher should be contacted for permission to reuse or reprint the material in any form. |
DDC classes: | 400 Language / 400 Language, Linguistics |
Open Access?: | yes |
Linguistics-Classification: | Computational linguistics |
Licence: | Protected by copyright |