Technological and methodological challenges in creating, annotating and sharing a learner corpus of spoken German

  • This article discusses questions concerning the creation, annotation and sharing of spoken language corpora. We use the Hamburg Map Task Corpus (HAMATAC), a small corpus in which advanced learners of German were recorded solving a map task, as an example to illustrate our main points. We first give an overview of the corpus creation and annotation process including recording, metadata documentation, transcription and semi-automatic annotation of the data. We then discuss the manual annotation of disfluencies as an example case in which many of the typical and challenging problems for data reuse – in particular the reliability of interpretative annotations – are revealed.

Metadata
Author:Hanna Hedeland, Thomas Schmidt
URN:urn:nbn:de:bsz:mh39-97229
DOI:https://doi.org/10.1075/hsm.14.04hed
ISBN:9789027219343
Parent Title (English):Multilingual Corpora and Multilingual Corpus Analysis
Series (Serial Number):Hamburg Studies on Multilingualism (14)
Publisher:Benjamins
Place of publication:Amsterdam
Editor:Thomas Schmidt, Kai Wörner
Document Type:Part of a Book
Language:English
Year of first Publication:2012
Date of Publication (online):2020/03/22
Publicationstate:Postprint
Reviewstate:(Publisher's) editorial review
GND Keyword:Annotation; Gesprochene Sprache; Korpus <Linguistik>; Transkription
First Page:25
Last Page:46
Note:
This is a postprint of an article that was published in the book "Multilingual Corpora and Multilingual Corpus Analysis / ed. by Thomas Schmidt ; Kai Wörner. - Amsterdam ; Philadelphia : Benjamins, 2012. - XIII, 406 pp. : ill., graphs. - (Hamburg Studies on Multilingualism ; 14). DOI: https://doi.org/10.1075/lic.14.2.09zel".

The published article is under copyright of Benjamins. The publisher should be contacted for permission to re-use or reprint the material in any form.
DDC classes:400 Language / 400 Language, Linguistics
Open Access?:yes
Linguistics-Classification:Computational linguistics
Licence (German):Protected by copyright