These guidelines extend the STTS (Schiller et al. 1999) for the annotation of transcripts of spoken language. The tagset is based on the annotation of the FOLK corpus of the IDS Mannheim (Schmidt 2014). Compared with the STTS, it has been extended to cover phenomena typical of spoken language and peculiarities of how such phenomena are transcribed. It was developed within the dissertation project „POS für(s) FOLK – Entwicklung eines automatisierten Part-of-Speech-Tagging von spontansprachlichen Daten“ ("POS for FOLK: developing automated part-of-speech tagging of spontaneous speech data"; Westpfahl 2017, in preparation).
We present a study of gaps in spoken-language interaction as potential candidates for syntactic boundaries. On the basis of an online annotation experiment, we show that both gap duration and gap type affect the likelihood that a gap coincides with a syntactic boundary. We discuss the potential of these findings for automating the segmentation process.
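The segmentation step discussed above can be illustrated with a minimal sketch, assuming a hypothetical input format in which each token is paired with the duration of the gap that follows it. The function `segment_by_gaps` and its default `threshold` of 0.5 seconds are illustrative assumptions, not methods or values reported in the study. The sketch also ignores gap type, which the experiment found to matter as well.

```python
from typing import List, Sequence, Tuple

# Hypothetical input format: each item pairs a token with the duration
# (in seconds) of the gap that follows it; 0.0 means no measurable gap.
TokenWithGap = Tuple[str, float]


def segment_by_gaps(
    tokens: Sequence[TokenWithGap], threshold: float = 0.5
) -> List[List[str]]:
    """Split a token stream into candidate segments at long gaps.

    `threshold` is an arbitrary illustrative value, not a figure from the study.
    """
    segments: List[List[str]] = []
    current: List[str] = []
    for word, gap in tokens:
        current.append(word)
        # Treat a gap at or above the threshold as a candidate boundary.
        if gap >= threshold:
            segments.append(current)
            current = []
    if current:  # flush trailing tokens not followed by a long gap
        segments.append(current)
    return segments


if __name__ == "__main__":
    stream = [("ja", 0.1), ("genau", 0.8), ("und", 0.0),
              ("dann", 0.2), ("kam", 0.0), ("er", 1.2)]
    print(segment_by_gaps(stream))
```

A single duration cut-off is the simplest possible rule; a model closer to the study's findings would combine duration with gap type, for instance by estimating a boundary probability for each gap.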
This contribution presents the background, design and results of a study of the users of three spoken-language corpus platforms in Germany. Roughly 5,000 registered users of the Database for Spoken German (DGD), the GeWiss corpus and the corpora of the Hamburg Centre for Language Corpora (HZSK) were invited to take part in a user survey. This quantitative approach was complemented by qualitative interviews with selected users. Section 2 briefly introduces the corpus resources involved in the study. Section 3 describes the methods employed in the user studies. Section 4 summarizes the results, focusing on selected key topics. Section 5 attempts to generalize these results to broader contexts.
In this paper, we present a GOLD standard of part-of-speech-tagged transcripts of spoken German. The GOLD standard data consists of four annotation layers (transcription in modified orthography, normalization to standard orthography, lemmatization, and POS tags), all of which have undergone careful manual quality control. It comes with guidelines for the manual POS annotation of transcripts of spoken German and an extended version of the STTS (Stuttgart-Tübingen Tagset) that accounts for phenomena typically found in spontaneous spoken German. The GOLD standard was developed on the basis of the Research and Teaching Corpus of Spoken German (FOLK) and is, to our knowledge, the first such dataset based on a wide variety of spontaneous and authentic interaction types. It can serve as a basis for further development of language technology and corpus-linguistic applications for spoken German.
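The four annotation layers can be pictured as parallel fields on each token. What follows is a minimal sketch under assumed names: `Token`, `to_tsv` and `pos_frequencies` are hypothetical, and the tab-separated layout is not claimed to be the format in which the GOLD standard is distributed. The example tokens are constructed for illustration and use standard STTS tags.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Token:
    """One token carrying the four annotation layers (hypothetical layout)."""
    transcription: str  # modified orthography, as transcribed
    normalization: str  # standard orthography
    lemma: str
    pos: str  # STTS-style POS tag


def to_tsv(tokens: Iterable[Token]) -> str:
    """Render tokens as tab-separated lines, one token per line."""
    return "\n".join(
        "\t".join((t.transcription, t.normalization, t.lemma, t.pos))
        for t in tokens
    )


def pos_frequencies(tokens: Iterable[Token]) -> Dict[str, int]:
    """Count how often each POS tag occurs."""
    return dict(Counter(t.pos for t in tokens))


if __name__ == "__main__":
    # Constructed example, not taken from FOLK.
    sample: List[Token] = [
        Token("isch", "ich", "ich", "PPER"),
        Token("hab", "habe", "haben", "VAFIN"),
        Token("gesehn", "gesehen", "sehen", "VVPP"),
    ]
    print(to_tsv(sample))
    print(pos_frequencies(sample))
```

Keeping transcription and normalization as separate fields lets a tagger work on the normalized forms while the original transcribed forms remain available for corpus queries.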