A syntax-based scheme for the annotation and segmentation of German spoken language interactions

Unlike corpora of written language, where segmentation can mainly be derived from orthographic punctuation marks, the basis for segmenting spoken language corpora is not predetermined by the primary data but has to be established by the corpus compilers. This impedes consistent querying and visualization of such data. Several ways of segmenting have been proposed, some of which are based on syntax. In this study, we developed and evaluated annotation and segmentation guidelines based on the topological field model for German. We show that these guidelines are applied consistently across annotators. We also investigated the influence of various interactional settings using a simple measure: the word count per segment and unit type. We observed that the word count and the distribution of each unit type differ across interactional settings. In conclusion, our syntax-based segmentations reflect interactional properties that are intrinsic to the social interactions in which participants are involved. This can be used for further analysis of social interaction and opens the possibility of automatic segmentation of transcripts.

Metadata
Author: Swantje Westpfahl, Jan Gorisch
URN: urn:nbn:de:bsz:mh39-79235
URL: http://aclweb.org/anthology/W18-4913
ISBN: 978-1-948087-51-3
Parent Title (English): Proceedings of the Joint Workshop on Linguistic Annotation, Multiword Expressions and Constructions (LAW-MWE-CxG-2018). August 25-26, 2018, Santa Fe, New Mexico, USA
Publisher: Association for Computational Linguistics
Place of publication: Stroudsburg, PA, USA
Document Type: Part of a Book
Language: English
Year of first Publication: 2018
Date of Publication (online): 2018/09/19
Publication state: Published version
Review state: Peer-reviewed
GND Keyword: Annotation; Spoken language; Corpus <Linguistics>; Segmentation
First Page: 109
Last Page: 120
Dewey Decimal Classification: 400 Language / 400 Language, Linguistics
Leibniz-Classification: Language, Linguistics
Linguistics-Classification: Conversation research / Spoken language
Linguistics-Classification: Corpus linguistics
Open Access?: Yes
Licence (English): Creative Commons - Attribution 4.0 International