A syntax-based scheme for the annotation and segmentation of German spoken language interactions
(2018)
Unlike corpora of written language, where segmentation can mainly be derived from orthographic punctuation marks, the basis for segmenting spoken language corpora is not predetermined by the primary data, but rather has to be established by the corpus compilers. This impedes consistent querying and visualization of such data. Several ways of segmenting have been proposed, some of which are based on syntax. In this study, we developed and evaluated annotation and segmentation guidelines in reference to the topological field model for German. We can show that these guidelines are used consistently across annotators. We also investigated the influence of various interactional settings with a rather simple measure, the word count per segment and unit type. We observed that the word count and the distribution of each unit type differ across interactional settings. In conclusion, our syntax-based segmentations reflect interactional properties that are intrinsic to the social interactions that participants are involved in. This can be used for further analysis of social interaction and opens the possibility for automatic segmentation of transcripts.
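The measure named in this abstract, word count per segment and unit type, can be illustrated with a minimal sketch. The settings, unit-type labels, and example segments below are invented for illustration and are not taken from the study or its guidelines.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical segments: (interactional setting, unit type, segment text).
# Settings, unit-type labels and texts are assumptions for illustration only.
segments = [
    ("interview", "clausal", "ich habe das gestern gesehen"),
    ("interview", "non-clausal", "ja genau"),
    ("table talk", "clausal", "gib mir mal das Brot"),
    ("table talk", "non-clausal", "hm"),
    ("table talk", "clausal", "das ist gut"),
]

def mean_words_per_segment(rows):
    """Return {(setting, unit_type): mean number of words per segment}."""
    counts = defaultdict(list)
    for setting, unit_type, text in rows:
        # Whitespace tokenisation is a simplification of real transcript tokens.
        counts[(setting, unit_type)].append(len(text.split()))
    return {key: mean(values) for key, values in counts.items()}

result = mean_words_per_segment(segments)
for (setting, unit_type), avg in sorted(result.items()):
    print(f"{setting:12s} {unit_type:12s} {avg:.2f}")
```

Grouping by both setting and unit type keeps the two comparisons mentioned in the abstract (across settings and across unit types) in one table.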
As a means of communication, gestures can serve conversational participants at several levels. As co-speech gestures, they can add information to the verbally expressed content and they can serve to manage turn-taking. In order to look more closely at the interplay between these resources in face-to-face conversation, we annotated hand gestures, syntactic completion points and the related turn organisation, and measured the timing of gesture strokes relative to their lexical/phrasal referents. In a case study on German, we observe the trend that speakers vary less in gesture-lexis onsets and offsets when keeping the turn after syntactic completions than at speaker changes, backchannels or other locations in a conversation. This indicates that timing properties of non-verbal cues interact with verbal cues to manage turn-taking.
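One way to picture the timing comparison described here is to compute, per turn context, how much the stroke-to-referent onset and offset differences vary. This is only a sketch: the context labels, the time values, and the use of the population standard deviation as the measure of variation are assumptions, not the study's procedure.

```python
from collections import defaultdict
from statistics import pstdev

# Hypothetical annotations, times in seconds:
# (turn context, stroke onset, stroke offset, referent onset, referent offset)
annotations = [
    ("turn-keeping", 1.00, 1.50, 1.10, 1.60),
    ("turn-keeping", 4.00, 4.40, 4.10, 4.50),
    ("speaker change", 7.00, 7.60, 7.40, 7.70),
    ("speaker change", 9.00, 9.30, 8.90, 9.80),
]

def asynchrony_spread(rows):
    """Return {context: (spread of onset differences, spread of offset differences)}."""
    onsets = defaultdict(list)
    offsets = defaultdict(list)
    for context, s_on, s_off, r_on, r_off in rows:
        onsets[context].append(s_on - r_on)     # negative: stroke starts before the word
        offsets[context].append(s_off - r_off)  # negative: stroke ends before the word
    return {c: (pstdev(onsets[c]), pstdev(offsets[c])) for c in onsets}

spread = asynchrony_spread(annotations)
for context, (on_sd, off_sd) in spread.items():
    print(f"{context:15s} onset SD = {on_sd:.3f} s, offset SD = {off_sd:.3f} s")
```

In this invented data the turn-keeping context shows the smaller spread, which mirrors the direction of the trend reported in the abstract but says nothing about its actual size.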
Understanding the design of talk-in-interaction is important in many domains, including speech technology. Although phonetic, linguistic and gestural correlates have been identified for some of the social actions that conversational participants accomplish, it is only recently that researchers have begun to take account of the immediately prior interactional context as an important factor influencing the design of a speaker’s turn. The present study explores the influence of context by focussing on characteristics of short turns produced by one speaker between turns from another speaker. The hypothesis is that the speaker designs her inserted turn as a match to the prior turn when wishing to align with the previous speaker’s agenda. By contrast, non-matching would display that the speaker is non-aligning, preferring instead to initiate a new action, for example. Data are taken from the AMI corpus, focussing on the spontaneous talk of first-language English participants. Using sequential analysis, such short turns are classified as either aligning or non-aligning in accordance with definitions in the Conversation Analysis literature. The degree of prosodic similarity between the inserted turn and the prior speaker’s turn is measured using novel acoustic techniques. The results show that aligning turns are significantly more similar to the immediately preceding turn, in terms of pitch contour, than non-aligning turns. In contrast to the prosodic-acoustic analysis, the results of the gestural analysis indicate that aligning and non-aligning are differentiated by the use of distinct gestures, rather than by the matching (or non-matching) of gestures across the adjacent turns. These results support the view that choice of pitch contour is managed locally, rather than by reference to an intonational lexicon. However, this is not the case for speakers’ use of gesture. The implications of these findings for a model of talk-in-interaction are considered, along with potential applications.
A trainable prosodic model called SFC (Superposition of Functional Contours), proposed by Holm and Bailly, is here applied to German intonation. The training material is the publicly available Siemens Synthesis Corpus, which provides spoken utterances for high-quality speech synthesis. We describe the labeling framework and first evaluation results that compare the original prosody of test sentences from this corpus with their prosodic rendering by the proposed model and by state-of-the-art systems available online.
In order to explore the influence of context on the phonetic design of talk-in-interaction, we investigated the pitch characteristics of short turns (insertions) that are produced by one speaker between turns from another speaker. We investigated the hypothesis that the speaker of the insertion designs her turn as a pitch match to the prior turn in order to align with the previous speaker’s agenda, whereas non-matching displays that the speaker of the insertion is non-aligning, for example to initiate a new action. Data were taken from the AMI meeting corpus, focusing on the spontaneous talk of first-language English participants. Using sequential analysis, 177 insertions were classified as either aligning or non-aligning in accordance with definitions of these terms in the Conversation Analysis literature. The degree of similarity between the pitch contour of the insertion and that of the prior speaker’s turn was measured, using a new technique that integrates normalized F0 and intensity information. The results showed that aligning insertions were significantly more similar to the immediately preceding turn, in terms of pitch contour, than were non-aligning insertions. This supports the view that choice of pitch contour is managed locally, rather than by reference to an intonational lexicon.
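The abstract does not spell out the similarity technique, so the following is only a rough sketch of how a contour comparison combining normalized F0 and intensity could look. The resampling to a fixed number of points, the z-normalisation, the use of intensity as a weight, and all function names and values are assumptions, not the authors' method.

```python
from statistics import mean, pstdev

def z_normalise(values):
    """Remove speaker register and range: zero mean, unit standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]

def resample(values, n):
    """Linearly interpolate a contour to n equally spaced points (n >= 2)."""
    if len(values) == 1:
        return [values[0]] * n
    out = []
    for i in range(n):
        pos = i * (len(values) - 1) / (n - 1)
        lo = int(pos)
        hi = min(lo + 1, len(values) - 1)
        frac = pos - lo
        out.append(values[lo] * (1 - frac) + values[hi] * frac)
    return out

def contour_distance(f0_a, int_a, f0_b, int_b, n=20):
    """Intensity-weighted RMS distance between two normalised F0 contours.

    Lower values mean more similar contours. The weighting scheme
    (product of the two positive intensity tracks) is an assumption.
    """
    a = z_normalise(resample(f0_a, n))
    b = z_normalise(resample(f0_b, n))
    w = [x * y for x, y in zip(resample(int_a, n), resample(int_b, n))]
    if sum(w) <= 0:
        w = [1.0] * n  # fall back to uniform weights
    return (sum(wi * (ai - bi) ** 2 for wi, ai, bi in zip(w, a, b)) / sum(w)) ** 0.5

# Hypothetical contours: F0 in Hz, intensity in dB.
prior_turn = ([100, 120, 140, 160], [60, 65, 65, 60])
matching = ([200, 230, 260, 290], [58, 66, 64, 59])      # same shape, higher register
non_matching = ([290, 260, 230, 200], [58, 66, 64, 59])  # opposite shape

d_match = contour_distance(*prior_turn, *matching)
d_non_match = contour_distance(*prior_turn, *non_matching)
print(d_match, d_non_match)
```

Normalising each contour before comparison means that a rise produced in a higher register still counts as a match to a rise in a lower one, which is the sense of matching the abstract describes.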
To date, little is known about prosodic accommodation and its conversational functions in instances of overlapping talk in conversation. A major conversational action that happens in overlap is turn competition. It is not known whether participants accommodate prosodic parameters locally in the overlapped turn (initialisation) or access a repertoire of prosodic patterns that refer to general prosodic parameter norms (normalisation) when competing for the turn in overlap. This paper investigates the initialisation and normalisation of fundamental frequency (f0) and assesses its role as a resource for turn competition in overlap. We drew instances of overlapping talk from a corpus of conversational multi-party interactions in British English. We annotated the overlaps on a competitiveness scale and categorised them by overlap onset position and conversational function. We automatically extracted f0 parameters from the speech signal and processed them into f0 accommodation features that represent the normalising or the initialising use of f0. Using decision tree classification we found that f0 accommodation is only relevant as a turn competitive resource in overlaps that start clearly before a speaker transition. In this turn context, we found that normalising and initialising f0 features can both be relevant turn competitive resources. Their deployment depends on the conversational function of overlap.
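The distinction between initialising and normalising use of f0 can be sketched as two features computed for the incoming speaker's overlapping talk: one relative to the overlapped turn, one relative to the incoming speaker's own habitual level. The choice of medians, the semitone scale, and all names and values below are assumptions for illustration; the decision tree classification reported in the abstract is not shown.

```python
import math
from statistics import median

def semitones(f0_hz, ref_hz):
    """Express f0 as a distance in semitones from a reference frequency."""
    return 12 * math.log2(f0_hz / ref_hz)

def accommodation_features(overlap_f0, overlapped_turn_f0, speaker_history_f0):
    """Return two hypothetical f0 accommodation features for one overlap.

    initialising: incoming f0 relative to the local level of the overlapped turn.
    normalising:  incoming f0 relative to the incoming speaker's own usual level.
    """
    incoming = median(overlap_f0)
    return {
        "initialising": semitones(incoming, median(overlapped_turn_f0)),
        "normalising": semitones(incoming, median(speaker_history_f0)),
    }

# Hypothetical f0 samples in Hz.
features = accommodation_features(
    overlap_f0=[220, 220, 220],
    overlapped_turn_f0=[110, 110],
    speaker_history_f0=[220, 220],
)
print(features)
```

Here the incoming speaker sits an octave above the overlapped turn but exactly at her own usual level, so the two features diverge; feature vectors of this kind are what a classifier could then relate to the competitiveness annotations.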
For many reasons, Mennonite Low German is a language whose documentation and investigation is of great importance for linguistics. To date, most research projects that deal with this language and/or its speakers have had a relatively narrow focus, with many of the data cited being of limited relevance beyond the projects for which they were collected. In order to create a resource for a broad range of researchers, especially those working on Mennonite Low German, the dataset presented here has been transformed into a structured and searchable corpus that is accessible online. The translations of 46 English, Spanish, or Portuguese stimulus sentences into Mennonite Low German by 321 consultants form the core of the MEND-corpus (Mennonite Low German in North and South America) in the Archive for Spoken German. In addition to describing the origin of this corpus and discussing possibilities and limitations for further research, we discuss the technical structure and search possibilities of the Database for Spoken German. Among other things, this database allows for a structured search of metadata, a context-sensitive token search, and the generation of virtual corpora that can be shared with others. Moreover, thanks to its text-sound alignment, one can easily switch from a particular text section of the corpus to the corresponding audio section. Aside from equipping the reader with the technical knowledge necessary to use this corpus, a further goal of this paper is to demonstrate that the corpus still offers many possibilities for future research.
This paper examines multi-unit turns that allow speakers to retrospectively close the prior sequence while prospectively launching a new sequence, which Schegloff (1986) referred to as interlocking organization. Using English telephone conversations as data, we focus on how multi-unit turns are used for topic shifts, and show that interlocking organization operates in conjunction with other phonetic and lexical features, such as increased pitch and overt markers of disjunction (e.g., “listen”). In addition, speakers utilize an audible inbreath that is placed between the first and the second units as a central interactional resource to project further talk, thereby suppressing speaker transition and possibly highlighting the action delivered in the second unit as being distinctly new. We propose that interlocking multi-unit turns, when used to make topically disjunctive moves, promote progressivity by avoiding a possible lapse in turn transition.