Refine
Year of publication
Document Type
- Conference Proceeding (11)
- Part of a Book (5)
- Article (4)
- Doctoral Thesis (1)
- Preprint (1)
Keywords
- Gesprochene Sprache (8)
- Korpus <Linguistik> (8)
- Prosodie (7)
- Konversationsanalyse (6)
- Deutsch (5)
- Annotation (3)
- Gestik (3)
- Prosody (3)
- Sprecherwechsel (3)
- ASR (2)
Publicationstate
- Veröffentlichungsversion (11)
- Postprint (3)
- Zweitveröffentlichung (3)
Reviewstate
Publisher
- Association for Computational Linguistics (2)
- European Language Resources Association (2)
- Association for Computational Linguistics ( ACL ); Curran Associates, Inc. (1)
- Elsevier (1)
- European Language Resources Association (ELRA) (1)
- IEEE (1)
- IPrA (International Pragmatics Association) (1)
- John Benjamins (1)
- Leibniz-Institut für Deutsche Sprache (IDS) (1)
- Paderborn University (1)
The newest generation of speech technology caused a huge increase of audio-visual data nowadays being enhanced with orthographic transcripts such as in automatic subtitling in online platforms. Research data centers and archives contain a range of new and historical data, which are currently only partially transcribed and therefore only partially accessible for systematic querying. Automatic Speech Recognition (ASR) is one option of making that data accessible. This paper tests the usability of a state-of-the-art ASR-System on a historical (from the 1960s), but regionally balanced corpus of spoken German, and a relatively new corpus (from 2012) recorded in a narrow area. We observed a regional bias of the ASR-System with higher recognition scores for the north of Germany vs. lower scores for the south. A detailed analysis of the narrow region data revealed – despite relatively high ASR-confidence – some specific word errors due to a lack of regional adaptation. These findings need to be considered in decisions on further data processing and the curation of corpora, e.g. correcting transcripts or transcribing from scratch. Such geography-dependent analyses can also have the potential for ASR-development to make targeted data selection for training/adaptation and to increase the sensitivity towards varieties of pluricentric languages.
The newest generation of speech technology caused a huge increase of audio-visual data nowadays being enhanced with orthographic transcripts such as in automatic subtitling in online platforms. Research data centers and archives contain a range of new and historical data, which are currently only partially transcribed and therefore only partially accessible for systematic querying. Automatic Speech Recognition (ASR) is one option of making that data accessible. This paper tests the usability of a state-of-the-art ASR-System on a historical (from the 1960s), but regionally balanced corpus of spoken German, and a relatively new corpus (from 2012) recorded in a narrow area. We observed a regional bias of the ASR-System with higher recognition scores for the north of Germany vs. lower scores for the south. A detailed analysis of the narrow region data revealed – despite relatively high ASR-confidence – some specific word errors due to a lack of regional adaptation. These findings need to be considered in decisions on further data processing and the curation of corpora, e.g. correcting transcripts or transcribing from scratch. Such geography-dependent analyses can also have the potential for ASR-development to make targeted data selection for training/adaptation and to increase the sensitivity towards varieties of pluricentric languages.
Looking at gestures as a means for communication, they can serve conversational participants at several levels. As co-speech gestures, they can add information to the verbally expressed content and they can serve to manage turn-taking. In order to look closer at the interplay between these resources in face-to face conversation, we annotated hand gestures, syntactic completion points and the related turn-organisation, and measured the timing of gesture strokes and their lexical/phrasal referent. In a case study on German, we observe the trend that speakers vary less in gesturelexis on- and offsets when keeping the turn after syntactic completions than at speaker changes, backchannel or other locations of a conversation. This indicates that timing properties of non-verbal cues interact with verbal cues to manage turn-taking.
This paper examines multi-unit turns that allow speakers to retrospectively close the prior sequence while prospectively launching a new sequence, which Schegloff (1986) referred to as interlocking organization. Using English telephone conversations as data, we focus on how multi-unit turns are used for topic shifts, and show that interlocking organization operates in conjunction with other phonetic and lexical features, such as increased pitch and overt markers of disjunction (e.g., “listen”). In addition, speakers utilize an audible inbreath that is placed between the first and the second units as a central interactional resource to project further talk, thereby suppressing speaker transition and possibly highlighting the action delivered in the second unit as being distinctly new. We propose that interlocking multi-unit turns, when used to make topically disjunctive moves, promote progressivity by avoiding a possible lapse in turn transition
Prosodic constructions used to compete for the speaking turn in conversation have been widely studied (French & Local (1983), Kurtić et al. (2013)). Usually, turn competition arises in overlapping talk between at least two speakers. Coordination between participants in their prosodic design of talk (Szczepek-Reed, 2006) and social action (Gorisch et al. 2012), as well as entrainment in more general terms (Levitan et al. 2011), is well established in the literature. Nevertheless, previous studies on turn competition and overlap do not investigate the prosodic design of turn competitive incomings in reference to the orientation of the speakers to each other. Rather, they assume that prosodic constructions are used for turn competition regardless of the co-participants’ design of the turn. In this paper, we ask whether the prosodic design of turn competitive talk is co-constructed between two participants talking in overlap. More specifically, we investigate whether the prosodic design of one participant’s in overlap talk is developed with respect to the interlocutor’s prosodic features during the same portion of overlapped talk, and whether this prosodic matching can discriminate between the overlaps that are competitive and those that are not. 183 Our analyses are based on two-speaker overlaps drawn from a corpus of multi-party face-to face conversation between four friends recorded in British English (Kurtic et al. 2012). 3407 instances of twospeaker overlaps have been extracted from 4 hours of talk. Two independent conversation analysts performed the interactional categorisation of overlaps into competitive and non-competitive for all these two-speaker overlap instances and achieved a good agreement of alpha=0.807 (Krippendorff 2004) as measured on a subset of 808 overlaps selected for our initial analysis. For the analysis of prosodic features we focus on F0 related features: mean, slope, span and contour, all of which have previously been shown to be used by each overlapping speaker separately for turn competition (Kurtic et al. 2009; Oertel et al. 2012). We investigate the similarity in F0 mean, slope and span by correlating these features across the two participants. For F0 contour, a similarity coefficient is computed using dynamic programming method described in Gorisch et al. (2012). We consider the difference in F0 contour similarity in competitive and non-competitive overlaps as an indication of intonational matching being a turn competitive resource. We conduct these analyses for overlaps that are clearly competitive or noncompetitive as indicated by inter-annotator agreement. In addition, we qualitatively explore those cases that annotators disagree on in order to investigate whether they reveal further important interactional or prosodic features of in-overlap talk. Our preliminary results suggest that conversational participants attend and adapt to the interlocutor during overlap depending on whether they return competition or not. We explain our findings in relation to previous work on turn competition in overlap, discuss the quantitative method employed and also address the possible consequences of our results for the study of prosodic realization of other social actions in conversation.
In order to explore the influence of context on the phonetic design of talk-in-interaction, we investigated the pitch characteristics of short turns (insertions) that are produced by one speaker between turns from another speaker. We investigated the hypothesis that the speaker of the insertion designs her turn as a pitch match to the prior turn in order to align with the previous speaker’s agenda, whereas non-matching displays that the speaker of the insertion is non-aligning, for example to initiate a new action. Data were taken from the AMI meeting corpus, focusing on the spontaneous talk of first-language English participants. Using sequential analysis, 177 insertions were classified as either aligning or non-aligning in accordance with definitions of these terms in the Conversation Analysis literature. The degree of similarity between the pitch contour of the insertion and that of the prior speaker’s turn was measured, using a new technique that integrates normalized F0 and intensity information. The results showed that aligning insertions were significantly more similar to the immediately preceding turn, in terms of pitch contour, than were non-aligning insertions. This supports the view that choice of pitch contour is managed locally, rather than by reference to an intonational lexicon.
Understanding the design of talk-in-interaction is important in many domains, including speech technology. Although phonetic, linguistic and gestural correlates have been identified for some of the social actions that conversational participants accomplish, it is only recently that researchers have begun to take account of the immediately prior interactional context as an important factor influencing the design of a speaker’s turn. The present study explores the influence of context by focussing on characteristics of short turns produced by one speaker between turns from another speaker. The hypothesis is that the speaker designs her inserted turn as a match to the prior turn when wishing to align with the previous speaker’s agenda. By contrast, non-matching would display that the speaker is non-aligning, preferring instead to initiate a new action for example. Data are taken from the AMI corpus, focussing on the spontaneous talk of first-language English participants. Using sequential analysis, such short turns are classified as either aligning or non-aligning in accordance with definitions in the Conversation Analysis literature. The degree of prosodic similarity between the inserted turn and the prior speaker’s turn is measured using novel acoustic techniques. The results show that aligning turns are significantly more similar to the immediately preceding turn, in terms of pitch contour, than non-aligning turns. In contrast to the prosodic-acoustic analysis, the results of the gestural analysis indicate that aligning and non-aligning are differentiated by the use of distinct gestures, rather than by the matching (or non-matching) of gestures across the adjacent turns. These results support the view that choice of pitch contour is managed locally, rather than by reference to an intonational lexicon. However, this is not the case for speakers’ use of gesture. The implications of these findings for a model of talk-in-interaction are considered, along with potential applications.