Refine
Document Type
- Article (1)
- Part of a Book (1)
- Conference Proceeding (1)
Language
- English (3)
Is part of the Bibliography
- yes (3)
Keywords
- ASR (2)
- Automatische Spracherkennung (2)
- Gesprochene Sprache (2)
- Korpus <Linguistik> (2)
- Plurizentrische Sprache (2)
- Ripuarian (2)
- Sprachgeografie (2)
- automatic transcription (2)
- corpus curation (2)
- oral corpora (2)
Publicationstate
- Zweitveröffentlichung (3) (remove)
Reviewstate
- Peer-Review (3)
Publisher
The newest generation of speech technology caused a huge increase of audio-visual data nowadays being enhanced with orthographic transcripts such as in automatic subtitling in online platforms. Research data centers and archives contain a range of new and historical data, which are currently only partially transcribed and therefore only partially accessible for systematic querying. Automatic Speech Recognition (ASR) is one option of making that data accessible. This paper tests the usability of a state-of-the-art ASR-System on a historical (from the 1960s), but regionally balanced corpus of spoken German, and a relatively new corpus (from 2012) recorded in a narrow area. We observed a regional bias of the ASR-System with higher recognition scores for the north of Germany vs. lower scores for the south. A detailed analysis of the narrow region data revealed – despite relatively high ASR-confidence – some specific word errors due to a lack of regional adaptation. These findings need to be considered in decisions on further data processing and the curation of corpora, e.g. correcting transcripts or transcribing from scratch. Such geography-dependent analyses can also have the potential for ASR-development to make targeted data selection for training/adaptation and to increase the sensitivity towards varieties of pluricentric languages.
The newest generation of speech technology caused a huge increase of audio-visual data nowadays being enhanced with orthographic transcripts such as in automatic subtitling in online platforms. Research data centers and archives contain a range of new and historical data, which are currently only partially transcribed and therefore only partially accessible for systematic querying. Automatic Speech Recognition (ASR) is one option of making that data accessible. This paper tests the usability of a state-of-the-art ASR-System on a historical (from the 1960s), but regionally balanced corpus of spoken German, and a relatively new corpus (from 2012) recorded in a narrow area. We observed a regional bias of the ASR-System with higher recognition scores for the north of Germany vs. lower scores for the south. A detailed analysis of the narrow region data revealed – despite relatively high ASR-confidence – some specific word errors due to a lack of regional adaptation. These findings need to be considered in decisions on further data processing and the curation of corpora, e.g. correcting transcripts or transcribing from scratch. Such geography-dependent analyses can also have the potential for ASR-development to make targeted data selection for training/adaptation and to increase the sensitivity towards varieties of pluricentric languages.
To date, little is known about prosodic accommodation and its conversational functions in instances of overlapping talk in conversation. A major conversational action that happens in overlap is turn competition. It is not known whether participants accommodate prosodic parameters locally in the overlapped turn (initialisation) or access a repertoire of prosodic patterns that refer to general prosodic parameter norms (normalisation) when competing for the turn in overlap. This paper investigates the initialisation and normalisation of fundamental frequency (f0) and assesses its role as a resource for turn competition in overlap. We drew instances of overlapping talk from a corpus of conversational multi-party interactions in British English. We annotated the overlaps on a competitiveness scale and categorised them by overlap onset position and conversational function. We automatically extracted f0 parameters from the speech signal and processed them into f0 accommodation features that represent the normalising or the initialising use of f0. Using decision tree classification we found that f0 accommodation is only relevant as a turn competitive resource in overlaps that start clearly before a speaker transition. In this turn context, we found that normalising and initialising f0 features can both be relevant turn competitive resources. Their deployment depends on the conversational function of overlap.