OPUS 4 | Sprache im 20. Jahrhundert. Gegenwartssprache

Sprache im 20. Jahrhundert. Gegenwartssprache

2 search hits

1 to 2

Sort by

Transcription Bottleneck of Speech Corpus Exploitation (2009)

While written corpora can be exploited without any linguistic annotations, speech corpora need at least a basic transcription to be of any use for linguistic research. The basic annotation of speech data usually consists of time-aligned orthographic transcriptions. To answer phonetic or phonological research questions, phonetic transcriptions are needed as well. However, manual annotation is very time-consuming and requires considerable skill and near-native competence. Therefore it can take years of speech corpus compilation and annotation before any analyses can be carried out. In this paper, approaches that address the transcription bottleneck of speech corpus exploitation are presented and discussed, including crowdsourcing the orthographic transcription, automatic phonetic alignment, and query-driven annotation. Currently, query-driven annotation and automatic phonetic alignment are being combined and applied in two speech research projects at the Institut für Deutsche Sprache (IDS), whereas crowdsourcing the orthographic transcription still awaits implementation.

German Today: an areally extensive corpus of spoken Standard German (2008)

Brinckmann, Caren ; Kleiner, Stefan ; Knöbl, Ralf ; Berend, Nina

The research project “German Today” aims to determine the amount of regional variation in (near-)standard German spoken by young and older educated adults and to identify and locate regional features. To this end, we compile an areally extensive corpus of read and spontaneous German speech. Secondary school students and 50-to-60-year-old locals are recorded in 160 cities throughout the German speaking area of Europe. All participants read a number of short texts and a word list, name pictures, translate words and sentences from English, answer questions in a sociobiographic interview, and take part in a map task experiment. The resulting corpus comprises over 1000 hours of speech, which is transcribed orthographically. Automatically derived broad phonetic transcriptions, selective manual narrow phonetic transcriptions, and variationalist annotations are added. Focussing on phonetic variation we aim to show to what extent national or regional standards exist in spoken German. Furthermore, the linguistic variation due to different contextual styles (read vs. spontaneous speech) shall be analysed. Finally, the corpus enables us to investigate whether linguistic change has occurred in spoken (near-)standard German.

1 to 2

Open Access

Sprache im 20. Jahrhundert. Gegenwartssprache

Refine

Author

Year of publication

Document Type

Language

Has Fulltext

Is part of the Bibliography

Keywords

Publicationstate

Reviewstate

Publisher

2 search hits