This presentation deals with collaborative turn-sequences (Lerner 2004), i.e., syntactically coherent units of talk that are jointly formulated by at least two speakers, in Czech and German everyday conversations. Based on conversation analysis (e.g., Schegloff 2007) and a multimodal approach to social interaction (e.g., Deppermann/Streeck 2018), we aim to compare recurrent patterns and action types within co-constructional sequences in both languages. The practice of co-constructing turns-at-talk has been described for typologically different languages, especially for English (e.g., Lerner 1996, 2004), but also for languages such as Japanese (Hayashi 2003) or Finnish (Helasvuo 2004). For German, various forms and functions of co-constructions have already been investigated (e.g., Brenning 2015); for Czech, a detailed, interactionally based description is still pending (but see some initial observations in, e.g., Hoffmannová/Homoláč/Mrázková (eds.) 2019). Although the existence of co-constructions in different languages points to a cross-linguistic conversational practice, few explicitly comparative studies exist (see, e.g., Lerner/Takagi 1999, for English and Japanese). The language pair Czech-German has mainly been studied with respect to language contact and without specifically considering spoken language or complex conversational sequences (e.g., Nekula/Šichová/Valdrová 2013). Therefore, our second aim is to sketch out a first comparison of co-constructional sequences in German and Czech, thereby contributing to the growing field of comparative and cross-linguistic studies within conversation analysis (e.g., Betz et al. (eds.) 2021; Dingemanse/Enfield 2015; Sidnell (ed.) 2009). More specifically, we will present three main sequential patterns of co-constructional sequences, focusing on the type of action a second speaker carries out by completing a first speaker’s possibly incomplete turn-at-talk, and on how the initial speaker then responds to this suggested completion (Lerner 2004). Excerpts from video recordings of Czech and German ordinary conversations will illustrate these recurrent co-constructional sequence types, i.e., offering help during word searches (see example 1 above), displaying understanding, or claiming independent knowledge. The third objective of this paper is to underline the participants’ orientation to similar interactional problems, which are solved by specific syntactic and/or lexical formats in Czech and German. Considering the more recent focus on the embodied dimension of co-constructional practices (e.g., Dressel 2020), we will also investigate the multimodal formatting of a started utterance as more or less “permeable” (Lerner 1996) for co-participant completion, the participants’ mutual embodied orientation, and possible embodied responses to others’ turn-completions (such as head nods or eyebrow flashes, cf. De Stefani 2021). More generally, this contribution reflects on the possibilities and challenges of a cross-linguistic comparison of complex multimodal sequences.
The newest generation of speech technology has led to a huge increase in audio-visual data that are now enriched with orthographic transcripts, for example through automatic subtitling on online platforms. Research data centers and archives contain a range of new and historical data, which are currently only partially transcribed and therefore only partially accessible for systematic querying. Automatic Speech Recognition (ASR) is one option for making such data accessible. This paper tests the usability of a state-of-the-art ASR system on a historical (1960s) but regionally balanced corpus of spoken German and on a relatively new corpus (from 2012) recorded in a narrow area. We observed a regional bias of the ASR system, with higher recognition scores for the north of Germany and lower scores for the south. A detailed analysis of the narrow-region data revealed, despite relatively high ASR confidence, some specific word errors due to a lack of regional adaptation. These findings need to be considered in decisions on further data processing and the curation of corpora, e.g., whether to correct transcripts or transcribe from scratch. Such geography-dependent analyses may also help ASR development by guiding targeted data selection for training and adaptation and by increasing sensitivity towards varieties of pluricentric languages.
This paper presents the scientific and methodological challenges involved in creating an innovative, corpus-based lexicographic resource on the lexicon of spoken German in interaction and points out new paths for lexicographic work. In addition to general project information on the starting points, the data basis, the methods, the goals, and the specific subject area, it presents selected results from two project-related empirical studies on expectations of a lexicographic resource of spoken German. For corpus-based quantitative information, it demonstrates the capabilities of a tool developed within the project. It also gives an insight into the conceptual and methodological considerations regarding the microstructure of the planned resource.
This paper investigates emergent pseudo-coordination in spoken German. In a corpus-based study, seven verbs in the first conjunct are analyzed regarding the degree of semantic bleaching and the development of subjective or aspectual meaning components. Moreover, it is shown that each verb shows distinct tendencies for co-occurrences, especially with deictic adverbs in the first conjunct and with specific verbs and verb classes in the second conjunct. It is argued that pseudo-coordination is originally motivated by the need for ‘chunking’ in unplanned speech and that it is still prominently used in this function in German, in contrast to languages in which pseudo-coordination is grammaticalized further.
This paper presents the prototype of a lexicographic resource for spoken German in interaction, which was conceived within the framework of the LeGeDe project (LeGeDe = Lexik des gesprochenen Deutsch). First, it summarizes the theoretical and methodological approaches that were used for the initial planning of the resource. The headword candidates were selected by analyzing corpus-based data. To this end, the data of two corpora (written and spoken German) were compared with quantitative methods. The information that was gathered on the selected headword candidates can be assigned to two different sections: meanings and functions in interaction.
Additionally, two studies on the expectations of future users towards the resource were carried out. The results of these two studies were also taken into account in the development of the prototype. Focusing on the presentation of the resource’s content, the paper shows both the different kinds of lexicographical information in selected dictionary entries and the information offered by the provided hyperlinks and external texts. In conclusion, it summarizes the most important innovative aspects that were specifically developed for the implementation of such a resource.
Except for some recent advances in spoken language lexicography (cf. Verdonik & Sepesy Maučec 2017, Hansen & Hansen 2012, Siepmann 2015), traditional lexicographic work is mainly oriented towards the written language. In this paper, we describe a method we used to identify relevant headword candidates for a lexicographic resource for spoken language that is currently being developed at the Institute for the German Language (IDS, Mannheim). We describe the challenges of headword selection for a dictionary of spoken language and, after outlining our headword concept, present the corpus-based procedures that we used to facilitate the headword selection. After presenting the results regarding the selection of one-word lemmas, we discuss the opportunities and limitations of our approach.
This paper gives an insight into the basic concepts for a corpus-based lexical resource of spoken German, which is being developed by the project "The Lexicon of Spoken German" (Lexik des gesprochenen Deutsch, LeGeDe) at the "Institute for the German Language" (Institut für Deutsche Sprache, IDS) in Mannheim. The focus of the paper is on initial ideas of semi-automatic and automatic resources that assist the quantitative analysis of the corpus data for the creation of dictionary content. The work is based on the "Research and Teaching Corpus of Spoken German" (Forschungs- und Lehrkorpus Gesprochenes Deutsch, FOLK).
This paper gives a brief insight into a new project at the "Institute for the German Language" (IDS) in Mannheim. It outlines some basic ideas for a corpus-based dictionary of spoken German, which will be developed and compiled by the new project "The Lexicon of Spoken German" (Lexik des gesprochenen Deutsch, LeGeDe). The work is based on the "Research and Teaching Corpus of Spoken German" (Forschungs- und Lehrkorpus Gesprochenes Deutsch, FOLK), which is implemented in the "Database for Spoken German" (Datenbank für Gesprochenes Deutsch, DGD). Both resources, the database and the corpus, have been developed at the IDS.