Refine
Year of publication
Document Type
- Article (50)
- Conference Proceeding (25)
- Part of a Book (14)
- Review (4)
- Part of Periodical (1)
Has Fulltext
- yes (94) (remove)
Keywords
- Gesprochene Sprache (94) (remove)
Publicationstate
- Veröffentlichungsversion (61)
- Zweitveröffentlichung (27)
- Postprint (5)
Reviewstate
- Peer-Review (94) (remove)
Publisher
- Verlag für Gesprächsforschung (9)
- de Gruyter (7)
- Association for Computational Linguistics (5)
- Erich Schmidt (5)
- European Language Resources Association (4)
- European Language Resources Association (ELRA) (3)
- Lexical Computing CZ s.r.o. (3)
- Linköping University Electronic Press (3)
- Springer (3)
- Benjamins (2)
This paper reports on the efforts of twelve national teams in building the International Comparable Corpus (ICC; https://korpus.cz/icc) that will contain highly comparable datasets of spoken, written and electronic registers. The languages currently covered are Czech, Finnish, French, German, Irish, Italian, Norwegian, Polish, Slovak, Swedish and, more recently, Chinese, as well as English, which is considered to be the pivot language. The goal of the project is to provide much-needed data for contrastive corpus-based linguistics. The ICC corpus is committed to the idea of re-using existing multilingual resources as much as possible and the design is modelled, with various adjustments, on the International Corpus of English (ICE). As such, ICC will contain approximately the same balance of forty percent of written language and 60 percent of spoken language distributed across 27 different text types and contexts. A number of issues encountered by the project teams are discussed, ranging from copyright and data sustainability to technical advances in data distribution.
Smooth turn-taking in conversation depends in part on speakers being able to communicate their intention to hold or cede the floor. Both prosodic and gestural cues have been shown to be used in this context. We investigate the interplay of pitch movements and hand gestures at locations at which speaker change becomes relevant, comparing their use in German and Swedish. We find that there are some shared functions of prosody and gesture with regard to turn-taking in the two languages, but that these shared functions appear to be mediated by the different phonological demands on pitch in the two languages.
We present a descriptive analysis on the two datasets from the shared task on Source, Subjective Expression and Target Extraction from Political Speeches (STEPS), the only existing German dataset for opinion role extraction of its size. Our analysis discusses the individual properties of the three components, subjective expressions, sources and targets and their relations towards each other. Our observations should help practitioners and researchers when building a system to extract opinion roles from German data.
A syntax-based scheme for the annotation and segmentation of German spoken language interactions
(2018)
Unlike corpora of written language where segmentation can mainly be derived from orthographic punctuation marks, the basis for segmenting spoken language corpora is not predetermined by the primary data, but rather has to be established by the corpus compilers. This impedes consistent querying and visualization of such data. Several ways of segmenting have been proposed,
some of which are based on syntax. In this study, we developed and evaluated annotation and segmentation guidelines in reference to the topological field model for German. We can show that these guidelines are used consistently across annotators. We also investigated the influence of various interactional settings with a rather simple measure, the word-count per segment and unit-type. We observed that the word count and the distribution of each unit type differ in varying interactional settings and that our developed segmentation and annotation guidelines are used consistently across annotators. In conclusion, our syntax-based segmentations reflect interactional properties that are intrinsic to the social interactions that participants are involved in. This can be used for further analysis of social interaction and opens the possibility for automatic segmentation of transcripts.
Online Access Tools for Spoken German: The Resources of the Deutsches Spracharchiv in a Database
(2002)
This paper shows some details of the modernization of the Deutsches Spracharchiv (DSAv). It explores some future possibilities of linguistical documentation and analysis using the Web. The Institut für Deutsche Sprache (IDS) in Mannheim is the central institution for linguistic research in Germany. The DSAv in the IDS is the center for documentation and research of spoken German. These archives include the largest collection of sound recordings of spoken German (dialects and colloquial speech, including e.g. lots of extinct dialects of former German territories in Eastern Europe) - altogether more than 15,000 sound recordings. The lacking clarification and accessibility of this data material has been felt as an essential deficit. The opportunity to edit the sound signal digitally offers a much easier access to spoken language. Through the integration of the already existing information about the corpora and the transcribed texts in an information- and full text databank, as well as the linking of the data with the acoustic signal (alignment), arises a data-pool with considerably better documentation of the materials and a fast direct grasp of the recorded sounds. Thus, the DSAv initiates totally new research questions for the work at the IDS, as well as for linguistics altogether.
We apply a decision tree based approach to pronoun resolution in spoken dialogue. Our system deals with pronouns with NP- and non-NP-antecedents. We present a set of features designed for pronoun resolution in spoken dialogue and determine the most promising features. We evaluate the system on twenty Switchboard dialogues and show that it compares well to Byron’s (2002) manually tuned system.