Refine
Year of publication
Document Type
- Part of a Book (70)
- Article (48)
- Conference Proceeding (34)
- Working Paper (12)
- Book (7)
- Review (4)
- Part of Periodical (1)
Has Fulltext
- yes (176)
Keywords
- Gesprochene Sprache (176) (remove)
Publicationstate
- Veröffentlichungsversion (176) (remove)
Reviewstate
- (Verlags)-Lektorat (99)
- Peer-Review (61)
- Review-Status-unbekannt (2)
- Peer-review (1)
- Verlags-Lektorat (1)
Publisher
- Verlag für Gesprächsforschung (16)
- de Gruyter (15)
- Institut für Deutsche Sprache (12)
- Narr (11)
- Leibniz-Institut für Deutsche Sprache (IDS) (9)
- European Language Resources Association (ELRA) (8)
- Association for Computational Linguistics (5)
- European Language Resources Association (5)
- Lang (5)
- Leibniz-Institut für Deutsche Sprache (3)
Post-field syntax and focalization strategies in National Socialist political speech. This paper deals with a syntactic feature of spoken German, i.e. post-field filling, and with its occurrence in one specific discourse type – political speech – throughout one significant period of the history of German language – National Socialism. This paper aims at pointing out the communicative pragmatic function of right dislocation in the NS political speech on the basis of some collected examples.
Der Auftaktworkshop "Lexik des gesprochenen Deutsch: Forschungsstand, Erwartungen und Anforderungen an die Entwicklung einer innovativen lexikografischen Ressource" fand am 16. und 17. Februar 2017 am Institut fur Deutsche Sprache (IDS) in Mannheim statt. Das von der Leibniz-Gemeinschaft geforderte Projekt "Lexik des gesprochenen Deutsch" (=LeGeDe, Leibniz-Wettbewerb 2016, Forderlinie "Innovative Vorhaben") nahm im September 2016 am IDS seine Arbeit auf. Das Hauptziel ist die Erstellung einer korpusbasierten elektronischen Ressource zur Lexik des gesprochenen Deutsch auf der Grundlage von lexikologischen und gesprachsanalytischen Untersuchungen authentischer gesprochensprachlicher Daten.
In this paper, we describe a data processing pipeline used for annotated spoken corpora of Uralic languages created in the INEL (Indigenous Northern Eurasian Languages) project. With this processing pipeline we convert the data into a loss-less standard format (ISO/TEI) for long-term preservation while simultaneously enabling a powerful search in this version of the data. For each corpus, the input we are working with is a set of files in EXMARaLDA XML format, which contain transcriptions, multimedia alignment, morpheme segmentation and other kinds of annotation. The first step of processing is the conversion of the data into a certain subset of TEI following the ISO standard ’Transcription of spoken language’ with the help of an XSL transformation. The primary purpose of this step is to obtain a representation of our data in a standard format, which will ensure its long-term accessibility. The second step is the conversion of the ISO/TEI files to a JSON format used by the “Tsakorpus” search platform. This step allows us to make the corpora available through a web-based search interface. As an addition, the existence of such a converter allows other spoken corpora with ISO/TEI annotation to be made accessible online in the future.
We present a method for detecting and reconstructing separated particle verbs in a corpus of spoken German by following an approach suggested for written language. Our study shows that the method can be applied successfully to spoken language, compares different ways of dealing with structures that are specific to spoken language corpora, analyses some remaining problems, and discusses ways of optimising precision or recall for the method. The outlook sketches some possibilities for further work in related areas.
We present the annotation of information structure in the MULI project. To learn more about the information structuring means in prosody, syntax and discourse, theory- independent features were defined for each level. We describe the features and illustrate them on an example sentence. To investigate the interplay of features, the representation has to allow for inspecting all three layers at the same time. This is realised by a stand-off XML mark-up with the word as the basic unit. The theory-neutral XML stand-off annotation allows integrating this resource with other linguistic resources such as the Tiger Treebank for German or the Penn treebank for English.
Vorschlag zu einer Typik der Kommunikationssituationen in der gesprochenen deutschen Standardsprache
(1975)
In diesem Aufsatz werden Diskursmarker als Operatoren definiert, die Skopus über Sprechakte nehmen, d.h. Sprechakte modifizieren oder miteinander verknüpfen. Als Sprechakte in diesem Sinne kommen neben perlokutionären und illokutionären auch lokutionäre Akte in Betracht. Die Operation eines Diskursmarkers wird als Zuordnung thematischer Rollen konzeptualisiert. Dafür muss der Diskursmarker zu seinem Operanden im syntaktischen Verhältnis eines Kopfes zu seinem Komplement oder eines Adjunktes zu seinem Wirt stehen, oder er muss ein syntaktisch unabhängiger referentieller Ausdruck sein, der seinen Operanden als Verweisziel nimmt. Linear stehen Diskursmarker typischerweise peripher zu ihren Operanden. In satzförmigen Operanden können adverbiale Diskursmarker auch Binnenstellungen einnehmen.
Sogenannte „Pragmatikalisierte Mehrworteinheiten“ sind im Deutschen hochfrequent und unterliegen bisweilen tiefgreifenden phonetischen Reduktionsprozessen. Diese können Realisierungsvarianten hervorbringen, die in der Rückschau auf mehr als eine lexematische Ursprungsform zurückführbar sind. Die vorliegende Studie untersucht mit [ˈzɐmɐ] einen besonders prägnanten Fall dieser Art anhand eines Perzeptionsexperimentes.
While written corpora can be exploited without any linguistic annotations, speech corpora need at least a basic transcription to be of any use for linguistic research. The basic annotation of speech data usually consists of time-aligned orthographic transcriptions. To answer phonetic or phonological research questions, phonetic transcriptions are needed as well. However, manual annotation is very time-consuming and requires considerable skill and near-native competence. Therefore it can take years of speech corpus compilation and annotation before any analyses can be carried out. In this paper, approaches that address the transcription bottleneck of speech corpus exploitation are presented and discussed, including crowdsourcing the orthographic transcription, automatic phonetic alignment, and query-driven annotation. Currently, query-driven annotation and automatic phonetic alignment are being combined and applied in two speech research projects at the Institut für Deutsche Sprache (IDS), whereas crowdsourcing the orthographic transcription still awaits implementation.
We present an XML-based metadata standard for the documentation of speech and multimedia corpora that was developed at the Institute for German Language (IDS) in Mannheim, Germany. The IDS is one of the major institutions providing German speech and language corpora to researchers. These corpora stem from many different sources and were previously documented in a rather heterogeneous fashion using a variety of data models and formats. In order to unify the documentation for existing and future corpora, the IDS- internal Archive for Spoken German collaborated with several projects and developed a set of standardised XML metadata schemas. These XML schemas build on existing internal and external documentation schemas (such as IMDI) and take into account the workflow of speech corpus production. In order to minimise redundancy, separate schemas were designed for projects, speakers, recording sessions, and entire corpora. The resulting schemas are tested in ongoing speech and multi-media projects at the IDS and are regularly revised. They are accompanied by element definitions, guidelines, and examples. In addition, a mapping to IMDI will be provided.
The research project “German Today” aims to determine the amount of regional variation in (near-)standard German spoken by young and older educated adults and to identify and locate regional features. To this end, we compile an areally extensive corpus of read and spontaneous German speech. Secondary school students and 50-to-60-year-old locals are recorded in 160 cities throughout the German speaking area of Europe. All participants read a number of short texts and a word list, name pictures, translate words and sentences from English, answer questions in a sociobiographic interview, and take part in a map task experiment. The resulting corpus comprises over 1000 hours of speech, which is transcribed orthographically. Automatically derived broad phonetic transcriptions, selective manual narrow phonetic transcriptions, and variationalist annotations are added. Focussing on phonetic variation we aim to show to what extent national or regional standards exist in spoken German. Furthermore, the linguistic variation due to different contextual styles (read vs. spontaneous speech) shall be analysed. Finally, the corpus enables us to investigate whether linguistic change has occurred in spoken (near-)standard German.
Der Beitrag plädiert für eine Untersuchung der gesprochenen Sprache als integralem Bestandteil multimodaler Interaktionspraktiken. Das leibliche Handeln bildet die Infrastruktur für die Verwendung von Sprache, es schafft Bedingungen, Möglichkeiten und Motivationen für die Verwendung spezifischer sprachlicher Strukturen; umgekehrt wird es seinerseits durch sprachliches Handeln organisiert. Zunächst werden in dem Beitrag grundlegende Eigenschaften multimodaler Interaktion dargestellt: die Vielfalt der leiblichen Handlungsressourcen und ihre Koordination, Sequenzialität und Simultaneität von Aktivitäten, multimodale Beteiligung an der Interaktion, der Stellenwert von Raum, Objekten, Multiaktivität und Bewegung. Ebenso wird kurz auf die methodischen Grundlagen der Untersuchung eingegangen: Videoaufnahme und multimodale Transkription. An drei sprachlichen Phänomenbereichen wird dann exemplarisch gezeigt, wie sprachliche Praktiken durch ihr Zusammenspiel mit anderen leiblichen Ressourcen der Kommunikation geprägt sind. Im Einzelnen geht es um die Disambiguierung sprachlicher Praktiken durch ihre Koordination mit anderen Ressourcen, die Erweiterung sprachlicher Strukturen, die aufgrund von Rezipientenreaktionen simultan zur Turn-Produktion stattfindet, und die Verwendungen minimaler Referenzformen, die sich auf die multimodale Ko-Orientierung der Beteiligten stützt.
Ph@ttSessionz and Deutsch heute are two large German speech databases. They were created for different purposes: Ph@ttSessionz to test Internet-based recordings and to adapt speech recognizers to the voices of adolescent speakers, Deutsch heute to document regional variation of German. The databases differ in their recording technique, the selection of recording locations and speakers, elicitation mode, and data processing.
In this paper, we outline how the recordings were performed, how the data was processed and annotated, and how the two databases were imported into a single relational database system. We present acoustical measurements on the digit items of both databases. Our results confirm that the elicitation technique affects the speech produced, that f0 is quite comparable despite different recording procedures, and that large speech technology databases with suitable metadata may well be used for the analysis of regional variation of speech.