Language in the 20th century. Contemporary language
This short paper reports on the project “Hypertextualisierung auf textgrammatischer Grundlage” (HyTex; hypertextualization on a text-grammatical basis), which investigates how linearly organized documents can be converted into delinearized hyperdocuments using semi-automatic methods based on text-grammatical markup and the linguistically motivated modelling of terminological knowledge. The aim is to convert a collection of specialized texts into a hypertext in such a way that terminology-related comprehension difficulties during reading are resolved by suitable link offers, so that the texts can also be read selectively by semi-experts in the domain. The paper focuses on the modelling of terminological knowledge with XML Topic Maps and its significance for the automatic generation of hyperlinks.
This paper explores, on the basis of empirical research, how patterns of interaction and argumentation in political discourse on Twitter evolve as translocal communities in the creative shape of “joint digital storytelling”. Joint storytelling embraces coordinated activities by multiple actors focusing on a shared topic. By adding personal information and evaluation, participants construct an open narrative format, which can be inviting and inspiring for others, who then join in with their own narratives. This model is exemplified by analyzing a large number of tweets (107,000) collected during a political conflict between proponents and adversaries of a local traffic project in Germany. The analysis addresses (1) the textual level, (2) the operative level (hashtags, @ and RT symbols, hyperlinks etc.) and (3) the visual level of storytelling (embedded photos, videos). Results show a new way of creating translocal online communities and political deliberation.
In order to determine priorities for the improvement of timing in synthetic speech, this study looks at the role of segmental duration prediction and the role of phonological symbolic representation in listeners' preferences. In perception experiments using German speech synthesis, two standard duration models (Klatt rules and CART) were tested. The input to these models consisted of symbolic strings which were derived either from a database or from a text-to-speech system. Results of the perception experiments show that different duration models can only be distinguished when the symbolic string is appropriate. Considering the relative importance of the symbolic representation, "post-lexical" segmental rules were investigated with the outcome that listeners differ in their preferences regarding the degree of segmental reduction. In conclusion, before fine-tuning the duration prediction, it is important to calculate an appropriate phonological symbolic representation in order to improve timing in synthetic speech.
In this study we investigate the intonational characteristics of the four utterance types statement, wh-question, yes/no-question and declarative question. Readings of two German scripted dialogues were examined to ascertain characteristic features of the F0 contour for each utterance type. Final boundary tone, nuclear pitch accent, F0 offset, F0 onset, F0 range, and the slopes of a topline and a bottomline were determined for each utterance and compared for the four utterance types. Results show that for an average speaker, the final boundary tone, the F0 range, and the slope of the topline can be used to distinguish between the four utterance types. However, speakers may deviate from this pattern and exploit other intonational means to distinguish certain utterance types or choose not to mark a syntactic difference at all.
The naturalness of synthetic speech depends strongly on the prediction of appropriate prosody. For the present study the original annotation of the German speech database “Kiel Corpus of Read Speech” was extended automatically with syntactic features, word frequency, and syllable boundaries. Several classification and regression trees for predicting symbolic prosody features, postlexical phonological processes, duration, and F0 were trained on this database. The perceptual evaluation showed that the overall perceptual quality of the German text-to-speech system MARY can be significantly improved by training all models that contribute to prosody prediction on the same database. Furthermore, it showed that the error introduced by symbolic prosody prediction perceptually equals the error produced by a direct method that does not exploit any symbolic prosody features.
We present the annotation of information structure in the MULI project. To learn more about the information structuring means in prosody, syntax and discourse, theory-independent features were defined for each level. We describe the features and illustrate them on an example sentence. To investigate the interplay of features, the representation has to allow for inspecting all three layers at the same time. This is realised by a stand-off XML mark-up with the word as the basic unit. The theory-neutral XML stand-off annotation allows integrating this resource with other linguistic resources such as the TIGER Treebank for German or the Penn Treebank for English.
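The stand-off idea described in this abstract, separate annotation layers that all point at word units, can be illustrated with a minimal sketch. The element and attribute names below (`w`, `accent`, `phrase`, `ref`, `refs`) and the example sentence are hypothetical and are not taken from the MULI annotation scheme; the sketch only shows how layers stored apart from the text can be inspected together per word.

```python
import xml.etree.ElementTree as ET

# Base layer: the words, each with a unique id (hypothetical format).
WORDS_XML = """<words>
  <w id="w1">Peter</w>
  <w id="w2">sleeps</w>
</words>"""

# Stand-off layers: they contain no text, only references to word ids.
PROSODY_XML = """<prosody>
  <accent ref="w1" type="H*"/>
</prosody>"""

SYNTAX_XML = """<syntax>
  <phrase cat="NP" refs="w1"/>
  <phrase cat="VP" refs="w2"/>
</syntax>"""


def merge_layers(words_xml, *layer_xmls):
    """Collect, for every word id, the annotations of all stand-off layers."""
    words = {w.get("id"): {"token": w.text} for w in ET.fromstring(words_xml)}
    for layer_xml in layer_xmls:
        layer = ET.fromstring(layer_xml)
        for ann in layer:
            # An annotation may point at one word (ref) or several (refs).
            targets = (ann.get("refs") or ann.get("ref") or "").split()
            attrs = {k: v for k, v in ann.attrib.items()
                     if k not in ("ref", "refs")}
            for wid in targets:
                entry = {"element": ann.tag, **attrs}
                words[wid].setdefault(layer.tag, []).append(entry)
    return words


if __name__ == "__main__":
    for wid, info in merge_layers(WORDS_XML, PROSODY_XML, SYNTAX_XML).items():
        print(wid, info)
```

Because each layer only references word ids, a further layer (for example discourse) could be added as another XML string without touching the existing ones.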
The goal of the MULI (MUltiLingual Information structure) project is to empirically analyse information structure in German and English newspaper texts. In contrast to other projects in which information structure is annotated and investigated (e.g. in the Prague Dependency Treebank, which mirrors the basic information about the topic-focus articulation of the sentence), we do not annotate theory-biased categories like topic-focus or theme-rheme. Trying to be as theory-independent as possible, we annotate those features which are relevant to information structure and on the basis of which typical patterns, co-occurrences or correlations can be determined. We distinguish between three annotation levels: syntax, discourse and prosody. The data is based on the TIGER Corpus for German and the Penn Treebank for English, since the existing information on part-of-speech and syntactic structure can be re-used for our purposes. The actual annotation of an English example sequence illustrates our choice of categories on each level. Their combination offers the possibility to investigate how information structure is realised and can be interpreted.
We present an XML-based metadata standard for the documentation of speech and multimedia corpora that was developed at the Institute for German Language (IDS) in Mannheim, Germany. The IDS is one of the major institutions providing German speech and language corpora to researchers. These corpora stem from many different sources and were previously documented in a rather heterogeneous fashion using a variety of data models and formats. In order to unify the documentation for existing and future corpora, the IDS-internal Archive for Spoken German collaborated with several projects and developed a set of standardised XML metadata schemas. These XML schemas build on existing internal and external documentation schemas (such as IMDI) and take into account the workflow of speech corpus production. In order to minimise redundancy, separate schemas were designed for projects, speakers, recording sessions, and entire corpora. The resulting schemas are tested in ongoing speech and multimedia projects at the IDS and are regularly revised. They are accompanied by element definitions, guidelines, and examples. In addition, a mapping to IMDI will be provided.
The metadata management system for speech corpora “memasysco” has been developed at the Institut für Deutsche Sprache (IDS) and is applied for the first time to document the speech corpus “German Today”. memasysco is based on a data model for the documentation of speech corpora and contains two generic XML schemas that drive data capture, XML native database storage, dynamic publishing, and information retrieval. The development of memasysco’s information architecture was mainly based on the ISLE MetaData Initiative (IMDI) guidelines for publishing metadata of linguistic resources. However, since we also have to support the corpus management process in research projects at the IDS, we need a finer atomic granularity for some documentation components as well as more restrictive categories to ensure data integrity. The XML metadata of different speech corpus projects are centrally validated and natively stored in an Oracle XML database. The extension of the system to the management of annotations of audio and video signals (e.g. orthographic and phonetic transcriptions) is planned for the near future.
This paper is concerned with a novel methodology for generating phonetic questions used in tree-based state tying for speech recognition. In order to implement a speech recognition system, language-dependent knowledge which goes beyond annotated material is usually required. The approach presented here generates phonetic questions for decision trees based on a feature table that summarizes the articulatory characteristics of each sound. On the one hand, this method allows better language-specific triphone models to be defined given only a feature table as linguistic input. On the other hand, the feature-table approach facilitates efficient definition of triphone models for other languages, since again only a feature table for the language in question is required. The approach is exemplified with speech recognition systems for English and Thai.
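As a rough illustration of the feature-table idea, the sketch below derives left- and right-context questions from a toy table by grouping phones that share a feature value. The table contents, the feature names, and the question naming scheme (`L_`/`R_` prefixes) are assumptions made for this example; the abstract does not specify the authors' actual table or output format.

```python
from collections import defaultdict

# Hypothetical toy feature table: phone -> articulatory features.
FEATURE_TABLE = {
    "p": {"manner": "stop", "place": "bilabial", "voice": "voiceless"},
    "b": {"manner": "stop", "place": "bilabial", "voice": "voiced"},
    "t": {"manner": "stop", "place": "alveolar", "voice": "voiceless"},
    "d": {"manner": "stop", "place": "alveolar", "voice": "voiced"},
    "m": {"manner": "nasal", "place": "bilabial", "voice": "voiced"},
    "n": {"manner": "nasal", "place": "alveolar", "voice": "voiced"},
    "s": {"manner": "fricative", "place": "alveolar", "voice": "voiceless"},
}


def generate_questions(table):
    """Build one left- and one right-context question per feature value.

    Each question is a named set of phones; a decision tree can then ask
    "is the left (or right) neighbour of this triphone in that set?".
    """
    classes = defaultdict(set)
    for phone, features in table.items():
        for feature, value in features.items():
            classes[(feature, value)].add(phone)

    questions = {}
    for (feature, value), phones in sorted(classes.items()):
        for side in ("L", "R"):  # left context, right context
            questions[f"{side}_{feature}_{value}"] = sorted(phones)
    return questions


if __name__ == "__main__":
    for name, phones in generate_questions(FEATURE_TABLE).items():
        print(name, phones)
```

Under this scheme, supporting another language would only require supplying a different table, which is the portability argument made in the abstract.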
In the context of the HyTex project, our goal is to convert a corpus into a hypertext, basing conversion strategies on annotations which explicitly mark up the text-grammatical structures and relations between text segments. Domain-specific knowledge is represented in the form of a knowledge net, using topic maps. We use XML as an interchange format. In this paper, we focus on a declarative rule language designed to express conversion strategies in terms of text-grammatical structures and hypertext results. The strategies can be formulated in a concise formal syntax which is independent of the markup, and which can be transformed automatically into executable program code.
This paper outlines the generation process of a specific computational linguistic representation termed the Multilingual Time Map, conceptually a multi-tape finite-state transducer encoding linguistic data at different levels of granularity. The first component acquires phonological data from syllable-labeled speech data, the second component defines feature profiles, the third component generates feature hierarchies and augments the acquired data with the defined feature profiles, and the fourth component displays the Multilingual Time Map as a graph.
The current paper presents a corpus containing 35 dialogues of spontaneously spoken southern German, including half an hour of articulography for 13 of the speakers. Speakers were seated in separate recording chambers, mimicking a telephone call, and recorded on individual audio channels. The corpus provides manually corrected word boundaries and automatically aligned segment boundaries. Annotations are provided in the Praat format. In addition to audio recordings, speakers filled out a detailed questionnaire, assessing among others their audio-visual consumption habits.
The present study introduces articulography, the measurement of the position of tongue and lips during speech, as a promising method for the study of dialect variation. By using generalized additive modeling to analyze articulatory trajectories, we are able to reliably detect aggregate group differences, while simultaneously taking into account the individual variation across dozens of speakers. Our results on the basis of Dutch dialect data show clear differences between the southern and the northern dialect with respect to tongue position, with a more frontal tongue position in the dialect of Ubbergen (in the southern half of the Netherlands) than in the dialect of Ter Apel (in the northern half of the Netherlands). Thus articulography appears to be a suitable tool to investigate structural differences in pronunciation at the dialect level.
As can be shown for English data, the assimilation of the alveolar stop can result from an increased gestural overlap of the following oral closure gesture. Our experiment with German synthetic speech showed similar results. Further, it suggests that it is necessary to complete the gestural specification of the glottal state. A voiced stop should be represented not only by an oral gesture, but by a glottal one as well.
The aim of this paper is to highlight the actual need for corpora that have been annotated based on acoustic information. The acoustic information should be coded in features or properties and is needed to inform further processing systems, i.e. to provide a basis for a speech recognition system using linguistic information. Feature annotation of existing corpora in combination with segmental annotation can provide powerful training material for speech recognition systems, but will also pose challenges for the further processing of features into segments and syllables. We present here the theoretical preliminaries for the multilingual feature extraction system that we are currently working on.
Report on the 15th Arbeitstagung zur Gesprächsforschung (Workshop on Conversation Research), 30 March to 1 April 2011, Mannheim
(2011)
Tools for working with corpora of spoken language. Text-audio alignment and COSMAS II
(2000)
The research project “German Today” aims to determine the amount of regional variation in (near-) standard German spoken by young and older educated adults, and to identify and locate the regional features. To this end, an extensive corpus of read and spontaneous speech is currently being compiled. German is a so-called pluricentric language. With our corpus we aim to determine whether national or regional standards really exist. Furthermore, the linguistic variation due to different contextual styles (read vs. spontaneous speech) shall be analysed. Finally, the corpus will enable us to investigate whether linguistic change has occurred in the domain of the German standard language. The main focus of all research questions is on phonetic variation (lexical variation is only of minor interest). Read and spontaneous speech of four secondary school students (aged seventeen to twenty) and two fifty- to sixty-year-olds is recorded in 160 cities throughout the German-speaking area of Europe. All participants read a number of short texts and word lists, name pictures, translate from English, and take part in a sociobiographic interview and a map task experiment. The resulting corpus will comprise over 1000 hours of orthographically and (in part) phonetically transcribed speech.
Whether verbs have to be marked as punctual vs. durative has been a controversial issue from the very beginnings of research on aktionsarten in the last century right on up to modern theories of aspectual classes and aspect composition. Debates about the linguistic necessity of this distinction have often been accompanied by the question of what it means for a verb to be temporally punctual. In this paper I will, firstly, sketch the history of research on the punctual-durative distinction and present several linguistic arguments in its favor. Secondly, I will show how this distinction is captured in an event-structure-based approach to lexical semantics. Thirdly, I will discuss the extent to which a precise definition of the notions used in lexical representations helps avoid circular argumentation in lexical semantics. Finally, I will demonstrate how this can be done for the notion of ‘punctuality’ by clarifying the logical type of this predicate and relating it to central cognitive time concepts.
The "imperfective-paradox" paradox and other problems with the semantics of the progressive aspect
(2000)
This paper is about the meaning of the progressive aspect, of which it has been notoriously difficult to give a satisfying account. A number of intriguing properties of its meaning were first brought out in formal semantic treatments. An event semantics approach to the progressive that integrates concepts of normality and perspective as well as adequate lexical representations seems to be particularly promising. In section 1 I will present several problems connected with the semantics of the progressive that are crucial for shaping its truth conditions. Several solutions to these problems that have been suggested in the literature will be discussed. In section 2 I will sketch a preliminary account of the meaning of the progressive aspect. In section 2.1 the basic components that underlie the truth conditions of the progressive will be described. In section 2.2 I will present underlying lexical assumptions and the truth conditions for the progressive. Finally, in section 2.3, I will evaluate the proposal by revisiting the problems discussed.
Lexical-semantic theories often suffer from the imprecision of the concepts they employ in their representations. This leads to a considerable decrease in empirical strength by inviting circular argumentation. A demonstration of how to go about overcoming such shortcomings will be carried out, using the lexical semantic concept of "punctuality" as an example. Firstly, I will argue that the distinction between punctuality and durativity plays a crucial role for the explanation of a wide range of syntactic and semantic phenomena. Secondly, I will discuss methodological issues involved in arriving at a more precise definition of punctuality and, finally, the notion of "punctuality" will be given an interpretation on the basis of extensive consultation of research on cognitive time concepts.
This paper is about the meaning of the progressive aspect, which has been notoriously difficult to give a satisfying account of. A number of intriguing properties of its meaning were first brought out in formal semantic treatments. An event semantics approach to the progressive which integrates concepts of normality and perspective as well as adequate lexical representations seems to be particularly promising. In section 2 I will present several problems connected with the semantics of the progressive that are crucial for shaping its truth conditions. Several solutions to these problems that have been suggested in the literature will be discussed. In section 3 I will sketch a preliminary account of the meaning of the progressive aspect. In section 3.1 the basic components that underlie the truth conditions of the progressive will be described. In section 3.2 I will present underlying lexical assumptions and the truth conditions for the progressive. Finally, in section 4, I will evaluate the proposal by revisiting the problems discussed.
Instructional styles
(1982)