430 Deutsch
Refine
Document Type
- Conference Proceeding (5) (remove)
Language
- English (5)
Has Fulltext
- yes (5)
Keywords
- German (5) (remove)
Publicationstate
Reviewstate
The naturalness of synthetic speech depends strongly on the prediction of appropriate prosody. For the present study the original annotation of the German speech database “Kiel Corpus of Read Speech” was extended automatically with syntactic features, word frequency, and syllable boundaries. Several classification and regression trees for predicting symbolic prosody features, postlexical phonological processes, duration, and F0 were trained on this database. The perceptual evaluation showed that the overall perceptual quality of the German text-to-speech system MARY can be significantly improved by training all models that contribute to prosody prediction on the same database. Furthermore, it showed that the error introduced by symbolic prosody prediction perceptually equals the error produced by a direct method that does not exploit any symbolic prosody features.
We present data-driven methods for the acquisition of LFG resources from two German treebanks. We discuss problems specific to semi-free word order languages as well as problems arising from the data structures determined by the design of the different treebanks. We compare two ways of encoding semi-free word order, as done in the two German treebanks, and argue that the design of the TiGer treebank is more adequate for the acquisition of LFG resources. Furthermore, we describe an architecture for LFG grammar acquisition for German, based on the two German treebanks, and compare our results with a hand-crafted German LFG grammar.
This paper presents C-WEP, the Collection of Writing Errors by Professionals Writers of German. It currently consists of 245 sentences with grammatical errors. All sentences are taken from published texts. All authors are professional writers with high skill levels with respect to German, the genres, and the topics. The purpose of this collection is to provide seeds for more sophisticated writing support tools as only a very small proportion of those errors can be detected by state-of-the-art checkers. C-WEP is annotated on various levels and freely available.
This contribution presents the newest version of our ’Wortverbindungsfelder’ (fields of multi-word expressions), an experimental lexicographic resource that focusses on aspects of MWEs that are rarely addressed in traditional descriptions: Contexts, patterns and interrelations. The MWE fields use data from a very large corpus of written German (over 6 billion word forms) and are created in a strictly corpus-based way. In addition to traditional lexicographic descriptions, they include quantitative corpus data which is structured in new ways in order to show the usage specifics. This way of looking at MWEs gives insight in the structure of language and is especially interesting for foreign language learners.
A trainable prosodic model called SFC (Superposition of Functional Contours), proposed by Holm and Bailly, is here confronted to German intonation. Training material is the publicly available Siemens Synthesis Corpus that provides spoken utterances for high-quality speech synthesis. We describe the labeling framework and first evaluation results that compares the original prosody of test sentences of this corpus with their prosodic rendering by the proposed model and state-of-the-art systems available on-line on the web.