Year of publication
- 2005 (18)
Document Type
- Conference Proceeding (18)
Has Fulltext
- yes (18)
Is part of the Bibliography
- no (18)
Keywords
- Computational linguistics (4)
- German (4)
- Markup language (2)
- Automatic speech recognition (2)
- Contrastive linguistics (2)
- Corpus <Linguistics> (2)
- Polish (2)
- Portuguese (2)
- Text technology (2)
- XML (2)
Publication state
- Published version (14)
- Postprint (2)
- Secondary publication (1)
Review state
- (Publisher's) editorial review (13)
- Peer review (1)
- Review status unknown (1)
We present an implemented XML data model and a new, simplified query language for multi-level annotated corpora. Queries written in the new language are automatically converted into the underlying, more complex MMAXQL query language. The language supports queries not only for sequential and hierarchical relations but also for associative (e.g. coreferential) ones. The simplified query language has been designed with non-expert users in mind.
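The following is a minimal Python sketch of a multi-level, stand-off data model that makes the three relation types concrete. It is an illustration under stated assumptions. It is neither the authors' XML format nor MMAXQL. The `Markable` class is named after MMAX's term for annotated spans, and it, the level names and the helper functions are all hypothetical. Each markable points at base-level token positions. Sequential and hierarchical relations are computed from span positions, while an associative relation such as coreference is stored as an explicit link.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Markable:
    """A span on one annotation level, anchored to base-level token indices."""
    level: str                    # hypothetical level name, e.g. "np" or "clause"
    start: int                    # first token index (inclusive)
    end: int                      # last token index (inclusive)
    attrs: dict = field(default_factory=dict)
    antecedent: Optional["Markable"] = None  # associative (coreference) link

def precedes(a, b):
    """Sequential relation: a ends before b begins."""
    return a.end < b.start

def contains(a, b):
    """Hierarchical relation: b lies within a's span."""
    return a is not b and a.start <= b.start and b.end <= a.end

def select(markables, level, pred=lambda m: True):
    """Return the markables on one level that satisfy a predicate."""
    return [m for m in markables if m.level == level and pred(m)]

tokens = ["The", "committee", "met", "because", "it", "had", "to", "vote"]
np1 = Markable("np", 0, 1)
np2 = Markable("np", 4, 4, {"pronoun": True}, antecedent=np1)
clause = Markable("clause", 3, 7)
markables = [np1, np2, clause]

# "Pronominal NPs inside a clause whose antecedent precedes them"
hits = select(markables, "np",
              lambda m: bool(m.attrs.get("pronoun")) and m.antecedent is not None
              and precedes(m.antecedent, m)
              and any(contains(c, m) for c in select(markables, "clause")))
print([" ".join(tokens[m.start:m.end + 1]) for m in hits])  # ['it']
```

In this design, a simplified user query would translate into a combination of these predicates. The conversion into MMAXQL described above is not modelled here.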
The naturalness of synthetic speech depends strongly on the prediction of appropriate prosody. For the present study, the original annotation of the German speech database “Kiel Corpus of Read Speech” was automatically extended with syntactic features, word frequency, and syllable boundaries. Several classification and regression trees for predicting symbolic prosody features, postlexical phonological processes, duration, and F0 were trained on this database. A perceptual evaluation showed that the overall perceptual quality of the German text-to-speech system MARY can be significantly improved by training all models that contribute to prosody prediction on the same database. It further showed that the error introduced by symbolic prosody prediction is perceptually equivalent to the error produced by a direct method that does not use any symbolic prosody features.
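The Python sketch below is a rough illustration of how a regression tree of the CART kind is grown. At each node it splits on whichever feature threshold most reduces the sum of squared errors. This is the generic textbook CART procedure, not the MARY training code. The two features (lexical stress, phrase-final position) and the syllable durations are invented toy values.

```python
from statistics import mean

def sse(ys):
    """Sum of squared deviations from the mean (the CART regression criterion)."""
    m = mean(ys)
    return sum((y - m) ** 2 for y in ys)

def build_tree(X, y, depth=0, max_depth=3, min_leaf=2):
    """Grow a regression tree greedily; leaves hold mean target values."""
    if depth == max_depth or len(y) < 2 * min_leaf:
        return mean(y)
    best = None
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            left = [i for i, row in enumerate(X) if row[f] <= t]
            right = [i for i, row in enumerate(X) if row[f] > t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue  # split would leave a leaf that is too small
            cost = sse([y[i] for i in left]) + sse([y[i] for i in right])
            if best is None or cost < best[0]:
                best = (cost, f, t, left, right)
    if best is None or best[0] >= sse(y):
        return mean(y)  # no split improves the fit
    _, f, t, left, right = best
    grow = lambda idx: build_tree([X[i] for i in idx], [y[i] for i in idx],
                                  depth + 1, max_depth, min_leaf)
    return (f, t, grow(left), grow(right))

def predict(tree, row):
    """Walk from the root to a leaf; internal nodes are (feature, threshold, left, right)."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree

# Toy data: [stressed, phrase_final] -> syllable duration in ms (invented values)
X = [[0, 0], [0, 0], [1, 0], [1, 0], [0, 1], [0, 1], [1, 1], [1, 1]]
y = [100, 110, 150, 160, 180, 190, 240, 250]
tree = build_tree(X, y)
print(predict(tree, [1, 1]))
```

On this toy data the root split is on phrase-final position, because it removes more squared error than stress does. Stress is used one level further down.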
This paper presents a novel methodology for generating the phonetic questions used in tree-based state tying for speech recognition. Implementing a speech recognition system usually requires language-dependent knowledge that goes beyond annotated material. The approach presented here generates phonetic questions for decision trees from a feature table that summarizes the articulatory characteristics of each sound. On the one hand, this method allows better language-specific triphone models to be defined given only a feature table as linguistic input. On the other hand, it facilitates the efficient definition of triphone models for other languages, since again only a feature table for the new language is required. The approach is exemplified with speech recognition systems for English and Thai.
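Here is a minimal Python sketch of the idea. Each (feature, value) pair in an articulatory feature table defines the set of phones that share it. Each such set yields one question about the left triphone context and one about the right. The feature table is a hypothetical, heavily reduced example, not the one used for English or Thai in the paper. Targeting HTK is an assumption: the output format imitates the `QS` question syntax of HTK's tree-based clustering.

```python
# Hypothetical, heavily reduced articulatory feature table
FEATURES = {
    "p": {"manner": "plosive", "place": "bilabial", "voiced": False},
    "b": {"manner": "plosive", "place": "bilabial", "voiced": True},
    "t": {"manner": "plosive", "place": "alveolar", "voiced": False},
    "d": {"manner": "plosive", "place": "alveolar", "voiced": True},
    "s": {"manner": "fricative", "place": "alveolar", "voiced": False},
    "z": {"manner": "fricative", "place": "alveolar", "voiced": True},
    "m": {"manner": "nasal", "place": "bilabial", "voiced": True},
}

def phonetic_questions(table):
    """One question per (feature, value) pair and context side."""
    groups = {}
    for phone, feats in table.items():
        for name, value in feats.items():
            groups.setdefault((name, value), set()).add(phone)
    questions = []
    for (name, value), phones in sorted(groups.items(), key=lambda kv: str(kv[0])):
        if len(phones) < len(table):  # a set of all phones cannot split a node
            for ctx in ("L", "R"):    # left or right triphone context
                questions.append((f"{ctx}_{name}={value}", frozenset(phones)))
    return questions

def ask(question, triphone):
    """Answer a question for a triphone given as (left, centre, right)."""
    label, phones = question
    left, _, right = triphone
    return (left if label.startswith("L_") else right) in phones

def to_htk(question):
    """Format a question in the style of an HTK QS command."""
    label, phones = question
    pats = [f"{p}-*" if label.startswith("L_") else f"*+{p}" for p in sorted(phones)]
    return f'QS "{label}" {{ {",".join(pats)} }}'

qs = phonetic_questions(FEATURES)
voiced_left = next(q for q in qs if q[0] == "L_voiced=True")
print(ask(voiced_left, ("b", "a", "t")))  # True: left context /b/ is voiced
```

Because the questions are derived mechanically from the table, porting the procedure to another language only requires a new `FEATURES` table. This mirrors the portability argument made above.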
HMMs are the dominant technique in speech recognition today, since they perform well in overall phone recognition. In this paper, we compare HMM methods with machine learning techniques such as neural networks, decision trees, and ensemble classifiers with boosting and bagging on the task of articulatory-acoustic feature classification. The experimental results show that HMM methods work well for classifying features such as vocalic. However, decision trees and bagging outperform HMMs on the fricative classification task, because the data for that task are much more skewed than for the vocalic classification task. This demonstrates that HMMs do not perform as well as decision trees and bagging in highly skewed data settings.
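The Python sketch below shows bagging in miniature on skewed data. It is not the paper's experimental setup. It bags depth-1 decision trees (stumps) on a single invented feature, with a rare positive class standing in for a skewed articulatory feature. Bagging here means bootstrap resampling plus majority voting. The sketch also shows why plain accuracy is a weak yardstick on such data: always predicting the majority class already scores 90% on this toy set.

```python
import random
from collections import Counter

def train_stump(X, y):
    """Pick the single-feature threshold rule with the best training accuracy."""
    best = None
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            for pos_above in (True, False):
                preds = [(row[f] > t) == pos_above for row in X]
                acc = sum(p == label for p, label in zip(preds, y))
                if best is None or acc > best[0]:
                    best = (acc, f, t, pos_above)
    _, f, t, pos_above = best
    return lambda row: (row[f] > t) == pos_above

def bagging(X, y, n_models=25, seed=0):
    """Train stumps on bootstrap samples and combine them by majority vote."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]  # sample with replacement
        models.append(train_stump([X[i] for i in idx], [y[i] for i in idx]))
    def predict(row):
        votes = Counter(m(row) for m in models)
        return votes[True] > votes[False]
    return predict

# Invented, skewed toy data: only 2 of 20 examples belong to the positive class
X = [[i] for i in range(20)]
y = [i >= 18 for i in range(20)]
baseline = Counter(y).most_common(1)[0][1] / len(y)  # majority-class accuracy
predict = bagging(X, y)
print(baseline, predict([19]), predict([0]))
```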
This paper provides a lexicalist formal description of preposition-pronoun contraction (PPC) in Polish within the theoretical framework of HPSG. Given the prosodic, categorial, syntactic and semantic behaviour of PPCs, we assume that each PPC is a morphological unit with prepositional status. Apart from phonological form, the crucial difference between a PPC and a typical preposition lies in their valence properties. While a typical preposition realizes its complement externally via general constraints on phrase structure, a PPC realizes its argument internally, by virtue of its lexical entry. We provide the implicational lexical constraints that license both typical Ps and PPCs.
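The valence contrast can be mimicked in Python with a drastically simplified feature encoding. This sketch rests on assumptions and is not the paper's HPSG constraints. `Sign`, its attributes and `head_complement` are hypothetical stand-ins for HPSG's ARG-ST and COMPS lists and the head-complement schema. The typical preposition "na" selects an accusative NP that is realized externally in syntax. The contracted form "nań" ('on him/it') lists the pronoun only in its argument structure, so it enters syntax already saturated.

```python
from dataclasses import dataclass, field

@dataclass
class Sign:
    """Toy stand-in for an HPSG sign (only the attributes needed here)."""
    phon: str
    head: str                                   # part of speech, e.g. "prep"
    comps: list = field(default_factory=list)   # complements still to be realized
    arg_st: list = field(default_factory=list)  # lexical argument structure
    case: str = ""                              # case of nominal signs

def head_complement(head, comp):
    """Simplified head-complement schema: saturate the first COMPS element."""
    if not head.comps:
        raise ValueError(f"{head.phon!r} has no complement left to realize")
    wanted = head.comps[0]
    if (comp.head, comp.case) != (wanted["head"], wanted["case"]):
        raise ValueError(f"{comp.phon!r} does not satisfy {wanted}")
    return Sign(f"{head.phon} {comp.phon}", head.head, head.comps[1:], head.arg_st)

acc_np = {"head": "noun", "case": "acc"}

# Typical preposition: the argument is on ARG-ST and on COMPS,
# so it must be realized externally by a phrase-structure rule.
na = Sign("na", "prep", comps=[acc_np], arg_st=[acc_np])
niego = Sign("niego", "noun", case="acc")
pp = head_complement(na, niego)  # "na niego"; COMPS is now empty

# PPC: the pronominal argument is on ARG-ST only; the lexical entry
# realizes it internally, so the word is already saturated.
nan = Sign("nań", "prep", comps=[], arg_st=[dict(acc_np, pronoun=True)])
print(pp.phon, pp.comps, nan.comps)
```

Trying to combine "nań" with a further NP fails, because its COMPS list is empty. This mirrors the claim that a PPC realizes its argument lexically rather than syntactically.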
This paper analyses Polish Plural Comitative Constructions (PCCs) within HPSG in the tradition of Pollard and Sag (1994). PCCs have previously been treated in terms of coordination, complementation and adjunction. The objective of this paper is to show that PCCs are neither typical coordinate structures nor typical complement or adjunct structures. They are therefore difficult to describe with the standard principles of syntax and semantics. The proposed analysis accounts for the syntactic and semantic properties of PCCs in Polish in two ways. First, it assumes an adjunction-based syntactic structure for PCCs. Second, it treats the indexical information provided by PCCs not as inherited or composed but as the result of applying a set of number, gender and person resolution principles that also hold for ordinary coordinate structures.
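Resolution principles of this kind can be stated as a small function over the agreement features of the participants. The Python sketch below is an illustration, not the paper's HPSG principles. Number always resolves to plural, and person follows the usual ranking of first over second over third. The gender rule (virile, i.e. masculine-personal plural, if any participant is masculine personal, otherwise non-virile) is a simplification assumed for the example. The feature labels are also assumptions.

```python
def resolve(participants):
    """Resolve the index features of a PCC such as 'Jan z Marią' ('Jan and Maria').

    participants: list of dicts with 'person' (1, 2 or 3) and 'gender'
    ('m1' = masculine personal, 'f', 'n', ...; labels assumed here).
    The gender rule below is a simplification, not a full account of Polish.
    """
    if not participants:
        raise ValueError("a PCC needs at least one participant")
    return {
        "number": "pl",                                    # always plural
        "person": min(p["person"] for p in participants),  # 1 > 2 > 3
        "gender": "vir" if any(p["gender"] == "m1" for p in participants) else "nonvir",
    }

jan = {"person": 3, "gender": "m1"}
maria = {"person": 3, "gender": "f"}
speaker = {"person": 1, "gender": "f"}  # e.g. 'my z Marią' ('Maria and I')
print(resolve([jan, maria]))      # {'number': 'pl', 'person': 3, 'gender': 'vir'}
print(resolve([speaker, maria]))  # {'number': 'pl', 'person': 1, 'gender': 'nonvir'}
```

The same function applies unchanged to ordinary coordinations. That is the point of treating PCC agreement as resolution rather than inheritance.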
Contextual lexical relations, such as sense relations, have traditionally played an essential role in disambiguating word senses in lexicography, as they offer insights into the meaning and use of a word. However, the description of paradigmatic relations in particular is often restricted to a few types, such as synonymy and antonymy. Both the limited range of relation types described and the way these relations are presented in existing German dictionaries are often problematic.
elexiko, the first German hypertext dictionary compiled exclusively on the basis of an electronic corpus, offers a new way of presenting sense relations and uses a variety of approaches to extract the necessary data. In this paper, I will show how elexiko presents a differentiated system of paradigmatic relations. This system includes synonymy, various subtypes of incompatibility (such as antonymy, complementarity, converseness and reversiveness), and vertical structures (such as hyponymy and meronymy). The main focus, however, is on how data for a paradigmatic description is retrieved from the corpus. In elexiko, a corpus-driven approach is mainly used for semantic information, while a corpus-based method plays an important part in obtaining data for the grammatical description. I will argue that the two approaches can complement each other in gaining insights into sense relations. I will demonstrate which results each approach yields and explore the advantages and disadvantages of both procedures in more detail.
Since sense relations are context-dependent, I will also demonstrate how a sense-bound presentation can be realised in an electronic reference work. This presentation includes a system of cross-referencing that illustrates lexical structures and the interrelatedness of words within the lexicon. Finally, I will show how accompanying corpus examples and additional lexicographic information help users understand contextual restrictions, so that they can use dictionary information more effectively.
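A sense-bound cross-referencing system can be modelled by attaching relations to individual senses rather than to lemmas. The inverse link is then generated automatically, so that cross-references stay consistent in both directions. The Python sketch below is hypothetical. The class, the relation labels and the sense identifiers (e.g. two senses of 'Bank') are illustrative and do not reflect elexiko's actual data format.

```python
from collections import defaultdict

# Inverse of each relation label (labels are assumptions for this sketch)
INVERSE = {
    "synonym": "synonym",
    "antonym": "antonym",
    "hyponym": "hyperonym",
    "hyperonym": "hyponym",
    "meronym": "holonym",
    "holonym": "meronym",
}

class SenseLexicon:
    """Cross-references keyed by (lemma, sense) instead of by lemma alone."""

    def __init__(self):
        self.links = defaultdict(set)  # (lemma, sense) -> {(relation, (lemma, sense))}

    def relate(self, source, relation, target):
        """Record that `target` is a `relation` of `source`, plus the inverse link."""
        self.links[source].add((relation, target))
        self.links[target].add((INVERSE[relation], source))

    def relations(self, sense):
        """All cross-references of one sense, sorted for stable display."""
        return sorted(self.links.get(sense, ()))

lex = SenseLexicon()
lex.relate(("Bank", "Geldinstitut"), "synonym", ("Kreditinstitut", "1"))
lex.relate(("Möbel", "1"), "hyponym", ("Bank", "Sitzgelegenheit"))

# Each sense of 'Bank' gets only the cross-references that fit it
print(lex.relations(("Bank", "Geldinstitut")))
print(lex.relations(("Bank", "Sitzgelegenheit")))
```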