Refine
Year of publication
Document Type
- Conference Proceeding (15)
- Part of a Book (4)
- Article (2)
- Working Paper (2)
- Doctoral Thesis (1)
Keywords
- Deutsch (9)
- Artikulatorische Phonetik (6)
- Vokal (5)
- Phonetik (4)
- Automatische Spracherkennung (3)
- Diphthong (3)
- Englisch (3)
- Gesprochene Sprache (3)
- Korpus <Linguistik> (3)
- Lautstärke (3)
Publicationstate
- Veröffentlichungsversion (9)
- Postprint (8)
Reviewstate
Publisher
- Institut für Phonetik und Sprachliche Kommunikation, Ludwig Maximilians Universität München (3)
- Leibniz-Zentrum allgemeine Sprachwissenschaft (ZAS); Humboldt-Universität zu Berlin (2)
- Acoustical Society of America (1)
- Association pour l'Avancement des Etudes Iraniennes (1)
- Department of Linguistics, University of California (1)
- Department of Linguistics, University of Cambridge (1)
- Department of Phonetics, Trier University (1)
- European Language Resources Association (1)
- Heidelberg University Publishing (1)
- International Phonetic Association (1)
Notions such as “corpus-driven” versus “theory-driven” bring into focus the specific role of corpora in linguistic research. As for phonology with its intrinsic focus on abstract categorical representation, there is a question of how a strictly corpus-driven approach can yield insight into relevant structures. Here we argue for a more theory-driven approach to phonology based on the concept of a phonological grammar in terms of interacting constraints. Empirical validation of such grammars comes from the potential convergence of the evidence from various sources including typological data, neutralization patterns, and in particular patterns observed in the creative use of language such as acronym formation, loanword adaptation, poetry, and speech errors. Further empirical validation concerns specific predictions regarding phonetic differences among opposition members, paradigm uniformity effects, and phonetic implementation in given segmental and prosodic contexts. Corpora in the narrowest sense (i.e. “raw” data consisting of spontaneous speech produced in natural settings) are useful for testing these predictions, but even here, special purpose-built corpora are often necessary.
In diesem Beitrag werden drei quantitative Studien vorgestellt, mit deren Hilfe untersucht wird, ob neben dem robusten Längenunterschied auch Qualitätsunterschiede für die deutschen <a>-Laute vorhanden sind (z.B. <Saat> versus <satt>). Auf Basis von ausgewählten Korpora und instrumentalphonetischen Messungen kann dieser Zusammenhang bestätigt werden. Zudem zeigen sich signifikante Unterschiede in den dynamischen
Verläufen der beiden Vokale.
We present evidence for the analysis of the vowels in English <say> and <so> as biphonemic diphthongs /ɛi/ and /əu/, based on neutralization patterns, regular alternations, and foot structure. /ɛi/ and /əu/ are hence structurally on a par with the so called “true diphthongs” /ɑi/, /ɐu/, /ɔi/, but also share prosodic organization with the monophthongs /i/ and /u/. The phonological evidence is supported by dynamic measurements based on the American English TIMIT database.
Calculations of F2-slopes proved to be especially suited to distinguish the relevant groups in accordance with their phonologically motivated prosodic organizations.
This paper is concerned with a novel methodology for generating phonetic questions used in tree-based state tying for speech recognition. In order to implement a speech recognition system, language-dependent knowledge which goes beyond annotated material is usually required. The approach presented here generates phonetic questions for decision trees are based on a feature table that summarizes the articulatory characteristics of each sound. On the one hand, this method allows better language-specific triphone models to be defined given only a feature-table as linguistic input. On the other hand, the feature-table approach facilitates efficient definition of triphone models for other languages since again only a feature table for this language is required. The approach is exemplified with speech recognition systems for English and Thai.
The Partitur Format at BAS
(1997)
Most spoken language resources are produced and disseminated together with symbolic information relating to the speech signal. These are for instance orthographic transcript labeling and segmentation on the phonologic phoneti prosodic phrasal level. Most of the known formats for these symbolic data are defined in a ‘closed form’ that is not fexible enough to allow simple and platform independent processing and easy extensions.
At the Bavarian Archive for Speech Signals (BAS) a new format has been developed and used over the last few years that shows some significant advantages over other existing formats. This paper describes the basic principles behind this format discusses briefly the advantages and gives detailed definitions of the description levels used so far.
This paper outlines the generation process of a specifi computational linguistic representation termed the Multilingual Time Map, conceptually a multi-tape finit state transducer encoding linguistic data at different levels of granularity. The fi st component acquires phonological data from syllable labeled speech data, the second component define feature profiles the third component generates feature hierarchies and augments the acquired data with the define feature profiles and the fourth component displays the Multilingual Time Map as a graph.
HMMs are the dominating technique used in speech recognition today since they perform well in overall phone recognition. In this paper, we show the comparison of HMM methods and machine learning techniques, such as neural networks, decision trees and ensemble classifiers with boosting and bagging in the task of articulatory-acoustic feature classification. The experimental results show that HMM methods work well for the classification of such features as vocalic. However, decision tree and bagging outperform HMMs for the fricative classification task since the data skewness is much higher than for the feature vocalic classification task. This demonstrates that HMMs do not perform as well as decision trees and bagging in highly skewed data settings.
In Articulatory Phonology the jaw is not controlled individually but serves as an additional articulator to achieve the primary constriction. In this study the timing of jaw and tongue tip gestures for the coronal consonants /s, , t, d, n, l/ is analysed by means of EMMA. The findings suggest that the tasks of the jaw for the fricatives are to provide a second noise source and to stabilise the tongue position (more pronounced for /s/). For the voiceless stop, the speakers seem to aim at a high jaw position for producing a prominent burst. For /l/ a low jaw position is essential for avoiding lateral contact and for the apical articulation of this sound.
MRI data of German vowels and consonants was acquired for 9 speakers. In this paper tongue contours for the vowels were analyzed using the three-mode factor analysis technique PARAFAC. After some difficulties, probably related to what constitutes an adequate speaker sample for this three-mode technique to work, a stable two-factor solution was extracted that explained about 90% of the variance. Factor 1 roughly captured the dimension low back to high front; Factor 2 that from mid front to high back. These factors are compared with earlier models based on PARAFAC. These analyses were based on midsagittal contours; the paper concludes by illustrating from coronal and axial sections how non-midline information could be incorporated into this approach.
As can be shown for English data, the assimilation of the alveolar stop can result from an increased gestural overlap of the following oral closure gesture. Our experiment with German synthetic speech showed similar results. Further, it suggests that it is neccessary to complete the gestural specification of the glottal state. A voiced stop should be represented not only by an oral gesture, but by a glottal one as well.
Analyses of jaw movement(obtained by Electromagnetic Articulography) and acoustics show that loud speech is an intricate phenomenon. Besides involving higher intensity and subglottal pressure it affects jaw movements as well as fundamental frequency and especially first formants. It is argued that all these effects serve the purpose of enhancing perceptual salience.
This work exploited coarticulation and loud speech as natural sources of perturbation in order to determine whether articulatory covariation (motor equivalent behavior) can be observed inspeech that is not artificially perturbed. Articulatory analyses of jaw and tongue movement in the production of alveolar consonants by German speakers were performed. The sibilant /s/ shows virtually no articulatory covariation under the influence of natural perturbations, whereas other alveolar consonants show more obvious compensatory behavior. Our conclusion is that an effect of natural sources of perturbation is noticable, but sounds are affected to different degrees.
The vowel quality in some diphthongs of Swabian (an upper german dialect) was determined by measurement of first and second formant values. A minimal contrast could be shown between two different diphthong qualities […], where for Standard German only one is assumed, viz. /ai/. The two diphthong qualities differ only slightly in onset and offset vowel quality, so a better understanding of their relationship was expected from an examination of their dynamic aspects. Our preliminary results suggest that there is indeed a difference in the temporal structure of the two diphthongs.
The aim of this paper is to highlight the actual need for corpora that have been annotated based on acoustic information. The acoustic information should be coded in features or properties and is needed to inform further processing systems, i.e. to present a basis for a speech recognition system using linguistic information. Feature annotation of existing corpora in combination with segmental annotation can provide a powerful training material for speech recognition systems, but will as well challenge the further processing of features to segments and syllables. We present here the theoretical preliminaries for our multilingual feature extraction system, that we are currently working on.