400 Sprache, Linguistik
Refine
Document Type
Language
- English (6)
Has Fulltext
- yes (6)
Is part of the Bibliography
- no (6)
Keywords
- Artikulatorische Phonetik (6) (remove)
Publicationstate
- Postprint (4)
- Veröffentlichungsversion (2)
HMMs are the dominating technique used in speech recognition today since they perform well in overall phone recognition. In this paper, we show the comparison of HMM methods and machine learning techniques, such as neural networks, decision trees and ensemble classifiers with boosting and bagging in the task of articulatory-acoustic feature classification. The experimental results show that HMM methods work well for the classification of such features as vocalic. However, decision tree and bagging outperform HMMs for the fricative classification task since the data skewness is much higher than for the feature vocalic classification task. This demonstrates that HMMs do not perform as well as decision trees and bagging in highly skewed data settings.
In Articulatory Phonology the jaw is not controlled individually but serves as an additional articulator to achieve the primary constriction. In this study the timing of jaw and tongue tip gestures for the coronal consonants /s, , t, d, n, l/ is analysed by means of EMMA. The findings suggest that the tasks of the jaw for the fricatives are to provide a second noise source and to stabilise the tongue position (more pronounced for /s/). For the voiceless stop, the speakers seem to aim at a high jaw position for producing a prominent burst. For /l/ a low jaw position is essential for avoiding lateral contact and for the apical articulation of this sound.
MRI data of German vowels and consonants was acquired for 9 speakers. In this paper tongue contours for the vowels were analyzed using the three-mode factor analysis technique PARAFAC. After some difficulties, probably related to what constitutes an adequate speaker sample for this three-mode technique to work, a stable two-factor solution was extracted that explained about 90% of the variance. Factor 1 roughly captured the dimension low back to high front; Factor 2 that from mid front to high back. These factors are compared with earlier models based on PARAFAC. These analyses were based on midsagittal contours; the paper concludes by illustrating from coronal and axial sections how non-midline information could be incorporated into this approach.
As can be shown for English data, the assimilation of the alveolar stop can result from an increased gestural overlap of the following oral closure gesture. Our experiment with German synthetic speech showed similar results. Further, it suggests that it is neccessary to complete the gestural specification of the glottal state. A voiced stop should be represented not only by an oral gesture, but by a glottal one as well.