Refine
Year of publication
Document Type
- Conference Proceeding (20)
- Article (7)
- Doctoral Thesis (1)
- Other (1)
- Working Paper (1)
Has Fulltext
- yes (30)
Keywords
- Korpus <Linguistik> (10)
- Forschungsdaten (8)
- Artikulation (5)
- prosody (5)
- Metadatenmodell (4)
- syllable prominence (4)
- Metadaten (3)
- Prosodie (3)
- Sprachproduktion (3)
- acoustic correlates (3)
Publicationstate
- Veröffentlichungsversion (19)
- Ahead of Print (1)
- Postprint (1)
- Zweitveröffentlichung (1)
Reviewstate
Publisher
- International Speech Communications Association (5)
- CLARIN (2)
- Elsevier (2)
- Linköping University Electronic Press (2)
- Sage Publications (2)
- University of Glasgow (2)
- Zenodo (2)
- City University of Hong Kong (1)
- European Language Resources Association (1)
- Institut für Phonetik und Sprachverarbeitung, Universität München (1)
This paper describes work directed towards the development of a syllable prominence-based prosody generation functionality for a German unit selection speech synthesis system. A general concept for syllable prominence-based prosody generation in unit selection synthesis is proposed. As a first step towards its implementation, an automated syllable prominence annotation procedure based on acoustic analyses has been performed on the BOSS speech corpus. The prominence labeling has been evaluated against an existing annotation of lexical stress levels and manual prominence labeling on a subset of the corpus. We discuss methods and results and give an outlook on further implementation steps.
The present study uses electromagnetic articulography, by which the position of tongue and lips during speech is measured, for the study of dialect variation. By using generalized additive modeling to analyze the articulatory trajectories, we are able to reliably detect aggregate group differences, while simultaneously taking into account the individual variation of dozens of speakers. Our results show that two Dutch dialects show clear differences in their articulatory settings, with generally a more anterior tongue position in the dialect from Ubbergen in the southern half of the Netherlands than in the dialect of Ter Apel in the northern half of the Netherlands. A comparison with formant-based acoustic measurements further reveals that articulography is able to reveal interesting structural articulatory differences between dialects which are not visible when only focusing on the acoustic signal.
The present study introduces articulography, the measurement of the position of tongue and lips during speech, as a promising method to the study of dialect variation. By using generalized additive modeling to analyze articulatory trajectories, we are able to reliably detect aggregate group differences, while simultaneously taking into account the individual variation across dozens of speakers. Our results on the basis of Dutch dialect data show clear differences between the southern and the northern dialect with respect to tongue position, with a more frontal tongue position in the dialect from Ubbergen (in the southern half of the Netherlands) than in the dialect of Ter Apel (in the northern half of the Netherlands). Thus articulography appears to be a suitable tool to investigate structural differences in pronunciation at the dialect level.
A frequently replicated finding is that higher frequency words tend to be shorter and contain more strongly reduced vowels. However, little is known about potential differences in the articulatory gestures for high vs. low frequency words. The present study made use of electromagnetic articulography to investigate the production of two German vowels, [i] and [a], embedded in high and low frequency words. We found that word frequency differently affected the production of [i] and [a] at the temporal as well as the gestural level. Higher frequency of use predicted greater acoustic durations for long vowels; reduced durations for short vowels; articulatory trajectories with greater tongue height for [i] and more pronounced downward articulatory trajectories for [a]. These results show that the phonological contrast between short and long vowels is learned better with experience, and challenge both the Smooth Signal Redundancy Hypothesis and current theories of German phonology.
Repeating the movements associated with activities such as drawing or sports typically leads to improvements in kinematic behavior: these movements become faster, smoother, and exhibit less variation. Likewise, practice has also been shown to lead to faster and smoother movement trajectories in speech articulation. However, little is known about its effect on articulatory variability. To address this, we investigate the extent to which repetition and predictability influence the articulation of the frequent German word “sie” [zi] (they). We find that articulatory variability is proportional to speaking rate and the duration of [zi], and that overall variability decreases as [zi] is repeated during the experiment. Lower variability is also observed as the conditional probability of [zi] increases, and the greatest reduction in variability occurs during the execution of the vocalic target of [i]. These results indicate that practice can produce observable differences in the articulation of even the most common gestures used in speech.
Repeating the movements associated with activities such as drawing or sports typically leads to improvements in kinematic behavior: these movements become faster, smoother, and exhibit less variation. Likewise, practice has also been shown to lead to faster and smoother movement trajectories in speech articulation. However, little is known about its effect on articulatory variability. To address this, we investigate the extent to which repetition and predictability influence the articulation of the frequent German word “sie” [zi] (they). We find that articulatory variability is proportional to speaking rate and the duration of [zi], and that overall variability decreases as [zi] is repeated during the experiment. Lower variability is also observed as the conditional probability of [zi] increases, and the greatest reduction in variability occurs during the execution of the vocalic target of [i]. These results indicate that practice can produce observable differences in the articulation of even the most common gestures used in speech.
This report presents a corpus of articulations recorded with Schlieren photography, a recording technique to visualize aeroflow dynamics for two purposes. First, as a means to investigate aerodynamic processes during speech production without any obstruction of the lips and the nose. Second, to provide material for lecturers of phonetics to illustrates these aerodynamic processes. Speech production was recorded with 10 kHz frame rate for statistical video analyses. Downsampled videos (500 Hz) were uplodad to a youtube channel for illustrative purposes. Preliminary analyses demonstrate potential in applying Schlieren photography in research.
The relation between speed and curvature provides a characterization of the spatio-temporal orchestration of kinematic movements. For hand movements, this relation has been reported to follow a power law with exponent -1/3. The same power law has been claimed to govern articulatory movements. We studied the functional form of speed as predicted by curvature using electromagnetic articulography, focusing on three sensors: the tongue tip, the tongue body, and the lower lip. Of specific interest to us was the question of whether the speed-curvature relation is modified by articulatory practice, gauged with words’ frequencies of occurrence. Although analyses imposing linearity a priori indeed supported a power law, relaxation of this linearity assumption revealed that the effect of curvature on speed levels off substantially for lower values of curvature. A modification of the power law is proposed that takes this curvature into account. Furthermore, controlling statistically for number of phones and word duration, we observed that the speed-curvature function was further modulated by an interaction of lexical frequency by curvature, such that for increasing frequency, speed decreased slightly for low curvatures while it increased slightly for high curvatures. The modulation of the balance between speed and curvature by lexical frequency provides further evidence that the skill of articulation improves with practice on a word-to-word basis, and challenges theories of speech production.
The perception of syllable prominence depends to a limited extent on the acoustic properties of the speech signal in question. Psychoacoustic factors are involved as well. Thus, research often relies on two types of data: subjective prominence ratings collected in perception experiments and acoustic measures. A problem with the rating data is noise resulting from individual approaches to the rating task. This paper addresses the question of how this noise can be reduced by normalization, evaluating 12 normalization methods. In a perception experiment, prominence ratings concerning German read speech were collected. From the raw rating data 12 different ‘mirror’ data-sets were computed according to the 12 methods. Each mirror data-set was correlated with the same set of underlying acoustic data. The multiple regression setup included raw syllable duration as well as within-syllable maximum F0 and intensity. Adjusted r2-values could beraised considerably with selected methods.
The instructions under which raters quantify syllable prominence perception need to be simple in order to maintain immediate reactions. This leads to noise in the rating data that can be dealt with by normalization, e.g. setting central tendency = 0 and dispersion = 1 (as in Z-score normalization). Questions arise such as: Which parameter is adequate here to capture central tendency? Which reference distribution should the normalization be based on? In this paper 16 different normalization methods are evaluated. In a perception experiment using German read speech (prose and poetry), syllable prominence ratings were collected. From the rating data 16 complete “mirror” data-sets were computed according to the 16 methods. Each mirror data-set was correlated with the same set of measures from the underlying acoustic data, focusing on raw syllable duration which is seen as a rather straightforward acoustic aspect of syllable prominence. Correlation coefficients could be raised considerably by selected methods.