Phonetics / Phonology
This conversation analytic study compares the use of negation particles in spoken German and Persian, namely nein/nee and na. While these particles have a range of functions in both languages (Ghaderi 2022; Imo 2017), their use in response to news remains understudied. We focus on nein/nee and na in two sequential contexts: (i) after prior disconfirmations (Extract (a)) and (ii) in response to either solicited or unsolicited informings (see Extracts (b) and (c), respectively). In both contexts, nein/nee and na mark unexpectedness and open up an opportunity space for more, but they do so in different ways and with different outcomes. Nein/nee- and na-turns after disconfirming, often minimal responses to first-position confirmable turns mark the prior as unexpected (or even contrasting with the nein/nee/na-speaker’s expectations) and thus as expandable/accountable (cf. Ford 2001; Gubina/Betz 2021). Nein/nee/na-turns after informings (e.g., announcements that display a story teller’s negative emotional stance) differ not only in sequential position but also in prosodic realization. They can be either falling or rising, but all are characterized by marked prosody, i.e., lengthening, very low onset, smiling or breathy voice, or high overall pitch. Through position and turn design features, such nein/nee- and na-turns not only mark a prior turn as counter to (normative) expectations, but may also display the speaker’s affective stance and affiliate with the affective stance of the prior interactant. By comparing the use of nein/nee and na in German and Persian in the two functions illustrated in Extracts (a) and (b/c), we will show (i) how nein/nee- and na-turns shape interactional trajectories after responsive actions and (ii) what role the particles play in managing news and stance-taking as well as epistemic and affective positioning. 
Apart from revealing similarities in the use of German and Persian negation particles, the results of our crosslinguistic comparison will demonstrate that even if different languages have similar practices for specific actions, the use of these practices is language- and culture-specific. This means that even similar practices in different languages have their own “collateral effects” (Sidnell/Enfield 2012), linguistic and prosodic characteristic features, and, at least sometimes, consequences for social actions accomplished in the specific language (e.g., Dingemanse/Blythe/Dirksmeyer 2014; Evans/Levinson 2009; Floyd/Rossi/Enfield (eds.) 2020; Fox et al. 2009). Our study uses the method of Conversation Analysis (Sidnell/Stivers (eds.) 2013) and draws on more than 80 hours of audio and video recordings of spontaneous interactions (co-present, via video link, and on the telephone) in everyday and institutional contexts.
Morphophonological asymmetries in affixation concern systematic correlations between morphological properties of affixes (e.g. combination with bound versus free stems, position relative to stem (suffixes versus prefixes)) and their phonological properties (e.g. stress behaviour). The arguably most insightful approach to capturing relevant asymmetries invokes a notion of affix coherence, first introduced by Dixon in connection with his work on Yidiɲ, a nearly extinct language spoken in Northern Australia. This notion is based on a categorical division of affixes into ones that integrate into the phonological word of the stem and ones that do not. The integration of affixes is envisioned as being fully determined by phonological and morphological structure in a given language and verifiable by diagnostics relevant to phonological word domains (primarily the syllable and the foot structure). The assumption of two types of prosodic domains characterized by integrated versus non-integrated affixes is manifest in consistent asymmetries that pertain to morphophonological, phonological, and phonetic rules. This consistency constitutes compelling evidence for the structure-based analysis of the impact of various affixes on derived words, as opposed to alternative approaches to capturing these effects by associating affixes with diacritics (morpheme versus word boundary, class 1 versus class 2, stratum 1 versus stratum 2). The present entry aims to demonstrate, mostly on the basis of data from Germanic languages, the breadth of the empirical evidence in support of a fundamental role of affix coherence. Moreover, it aims to draw attention to the various implications of affix coherence for modeling relevant generalizations, in particular the necessary reference to a level of phonological representation characterized by a specific degree of abstractness (‘phonemic’).
The successful reuse of spoken corpora must satisfy discipline-specific evaluation criteria and therefore requires a flexible corpus architecture characterized by multi-representational data (availability of an acoustic signal and a transliteration) and multi-situational data (variability of situations or tasks). These criteria are applied and discussed in a case study on /eː/ diphthongization in Polish learners of German. The case study replicates the results on /eː/ diphthongization in picture naming reported by Nimz (2016). Before reuse, further discipline-specific evaluation criteria are checked, such as multi-situationality, recording quality, extensibility, available metadata, and available documentation. Following the replication study, the challenges of putting reuse into practice are discussed with regard to data management, workflows, and data literacy in research and teaching contexts.
The shortening of linguistic expressions naturally involves some sort of correspondence between short forms and (some portion of) the respective full forms. Based mostly on data from English and Hebrew this article explores the hypothesis that such correspondence concerns necessary sameness of symbolic form, referring either to graphemic or to a specific level of phonological representation. That level indicates a degree of abstractness defined by language-specific contrastiveness (i.e. “phonemic”). Reference to written form can be shown to be highly systematic in certain contexts, including cases where full forms consist of multiple stems. Specific asymmetries pertaining to the targeting of material by correspondence (e.g. initial vs. non-initial position) appear to be alike for both types of representation, a claim supported by a study based on a nomenclature strictly confined to writing (chemical element symbols).
Identity effects in phonology are deviations from regular phonological form (i.e. canonical patterns) which are due to the relatedness between words. More specifically, identity effects are those deviations which serve to enhance similarity in the surface phonological form of morphologically related words. In rule-based generative phonology the effects in question are described by means of the cycle. For example, the stress on the second syllable in cond[ɛ]nsation as opposed to the stresslessness of the second syllable in comp[ǝ]nsation is described by applying the stress rules initially to the stems, thereby yielding condénse and cómpensàte. Subsequently the stress rules are reapplied to the affixed words, with the initial stress assignment (i.e. stress on the second syllable in condense, but not in compensate) leaving its mark in the output form (cf. Chomsky and Halle 1968). A second example is words like lie[p]los 'unloving' in German, which show the effects of neutralization in coda position (i.e. only voiceless obstruents may occur in coda position) even though the obstruent should 'regularly' be syllabified in head position (i.e. bl is a wellformed syllable head in German). Here the stem is syllabified on an initial cycle, obstruent devoicing applies (i.e. lie[p]) and this structure is left intact when affixation applies (i.e. lie[p]los) (cf. Hall 1992). As a result the stem of lie[p]los is identical to the base lie[p].
This paper presents observations on the phonetic realisations of the German particles ja – ‘yes’ and naja – approximately ‘well’. As part of a large-scale study on the particle ja, we identified numerous instances in the dataset that had been orthographically transcribed as ja, but were phonetically realised as [nja]. Using phonetic and functional parameters, we explore the question whether these instances can be attributed to either the lexeme ja or naja. While phonetic measurements yield ambivalent results, analyses of pragmatic parameters such as function and turn position seem to indicate that [nja] was predominantly intended to be ja, although some functional differences between ja and [nja] could also be identified.
So-called "pragmaticalized multi-word units" are highly frequent in German and are sometimes subject to far-reaching phonetic reduction processes. These can give rise to realization variants that, in retrospect, can be traced back to more than one lexematic source form. Using a perception experiment, the present study investigates [ˈzɐmɐ], a particularly striking case of this kind.
This report presents a corpus of articulations recorded with Schlieren photography, a recording technique for visualizing airflow dynamics. The corpus serves two purposes: first, to investigate aerodynamic processes during speech production without any obstruction of the lips and the nose; second, to provide material for lecturers of phonetics to illustrate these aerodynamic processes. Speech production was recorded at a frame rate of 10 kHz for statistical video analyses. Downsampled videos (500 Hz) were uploaded to a YouTube channel for illustrative purposes. Preliminary analyses demonstrate the potential of applying Schlieren photography in research.
Using the geographical distribution of the high front rounded vowel phoneme /y/ in Europe, the project Phonological Atlas of Europe (Phon@Europe) is presented. The discussion focuses on cases of possible or disputed diffusion of /y/ through language contact. Attention is also given to the role German may have played in the spread of /y/ in Europe. Comparisons are drawn with similar cases in other parts of the continent. The possibility that /y/ arose independently of contact is also considered. Finally, the findings are evaluated from the perspectives of contact linguistics and areal linguistics, and German is situated in the phonological landscape of Europe.
Smooth turn-taking in conversation depends in part on speakers being able to communicate their intention to hold or cede the floor. Both prosodic and gestural cues have been shown to be used in this context. We investigate the interplay of pitch movements and hand gestures at locations at which speaker change becomes relevant, comparing their use in German and Swedish. We find that there are some shared functions of prosody and gesture with regard to turn-taking in the two languages, but that these shared functions appear to be mediated by the different phonological demands on pitch in the two languages.
A "polyglottal" speech synthesis - modifications for a replica of Kempelen's speaking machine
(2019)
Zum Graphembegriff
(1980)
Drawing on mainly German examples, this article shows that phonotactic constraints can refer to the syllable as well as to the morpheme. It puts forward the hypothesis that only those constraints whose domain is the morpheme can admit exceptions.
The 7th edition of the Duden pronunciation dictionary (Duden-Aussprachewörterbuch) appeared in 2015; for the first time, the staff of the IDS project „Gesprochenes Deutsch“ ("Spoken German") were responsible for its revision. This contribution describes the conceptual and content-related changes implemented in the new edition. They can essentially be summarized under the motto "a turn towards descriptivity". In addition to the usual lexicographic procedures, such as deleting obsolete lemmas and extending the lemma inventory with previously undocumented words, chapters in the introductory part were supplemented, completely revised, or newly written. Systematic changes were made to various transcription conventions (e.g. the notation of diphthongs). The most significant innovation, however, is the inclusion of empirical data on the German standard of usage (Gebrauchsstandard), above all from the project corpus „Deutsch heute“, which made it possible to provide well-founded information on the regional distribution of pronunciation variants.
This contribution examines the interplay of functional specialization and phonetic reduction in pragmatic markers derived from complex syntagms. The focus is on the reduced form [ˈzɐmɐ], which could potentially be traced back to the markers <ich sag mal> or <sagen wir (mal)>. An analysis of their phonetic reduction forms and interactional functions shows that tracing it back to <sagen wir (mal)> is more plausible. Subsequently, realizations of the word combination 'sagen wir' as a compositional matrix clause are compared with uses as a pragmatic marker. The findings point to an influence of the function of the target structure on its phonetic realization, which can be interpreted as an indication that the reanalysed marker use has independent sign status.
This contribution addresses two different research results from the analysis of the corpus »Deutsch heute«. The first part presents, in a survey organized by sound system, the phonetic variation manifested in the read-aloud pronunciation of the Austrian school students in the corpus data. The second part presents metalinguistic statements from language-biographical interviews, which provide insight into the young Austrians' language-related categories and concepts and allow conclusions about language attitudes. The students confirm various facets of the diaglossic relationship between varieties assumed for Austria not only through their use of forms, but also in metalinguistic statements that reveal a high degree of awareness of their own language use and of the formal as well as socio-symbolic differences between the varieties.
To date, little is known about prosodic accommodation and its conversational functions in instances of overlapping talk in conversation. A major conversational action that happens in overlap is turn competition. It is not known whether participants accommodate prosodic parameters locally in the overlapped turn (initialisation) or access a repertoire of prosodic patterns that refer to general prosodic parameter norms (normalisation) when competing for the turn in overlap. This paper investigates the initialisation and normalisation of fundamental frequency (f0) and assesses its role as a resource for turn competition in overlap. We drew instances of overlapping talk from a corpus of conversational multi-party interactions in British English. We annotated the overlaps on a competitiveness scale and categorised them by overlap onset position and conversational function. We automatically extracted f0 parameters from the speech signal and processed them into f0 accommodation features that represent the normalising or the initialising use of f0. Using decision tree classification we found that f0 accommodation is only relevant as a turn competitive resource in overlaps that start clearly before a speaker transition. In this turn context, we found that normalising and initialising f0 features can both be relevant turn competitive resources. Their deployment depends on the conversational function of overlap.
The motto of this year's annual conference is „Standardvariation - Wie viel Variation verträgt die deutsche Standardsprache?“ ("Standard variation: how much variation can the German standard language tolerate?"). In this context, borrowings from other languages in particular raise the problem, for example with regard to their pronunciation, of which features should be taken as the basis, those of the donor language or those of the recipient language, and of how the sometimes considerable range of variation that actually prevails in everyday language use can or should be taken into account. Using the Anglicisms that have been used increasingly in German over recent decades, i.e. borrowings from the Anglo-American language area, I would like to point out some aspects associated with the phonetic integration of Anglicisms into German. First, the relevant research literature on the topic is briefly reviewed; then the most important phonetic and phonological differences between English and German are examined. Against this background, the question is addressed of what role an acceptable or "norm-compliant" pronunciation of Anglicisms plays in public language use. Since dictionaries exert a not inconsiderable normative influence here as well, the final question to be answered is whether the pronunciation information for Anglicisms in German dictionaries should document a standard or rather a range of variation.
For some years now, numerous studies in the humanities have been paying increased attention to the voice and its significance for human communication. Because of the complexity and ambivalence of the phenomenon, they often start from a very broad, rather metaphorical concept of voice. Speech science (Sprechwissenschaft), which traditionally takes a primarily empirical and didactic approach to the speaking voice, operates by contrast with a comparatively narrow, physiological concept of voice, in the sense of a bodily function, as patterns of muscle activity of varying degree, shape, and function. In close connection with clinical speaking-voice diagnostics and phoniatrics, the voice is regarded as an organ whose anatomy and physiology are to be described. In close connection with phonetics, phonation and its effect, as well as vocal-articulatory forms of expression, are described auditorily and acoustically by means of feature analysis. In close connection with linguistics, rhetoric, and sociophonetics, the voice is regarded as the result of phonation; the object of study is the effect of speech and the interactive shaping of the voice as it is interwoven with linguistic and bodily forms of expression. Here the voice is understood as a component of vocal-articulatory expression, and thus as part of personal and social identity and as a carrier of aesthetic-artistic and emotional expression.
Methods for the empirical description of speaking-voice expression are presented by way of example, drawing both on studies that describe and classify vocal-articulatory features (research on voice physiology and on emotion) and on studies that consider vocal-articulatory expression in its effect on listeners (research on speech effects) and in its interactive shaping (conversation research). The results of the empirical studies make clear to what extent vocal-articulatory forms of expression function as carriers of meaning and can contribute to disambiguating communication.
Zur Aussprache nicht haupttoniger Vorsilben mit <e> in Lehnwörtern im deutschen Gebrauchsstandard
(2018)
Pretonic <e> in open syllables in loanwords (demonstrieren, Elefant) is consistently codified with tense/close [e] in the traditional German pronunciation dictionaries. However, the analysis of a total of 17 relevant test words from the corpus „Deutsch heute“ shows that the German standard of usage exhibits marked variation between the sound types [e], [ɛ], and [ə], which occur in very different proportions depending on the lexeme. Factors such as word-stress pattern, following consonants, degree of formality, and the semantic-morphological transparency of the word formation can be adduced to explain the differing variation behaviour. The variation also has a pronounced diatopic dimension: while the [e] pronunciation dominates in northern Germany, but also in the Central Bavarian dialect area and in eastern Switzerland, tokens with [ɛ] pronunciation predominate in the southern centre and the south-west of Germany, in the South Bavarian dialect area, and above all in western Switzerland. The results from „Deutsch heute“ are borne out in a similar way in additionally analysed speech data (news broadcasts, FOLK corpus).
Symbolic representation of linguistic sound structure involves segmenting continuous speech into discrete units that are associated with a finite inventory of symbols. The basic idea behind this abstraction is to associate "recurring" material, which is regarded as the same despite phonetic differences, with the same symbols in each case. The development of suitable procedures for determining uniform and empirically adequate degrees of abstraction was vigorously debated in structuralist work, but otherwise seems to have been strangely neglected. This contribution presents such a procedure, developed within Optimality Theory, using the so-called vowel opposition in German. Various types of converging empirical evidence support the assumption of a single phonologically relevant level of abstraction with fifteen qualitatively distinct full vowels.
Notions such as “corpus-driven” versus “theory-driven” bring into focus the specific role of corpora in linguistic research. As for phonology with its intrinsic focus on abstract categorical representation, there is a question of how a strictly corpus-driven approach can yield insight into relevant structures. Here we argue for a more theory-driven approach to phonology based on the concept of a phonological grammar in terms of interacting constraints. Empirical validation of such grammars comes from the potential convergence of the evidence from various sources including typological data, neutralization patterns, and in particular patterns observed in the creative use of language such as acronym formation, loanword adaptation, poetry, and speech errors. Further empirical validation concerns specific predictions regarding phonetic differences among opposition members, paradigm uniformity effects, and phonetic implementation in given segmental and prosodic contexts. Corpora in the narrowest sense (i.e. “raw” data consisting of spontaneous speech produced in natural settings) are useful for testing these predictions, but even here, special purpose-built corpora are often necessary.
The relation between speed and curvature provides a characterization of the spatio-temporal orchestration of kinematic movements. For hand movements, this relation has been reported to follow a power law with exponent -1/3. The same power law has been claimed to govern articulatory movements. We studied the functional form of speed as predicted by curvature using electromagnetic articulography, focusing on three sensors: the tongue tip, the tongue body, and the lower lip. Of specific interest to us was the question of whether the speed-curvature relation is modified by articulatory practice, gauged with words’ frequencies of occurrence. Although analyses imposing linearity a priori indeed supported a power law, relaxation of this linearity assumption revealed that the effect of curvature on speed levels off substantially for lower values of curvature. A modification of the power law is proposed that takes this curvature into account. Furthermore, controlling statistically for number of phones and word duration, we observed that the speed-curvature function was further modulated by an interaction of lexical frequency by curvature, such that for increasing frequency, speed decreased slightly for low curvatures while it increased slightly for high curvatures. The modulation of the balance between speed and curvature by lexical frequency provides further evidence that the skill of articulation improves with practice on a word-to-word basis, and challenges theories of speech production.
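The log-log fit behind a power law of this kind can be illustrated with a minimal sketch. It is not the authors' analysis: the function names are hypothetical, and the data are synthetic points on an ellipse traced with harmonic motion (a trajectory for which speed equals a constant times curvature to the power -1/3), not articulography recordings. Regressing log speed on log curvature imposes linearity a priori, which is the assumption the abstract reports relaxing.

```python
import math

def ellipse_speed_curvature(a, b, n=200):
    """Sample (speed, curvature) along x = a*cos(t), y = b*sin(t)."""
    samples = []
    for i in range(n):
        t = 2 * math.pi * i / n
        dx, dy = -a * math.sin(t), b * math.cos(t)      # velocity
        ddx, ddy = -a * math.cos(t), -b * math.sin(t)   # acceleration
        speed = math.hypot(dx, dy)
        curvature = abs(dx * ddy - dy * ddx) / speed ** 3
        samples.append((speed, curvature))
    return samples

def fit_power_law(samples):
    """Least-squares fit of log(speed) = log_gain + exponent * log(curvature)."""
    xs = [math.log(c) for _, c in samples]
    ys = [math.log(s) for s, _ in samples]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    exponent = sxy / sxx
    return exponent, mean_y - exponent * mean_x

# Synthetic example: semi-axes 2 and 1 (arbitrary illustrative values).
exponent, log_gain = fit_power_law(ellipse_speed_curvature(2.0, 1.0))
```

For this synthetic trajectory the fitted exponent equals -1/3 up to floating-point rounding. Real articulatory data would scatter around the fitted line, and a systematic departure from it at low curvature is the kind of effect a linear log-log fit cannot capture.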
Using the polyfunctional multi-word unit <was weiß ich> as an example, this study examines the interplay of pragmatic and phonetic differentiation in pragmaticalization processes. To this end, spontaneous-speech tokens from the corpus „Deutsch heute“ are analysed. The observed range of phonetic variation points to a complex relationship with the respective pragmatic functions.
This contribution presents three quantitative studies that investigate whether, in addition to the robust length difference, there are also quality differences between the German <a> sounds (e.g. <Saat> versus <satt>). On the basis of selected corpora and instrumental-phonetic measurements, this relationship can be confirmed. In addition, significant differences emerge in the dynamic trajectories of the two vowels.
We present evidence for the analysis of the vowels in English <say> and <so> as biphonemic diphthongs /ɛi/ and /əu/, based on neutralization patterns, regular alternations, and foot structure. /ɛi/ and /əu/ are hence structurally on a par with the so-called “true diphthongs” /ɑi/, /ɐu/, /ɔi/, but also share prosodic organization with the monophthongs /i/ and /u/. The phonological evidence is supported by dynamic measurements based on the American English TIMIT database. Calculations of F2-slopes proved to be especially suited to distinguish the relevant groups in accordance with their phonologically motivated prosodic organizations.
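As an illustration of what an F2 slope measures, the following sketch fits a straight line to a second-formant track and returns its slope. How the slopes were actually computed in the study is not stated in the abstract; the function name, the least-squares method, and the formant values below are all assumptions made for the example.

```python
def f2_slope(times_ms, f2_hz):
    """Least-squares slope of an F2 track, in Hz per millisecond."""
    n = len(times_ms)
    mean_t = sum(times_ms) / n
    mean_f = sum(f2_hz) / n
    num = sum((t - mean_t) * (f - mean_f) for t, f in zip(times_ms, f2_hz))
    den = sum((t - mean_t) ** 2 for t in times_ms)
    return num / den

# Hypothetical tracks sampled every 10 ms (invented values, not TIMIT data).
times = [0, 10, 20, 30, 40]
gliding = f2_slope(times, [1800, 1900, 2000, 2100, 2200])  # rising F2
steady = f2_slope(times, [2300, 2300, 2300, 2300, 2300])   # flat F2
```

A clearly non-zero slope, as in the first track, indicates formant movement across the vowel, while a slope near zero indicates a steady-state quality.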
In order to determine priorities for the improvement of timing in synthetic speech this study looks at the role of segmental duration prediction and the role of phonological symbolic representation in listeners' preferences. In perception experiments using German speech synthesis, two standard duration models (Klatt rules and CART) were tested. The input to these models consisted of symbolic strings which were either derived from a database or a text-to-speech system. Results of the perception experiments show that different duration models can only be distinguished when the symbolic string is appropriate. Considering the relative importance of the symbolic representation, "post-lexical" segmental rules were investigated with the outcome that listeners differ in their preferences regarding the degree of segmental reduction. As a conclusion, before fine-tuning the duration prediction, it is important to calculate an appropriate phonological symbolic representation in order to improve timing in synthetic speech.
In this study we investigate the intonational characteristics of the four utterance types statement, wh-question, yes/no-question and declarative question. Readings of two German scripted dialogues were examined to ascertain characteristic features of the F0 contour for each utterance type. Final boundary tone, nuclear pitch accent, F0 offset, F0 onset, F0 range, and the slopes of a topline and a bottomline were determined for each utterance and compared for the four utterance types. Results show that for an average speaker, the final boundary tone, the F0 range, and the slope of the topline can be used to distinguish between the four utterance types. However, speakers may deviate from this pattern and exploit other intonational means to distinguish certain utterance types or choose not to mark a syntactic difference at all.
Wolfgang von Kempelen's book "The Mechanism of Human Speech" from 1791 is a famous milestone in the history of speech communication research. It has enormous relevance for the phonetic sciences and marks an important turning point in the development of (mechanical) speech synthesis. So far no English version of this work has been available, which excludes many interested researchers. Access to the original versions in German and French is restricted for various reasons. For example, the blackletter script of the German version is troublesome for most of today's readers. We report here on a new edition of Kempelen's book which unites a more readable German version and its English translation. It will now also be in a searchable electronic format and has been enriched with many commentaries, which aid in the understanding of details of the late 18th century that are little known or unknown to many researchers today.
There are a number of recent replicas of Wolfgang von Kempelen's speaking machine. Although all of them are explicitly based on Kempelen's own description, hardly any of them are identical in construction and sound. In this paper we illustrate some of these differences and the reasons for them, using five replicas that we built ourselves.
In scientific terms, the 18th century was marked by great upheavals, including in the field of human anatomy and physiology. The lively discussion that arose from this also extended to the still very young field of (mechanical) speech synthesis and its foundations. The speech synthesis concept of Wolfgang von Kempelen (1734–1804) is a particularly striking example of how a fundamental scientific insight cannot necessarily be put into practice as well, possibly because of technological limitations. In principle, Kempelen's insights into human anatomy and physiology, and thus into speech production, were largely correct. The practical implementation, by contrast, seems rather odd from today's perspective. Kempelen's vocal tract concept is compared, by way of example, with the speech synthesis prototype of Christian Gottlieb Kratzenstein (1723–1795), developed only slightly earlier. His "insights" must in many cases be described as wrong today; his model of vowel synthesis shows striking parallels to Kempelen's on the one hand, but with regard to physiology it rests on assumptions that are in many respects erroneous.
The Partitur Format at BAS
(1997)
Most spoken language resources are produced and disseminated together with symbolic information relating to the speech signal. These are, for instance, orthographic transcript, labeling, and segmentation on the phonological, phonetic, prosodic, and phrasal levels. Most of the known formats for these symbolic data are defined in a ‘closed form’ that is not flexible enough to allow simple and platform-independent processing and easy extensions.
At the Bavarian Archive for Speech Signals (BAS) a new format has been developed and used over the last few years that shows some significant advantages over other existing formats. This paper describes the basic principles behind this format, briefly discusses the advantages, and gives detailed definitions of the description levels used so far.
This study investigates high vowel laxing in the Louisiana French of the Lafourche Basin. Unlike Canadian French, in which the high vowels /i, y, u/ are traditionally described as undergoing laxing (to [I, Y, U]) in word-final syllables closed by any consonant other than a voiced fricative (see Poliquin 2006), Oukada (1977) states that in the Louisiana French of Lafourche Parish, any coda consonant will trigger high vowel laxing of /i/; he excludes both /y/ and /u/ from his discussion of high vowel laxing. The current study analyzes tokens of /i, y, u/ from pre-recorded interviews with three older male speakers from Terrebonne Parish. We measured the first and second formants and duration for high vowel tokens produced in four phonetic environments, crossing syllable type (open vs. closed) by consonant type (voiced fricative vs. any consonant other than a voiced fricative). Results of the acoustic analysis show optional laxing for /i/ and /y/ and corroborate the finding that high vowels undergo laxing in word-final closed syllables, regardless of consonant type. Data for /u/ show that the results vary widely by speaker, with the dominant pattern (shown by two out of three speakers) that of lowering and backing in the vowel space of closed syllable tokens. Duration data prove inconclusive, likely due to the effects of stress. The formant data published here constitute the first acoustic description of high vowels for any variety of Louisiana French and lay the groundwork for future study on these endangered varieties.
This paper outlines the generation process of a specific computational linguistic representation termed the Multilingual Time Map, conceptually a multi-tape finite-state transducer encoding linguistic data at different levels of granularity. The first component acquires phonological data from syllable-labeled speech data, the second component defines feature profiles, the third component generates feature hierarchies and augments the acquired data with the defined feature profiles, and the fourth component displays the Multilingual Time Map as a graph.
The perception of prosodic prominence is influenced by different sources such as acoustic cues, linguistic expectations and context. We use a generalized additive model and a random forest to model the perceived prominence on a corpus of spoken German. Both models are able to explain over 80% of the variance. While the random forest gives us some insight into the relative importance of the cues, the generalized additive model gives us insight into the interaction between different cues to prominence.
A frequently replicated finding is that higher frequency words tend to be shorter and contain more strongly reduced vowels. However, little is known about potential differences in the articulatory gestures for high vs. low frequency words. The present study made use of electromagnetic articulography to investigate the production of two German vowels, [i] and [a], embedded in high and low frequency words. We found that word frequency differently affected the production of [i] and [a] at the temporal as well as the gestural level. Higher frequency of use predicted greater acoustic durations for long vowels; reduced durations for short vowels; articulatory trajectories with greater tongue height for [i] and more pronounced downward articulatory trajectories for [a]. These results show that the phonological contrast between short and long vowels is learned better with experience, and challenge both the Smooth Signal Redundancy Hypothesis and current theories of German phonology.
The current paper presents a corpus containing 35 dialogues of spontaneously spoken southern German, including half an hour of articulography for 13 of the speakers. Speakers were seated in separate recording chambers, mimicking a telephone call, and recorded on individual audio channels. The corpus provides manually corrected word boundaries and automatically aligned segment boundaries. Annotations are provided in the Praat format. In addition to the audio recordings, speakers filled out a detailed questionnaire assessing, among other things, their audio-visual consumption habits.
The present study introduces articulography, the measurement of the position of tongue and lips during speech, as a promising method for the study of dialect variation. By using generalized additive modeling to analyze articulatory trajectories, we are able to reliably detect aggregate group differences, while simultaneously taking into account the individual variation across dozens of speakers. Our results on the basis of Dutch dialect data show clear differences between the southern and the northern dialect with respect to tongue position, with a more frontal tongue position in the dialect from Ubbergen (in the southern half of the Netherlands) than in the dialect of Ter Apel (in the northern half of the Netherlands). Thus articulography appears to be a suitable tool to investigate structural differences in pronunciation at the dialect level.
This paper presents newly developed guidelines for prosodic annotation of German as a consensus system agreed upon by German intonologists. The DIMA system is rooted in the framework of autosegmental-metrical phonology. One important goal of the consensus is to make exchanging data between groups easier since German intonation is currently annotated according to different models. To this end, we aim to provide guidelines that are easy to learn. The guidelines were evaluated running an inter-annotator reliability study on three different speech styles (read speech, monologue and dialogue). The overall high κ between 0.76 and 0.89 (depending on the speech style) shows that the DIMA conventions can be applied successfully.
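As an illustration of what a κ value of this kind measures, the following minimal sketch computes a chance-corrected agreement coefficient for two annotators in the style of Cohen's kappa. The abstract does not say which κ variant the study used, so this is an assumption; the function name `cohen_kappa` and the label sequences are hypothetical and not DIMA data.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two annotators (Cohen's kappa)."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(labels_a)
    # Proportion of items on which both annotators chose the same label
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance from each annotator's label frequencies
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical accent labels from two annotators for six syllables
annotator_1 = ["H*", "L*", "H*", "none", "none", "L*"]
annotator_2 = ["H*", "L*", "none", "none", "none", "L*"]
kappa = cohen_kappa(annotator_1, annotator_2)
```

Values near 1 indicate agreement well above chance, which is how the reported range of 0.76 to 0.89 would be read under this assumption.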
The instructions under which raters quantify syllable prominence perception need to be simple in order to maintain immediate reactions. This leads to noise in the rating data that can be dealt with by normalization, e.g. setting central tendency = 0 and dispersion = 1 (as in Z-score normalization). Questions arise such as: Which parameter is adequate here to capture central tendency? Which reference distribution should the normalization be based on? In this paper 16 different normalization methods are evaluated. In a perception experiment using German read speech (prose and poetry), syllable prominence ratings were collected. From the rating data 16 complete “mirror” data-sets were computed according to the 16 methods. Each mirror data-set was correlated with the same set of measures from the underlying acoustic data, focusing on raw syllable duration which is seen as a rather straightforward acoustic aspect of syllable prominence. Correlation coefficients could be raised considerably by selected methods.
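The Z-score normalization mentioned above can be sketched as follows. This is only one possible choice, assuming the arithmetic mean as the measure of central tendency, the population standard deviation as the measure of dispersion, and each rater's own ratings as the reference distribution. It is not claimed to be one of the 16 methods evaluated; the function name `z_normalize` and the ratings are hypothetical.

```python
from statistics import mean, pstdev


def z_normalize(ratings):
    """Rescale one rater's ratings to mean 0 and standard deviation 1."""
    center = mean(ratings)
    spread = pstdev(ratings)
    if spread == 0:
        # A rater who gave the same rating everywhere carries no contrast
        return [0.0 for _ in ratings]
    return [(r - center) / spread for r in ratings]


# Hypothetical ratings by two raters who use the scale very differently
rater_a = [1, 2, 2, 3, 7]
rater_b = [10, 14, 15, 20, 31]
norm_a = z_normalize(rater_a)
norm_b = z_normalize(rater_b)
```

After normalization both raters' values lie on a common scale, so they can be pooled or correlated with acoustic measures such as syllable duration. Swapping `mean` for `median`, or computing `center` and `spread` over all raters, would give other variants of the kind the questions above point to.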
Prominence has been widely studied on the word level and the syllable level. An extensive study comparing the two approaches is missing in the literature. This study investigates how word and syllable prominence relate to each other in German. We find that perceptual ratings based on the word level are more extreme than those based on the syllable level. The correlations between word prominence and acoustic features are greater than the correlations between syllable prominence and acoustic features.
The perception of syllable prominence depends to a limited extent on the acoustic properties of the speech signal in question. Psychoacoustic factors are involved as well. Thus, research often relies on two types of data: subjective prominence ratings collected in perception experiments and acoustic measures. A problem with the rating data is noise resulting from individual approaches to the rating task. This paper addresses the question of how this noise can be reduced by normalization, evaluating 12 normalization methods. In a perception experiment, prominence ratings concerning German read speech were collected. From the raw rating data 12 different ‘mirror’ data-sets were computed according to the 12 methods. Each mirror data-set was correlated with the same set of underlying acoustic data. The multiple regression setup included raw syllable duration as well as within-syllable maximum F0 and intensity. Adjusted r²-values could be raised considerably with selected methods.
In previous research we showed that the priming paradigm can be used to significantly alter the prominence ratings of subjects. In that study we only looked at the changes in the subjects’ ratings. In the present study, we analyzed the acoustic parameters of the stimuli used in the priming study and investigated the correlation between prominence ratings and acoustic parameters. The results show that priming has a significant effect on these correlations. The contribution of acoustic features to perceived prominence was found to depend on the prominence pattern. If a dominantly prominent syllable is present in a given utterance, f0 and intensity contribute most to the perceived prominence, while duration contributes most when no syllable is dominantly prominent.
This paper describes work directed towards the development of a syllable prominence-based prosody generation functionality for a German unit selection speech synthesis system. A general concept for syllable prominence-based prosody generation in unit selection synthesis is proposed. As a first step towards its implementation, an automated syllable prominence annotation procedure based on acoustic analyses has been performed on the BOSS speech corpus. The prominence labeling has been evaluated against an existing annotation of lexical stress levels and manual prominence labeling on a subset of the corpus. We discuss methods and results and give an outlook on further implementation steps.
Streefkerk defines prominence as the perceptually outstanding parts in spoken language. An optimal rating scale for syllable prominence has not been found yet. This paper evaluates a 4-point, an 11-point, a 31-point, and a continuous scale for the rating of syllable prominence and gives support for scales using a higher number of levels. Priming effects found by Arnold et al. could only be replicated using the 31-point scale.
Ph@ttSessionz and Deutsch heute are two large German speech databases. They were created for different purposes: Ph@ttSessionz to test Internet-based recordings and to adapt speech recognizers to the voices of adolescent speakers, Deutsch heute to document regional variation of German. The databases differ in their recording technique, the selection of recording locations and speakers, elicitation mode, and data processing.
In this paper, we outline how the recordings were performed, how the data was processed and annotated, and how the two databases were imported into a single relational database system. We present acoustical measurements on the digit items of both databases. Our results confirm that the elicitation technique affects the speech produced, that f0 is quite comparable despite different recording procedures, and that large speech technology databases with suitable metadata may well be used for the analysis of regional variation of speech.
In our study we use the experimental framework of priming to manipulate our subjects’ expectations of syllable prominence in sentences with a well-defined syntactic and phonological structure. The results show that it is possible to prime prominence patterns and that priming leads to significant differences in the judgment of syllable prominence.
This dissertation deals with different methods for collecting perceptual prominence judgments from naive listeners of German. Two experiments are presented: one addresses the use of different rating scales, the other the use of different rating levels for judging perceptual prominence. The results show that findings from studies based on different elicitation techniques are not readily comparable. The thesis also examines the effects of normalizing the prominence judgments. It closes with an outlook on future studies, which mainly addresses the manifold interactions between different sources and the context in the judgment of perceptual prominence.
The effect of manipulation of a speaker’s voice as well as exposure to a native speaker’s utterance was investigated regarding the pronunciation of stops by German learners of French. Three subject groups, a Control (CG), a Manipulation (MG), and a Native Speaker (NG) Group, were recorded on two subsequent days. The MG was presented with a manipulation of their voice on the second day and the NG listened to a native French speaker, while the CG did not receive any feedback. Results show that speakers of the MG and NG were able to extract useful information from the respective feedback and successfully adapted to it. Participants were able to reduce their voice onset time values, although speakers of the NG reduced them to a greater extent.
This study presents the results of a large-scale comparison of various measures of pitch range and pitch variation in two Slavic (Bulgarian and Polish) and two Germanic (German and British English) languages. The productions of twenty-two speakers per language (eleven male and eleven female) in two different tasks (read passages and number sets) are compared. Significant differences between the language groups are found: German and English speakers use lower pitch maxima, narrower pitch span, and generally less variable pitch than Bulgarian and Polish speakers. These findings support the hypothesis that linguistic communities tend to be characterized by particular pitch profiles.
This article presents preliminary results indicating that speakers have a different pitch range when they speak a foreign language compared to the pitch variation that occurs when they speak their native language. To this end, a learner corpus with French and German speakers was analyzed. Results suggest that speakers indeed produce a smaller pitch range in the respective L2. This is true for both groups of native speakers. A possible explanation for this finding is that speakers are less confident in their productions; they therefore concentrate more on segments and words and do not realize pitch range in a more native-like way. For language teaching, the results suggest that learners should be trained extensively on the more pronounced use of pitch in the foreign language.
Designing a Bilingual Speech Corpus for French and German Language Learners: a Two-Step Process
(2014)
We present the design of a corpus of native and non-native speech for the language pair French-German, with a special emphasis on phonetic and prosodic aspects. To our knowledge there is no suitable corpus, in terms of size and coverage, currently available for the target language pair. To select the target L1-L2 interference phenomena we prepare a small preliminary corpus (corpus1), which is analyzed for coverage and cross-checked jointly by French and German experts. Based on this analysis, target phenomena on the phonetic and phonological level are selected on the basis of the expected degree of deviation from the native performance and the frequency of occurrence. 14 speakers performed both L2 (either French or German) and L1 material (either German or French). This allowed us to test the recording duration, the recording material and the performance of our automatic aligner software. We then built corpus2, taking into account what we had learned from corpus1. The aims are the same, but we adapted the speech material to avoid overly long recording sessions. 100 speakers will be recorded. The corpus (corpus1 and corpus2) will be prepared as a searchable database, available to the scientific community after completion of the project.
Based on specific linguistic landmarks in the speech signal, this study investigates pitch level and pitch span differences in English, German, Bulgarian and Polish. The analysis is based on 22 speakers per language (11 males and 11 females). Linear mixed models were computed that include various linguistic measures of pitch level and span, revealing characteristic differences across languages and between language groups. Pitch level appeared to have significantly higher values for the female speakers in the Slavic than the Germanic group. The male speakers showed slightly different results, with only the Polish speakers displaying significantly higher mean values for pitch level than the German males. Overall, the results show that the Slavic speakers tend to have a wider pitch span than the German speakers. But for the linguistic measure, namely for span between the initial peaks and the non-prominent valleys, we only find the difference between Polish and German speakers. We found a flatter intonation contour in German than in Polish, Bulgarian and English male and female speakers and differences in the frequency of the landmarks between languages. Concerning “speaker liveliness” we found that the speakers from the Slavic group are significantly livelier than the speakers from the Germanic group.
This study investigates cross-language differences in pitch range and variation in four languages from two language groups: English and German (Germanic) and Bulgarian and Polish (Slavic). The analysis is based on large multi-speaker corpora (48 speakers for Polish, 60 for each of the other three languages). Linear mixed models were computed that include various distributional measures of pitch level, span and variation, revealing characteristic differences across languages and between language groups. A classification experiment based on the relevant parameter measures (span, kurtosis and skewness values for pitch distributions for each speaker) succeeded in separating the language groups.
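The per-speaker distributional measures named above (span, skewness, kurtosis) could be computed along the following lines. This is a minimal sketch under stated assumptions: the conversion to semitones relative to 100 Hz, the maximum-minus-minimum definition of span, and the moment-based (non-excess) kurtosis are illustrative choices, not the study's documented procedure. The function name `pitch_distribution_measures` and the f0 values are hypothetical.

```python
import math
from statistics import mean


def pitch_distribution_measures(f0_hz):
    """Span, skewness and kurtosis of one speaker's f0 values (in semitones)."""
    # Convert Hz to semitones relative to 100 Hz (an assumed reference)
    semitones = [12 * math.log2(f / 100.0) for f in f0_hz]
    n = len(semitones)
    m = mean(semitones)
    # Central moments of the distribution
    m2 = sum((x - m) ** 2 for x in semitones) / n
    m3 = sum((x - m) ** 3 for x in semitones) / n
    m4 = sum((x - m) ** 4 for x in semitones) / n
    if m2 == 0:
        raise ValueError("f0 values are constant; shape measures are undefined")
    return {
        "span_st": max(semitones) - min(semitones),
        "skewness": m3 / m2 ** 1.5,
        "kurtosis": m4 / m2 ** 2,
    }


# Hypothetical f0 samples (Hz) for one speaker
measures = pitch_distribution_measures([100.0, 200.0, 400.0])
```

One such dictionary per speaker yields a small feature vector that a classifier could use to separate language groups, which is the general idea behind the classification experiment described above.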
The Perceptual Effect of L1 Prosody Transplantation on L2 Speech: The Case of French Accented German
(2016)
Research has shown that language learners are not only challenged by segmental differences between their native language (L1) and the second language (L2). They also have problems with the correct production of suprasegmental structures, like phone/syllable duration and the realization of pitch. These difficulties often lead to a perceptible foreign accent. This study investigates the influence of prosody transplantation on foreign accent ratings. Syllable duration and pitch contour were transferred from utterances of a male and female German native speaker to utterances of ten French native speakers speaking German. Acoustic measurements show that French learners spoke with a significantly lower speaking rate. As expected, results of a perception experiment judging the accentedness of 1) German native utterances, 2) unmanipulated and 3) manipulated utterances of French learners of German suggest that the transplantation of the prosodic features syllable duration and pitch leads to a decrease in accentedness rating. These findings confirm results found in similar studies investigating prosody transplantation with different L1 and L2 and provide a beneficial technique for (computer-assisted) pronunciation training.
The IFCASL corpus is a French-German bilingual phonetic learner corpus designed, recorded and annotated in a project on individualized feedback in computer-assisted spoken language learning. The motivation for setting up this corpus was that there is no phonetically annotated and segmented corpus for this language pair of comparable size and coverage. In contrast to most learner corpora, the IFCASL corpus incorporates data for a language pair in both directions, i.e. in our case French learners of German, and German learners of French. In addition, the corpus is complemented by two sub-corpora of native speech by the same speakers. The corpus provides spoken data by about 100 speakers with comparable productions, annotated and segmented on the word and the phone level, with more than 50% manually corrected data. The paper reports on inter-annotator agreement and the optimization of the acoustic models for forced speech-text alignment in exercises for computer-assisted pronunciation training. Example studies based on the corpus data with a phonetic focus include topics such as the realization of /h/ and glottal stop, final devoicing of obstruents, vowel quantity and quality, pitch range, and tempo.
The aim of this study is to select and formulate criteria for the assessment of tools and exercises that are using computer-assisted pronunciation training (CAPT). We examined ten different CAPT tools selected on the basis of an informal questionnaire among 10 colleagues working in a German-French CAPT project. Although the applied assessment must still be regarded as informal, and although the selected CAPT tools might not be an optimal sample for representing the state of the art, the results clearly show that there is a lot to improve regarding the clarity of instruction, the quality of exercises, the robustness of the diagnosis, the clarity and appropriateness of scoring, the diversity of feedback methods, the assumed benefit for various types of users as well as the usage of ASR. Despite various good approaches regarding graphics and game-like exercises there are obviously missing links between the pedagogical expertise in phonetic training on the one hand, and software development including usability engineering on the other.
Evaluation of Phonatory Behavior of German and French Speakers in Native and Non-native Speech
(2016)
Phonatory behavior of German speakers (GS) and French speakers (FS) in native (L1) and non-native (L2) speech was instrumentally examined. Vowel productions of the two groups were analyzed using a parametrization of phonatory behavior and phonatory quality properties in the acoustic signal. The behavior of GS is characterized by more strained adduction of the vocal folds whereas FS show more incomplete glottal closure. Furthermore, GS change their phonatory behavior in the foreign language (=French) by adopting phonatory strategies of FS, whereas FS do not show this tendency. In addition, German beginners (BEG) and, in part, German advanced learners (ADV) are already oriented towards production characteristics of the L2. French BEG, however, retain their phonatory behavior in L2 (=German) by showing less vocal fold adduction in comparison to their L1. French ADV show the opposite behavior. Finally, ADV of the two speaker groups generally show more strained behavior in L2 productions than BEG. The results provide evidence that GS and FS apply different laryngeal phonatory settings and that they alter their settings in L2 differently. Perceptual evaluation of voice quality of the speech material and a correlation analysis between acoustic and perceptual results are suggested for future research.
In Articulatory Phonology the jaw is not controlled individually but serves as an additional articulator to achieve the primary constriction. In this study the timing of jaw and tongue tip gestures for the coronal consonants /s, ʃ, t, d, n, l/ is analysed by means of EMMA. The findings suggest that the tasks of the jaw for the fricatives are to provide a second noise source and to stabilise the tongue position (more pronounced for /s/). For the voiceless stop, the speakers seem to aim at a high jaw position for producing a prominent burst. For /l/ a low jaw position is essential for avoiding lateral contact and for the apical articulation of this sound.
As can be shown for English data, the assimilation of the alveolar stop can result from an increased gestural overlap of the following oral closure gesture. Our experiment with German synthetic speech showed similar results. Further, it suggests that it is necessary to complete the gestural specification of the glottal state. A voiced stop should be represented not only by an oral gesture, but by a glottal one as well.
Analyses of jaw movement (obtained by Electromagnetic Articulography) and acoustics show that loud speech is an intricate phenomenon. Besides involving higher intensity and subglottal pressure, it affects jaw movements as well as fundamental frequency and especially first formants. It is argued that all these effects serve the purpose of enhancing perceptual salience.
The vowel quality in some diphthongs of Swabian (an Upper German dialect) was determined by measurement of first and second formant values. A minimal contrast could be shown between two different diphthong qualities […], where for Standard German only one is assumed, viz. /ai/. The two diphthong qualities differ only slightly in onset and offset vowel quality, so a better understanding of their relationship was expected from an examination of their dynamic aspects. Our preliminary results suggest that there is indeed a difference in the temporal structure of the two diphthongs.
The aim of this paper is to highlight the actual need for corpora that have been annotated based on acoustic information. The acoustic information should be coded in features or properties and is needed to inform further processing systems, i.e. to present a basis for a speech recognition system using linguistic information. Feature annotation of existing corpora in combination with segmental annotation can provide a powerful training material for speech recognition systems, but will as well challenge the further processing of features to segments and syllables. We present here the theoretical preliminaries for our multilingual feature extraction system, that we are currently working on.
The goal of this study was to evaluate invariance vs. variability in both articulation and acoustics of speech production units. To keep interaction of controlled variables manageable, only a very simple subrange of speech productions was studied. Three different vowel qualities and six different consonants were examined in a VCV sequence embedded in an utterance. Besides coarticulation, vocal effort was a further factor of perturbation occurring in natural speech. The set of consonants comprised various modes of articulation (stop, fricative, nasal, lateral), all produced at virtually the same place of articulation, viz. (post-)alveolar. The range of vowel environments /i:/, /e:/, /a:/ was selected for differences in height, in order to vary coarticulatory effects between the segments. Utterances were produced at two different volume levels, viz. normal and loud speech. Experiments by others have demonstrated that higher speech volume is not simply realized as a raised sound pressure level or as raised intensity. For loud speech a number of different correlates were observed, such as raised subglottal pressure (see Ladefoged/McKinney 1963), raised fundamental frequency, raised first formant, and change of segmental durations (e.g. Traunmüller/Eriksson 2000). Furthermore, an effect on jaw height was observed in vowels: in vowel production in loud speech the jaw has a lower position. In earlier studies results have been presented for either articulatory (Schulman 1989) or acoustic changes (Traunmüller/Eriksson 2000) associated with higher volume. The present study examines effects of higher volume level on vowels as well as on consonants, in the articulatory as well as the acoustic channel. Data from six German speakers (5 male, 1 female) were recorded and analyzed. In the articulatory channel, jaw and tongue-tip movements were analyzed; in the acoustic domain, segmental characteristics such as formants, duration, intensity and fundamental frequency.
The main results can be described as follows:
- Jaw height in vowels depends on vowel height; in the vowel production of loud speech the jaw is lowered significantly.
- Jaw height in consonants depends on the type of consonant (very high for /s/, /ʃ/, /t/, fairly low for /n/, /l/). Speaking at a higher volume level does not have a significant effect on jaw height during (post-)alveolar consonant production; a coarticulatory effect of vowel context is mainly found with /n/ and /l/.
- In loud speech jaw gestures have higher amplitude.
- Acoustic segmental duration is changed: vowels are lengthened and consonants are shortened.
- Fundamental frequency in vowel segments is raised significantly.
- In all vowels the first formant is raised.
- The second formant of the non-front vowel /a:/ is raised.
This work has demonstrated that jaw articulation in a number of alveolar consonants is remarkably precise and that motor equivalence only plays a minor role. Moreover, it has been shown that in the face of the generally larger variability of acoustic and articulatory parameters, the results are best considered in terms of perceptual invariants. The findings also substantiate the complexity of articulatory and acoustic reorganisation in loud speech.
Jaw and Order
(2007)
It is well accepted that the jaw plays an active role in influencing vowel height. The general aim of the current study is to further investigate the extent to which the jaw is active in producing consonantal distinctions, with specific focus on coronal consonants. Therefore, tongue tip and jaw positions are compared for the German coronal consonants /s, ʃ, t, d, n, l/, that is, consonants having the same active articulators (apical/laminal) but differing in manner of articulation. In order to test the stability of articulatory positions for each of these coronal consonants, a natural perturbation paradigm was introduced by recording two levels of vocal effort: comfortable, and loud without shouting. Tongue and jaw movements of five speakers of German were recorded by means of EMMA during /aCa/ sequences. By analyzing the tongue tip and jaw positions and their spatial variability we found that (1) the jaw's contribution to these consonants varies with manner of articulation, and (2) for all coronal consonants the positions are stable across loudness conditions except for those of the nasal. Results are discussed with respect to the tasks of the jaw, and the possible articulatory adjustments that may accompany louder speech.
If more than one articulator is involved in the execution of a phonetic task, then the individual articulators have to be temporally coordinated with each other in a lawful manner. The present study aims at analyzing tongue-jaw cohesion in the temporal domain for the German coronal consonants /s, ʃ, t, d, n, l/, i.e., consonants produced with the same set of articulators (the tongue blade and the jaw) but differing in manner of articulation. The stability of obtained interaction patterns is evaluated by varying the degree of vocal effort: comfortable and loud. Tongue and jaw movements of five speakers of German were recorded by means of electromagnetic midsagittal articulography (EMMA) during /aCa/ sequences. The results indicate that (1) tongue-jaw coordination varies with manner of articulation, i.e., a later onset and offset of the jaw target for the stops compared to the fricatives, the nasal and the lateral; (2) the obtained patterns are stable across vocal effort conditions; (3) the sibilants are produced with smaller standard deviations for latencies and target positions; and (4) adjustments to the lower jaw positions during the surrounding vowels in loud speech occur during the closing and opening movement intervals and not the consonantal target phases.
American English and German AI, AU observed in cognates such as Wein, wine, Haus, house are usually treated on a par, represented with the same initial vowel (cf. [ai], [au] for Am. Engl. and German [1]). Yet, acoustic measurements indicate differences, as the relevant trajectories characteristically cross in Am. Engl. but not in German. These data may indicate consistency with the same initial target for these diphthongs in German, supporting the choice of the same symbol /a/ in phonemic representation, as opposed to distinct targets (and distinct initial phonemes) in American English.
The word-initial segments in German ja, jung and the second components of the so-called closing diphthongs, as in Hai, Heu, Hau, show highly variable articulation compared with the high vowels in Kuh, Knie; moreover, these sounds occur in different contexts. The relationships observable here between distribution and pronunciation suggest allophony conditioned by different syllabic positions (Morciniec 1958; Shannon 1984; Hall 1992; for English: Jakobson/Fant/Halle 1952, p. 20). Such an analysis, which also entails a considerable reduction of the phoneme inventory, has so far failed to gain acceptance for German: usually both the closing diphthongs and [j] are listed in the German phoneme inventory; the latter segment is even mostly classified as a fricative. Cross-linguistic comparison yields new phonological generalizations that support an allophonic analysis conditioned by syllable structure. In particular, gradations can be discerned that point to syllabification conditions determined by sonority.
Gaps in Word Formation
(1996)
The phonological word (henceforth pword) differs from lower units of the prosodic hierarchy (e.g. foot, syllable) in that its boundaries must align with morphological boundaries. While languages are claimed to differ w.r.t. the questions of whether and which word-internal constituents (e.g. stems, prefixes, suffixes, members of compounds) form a pword, there is no consensus regarding the question of which diagnostics are relevant for determining pword structure. In this paper it is argued that systematic correlations between various suprasegmental properties (e.g. stress patterns, syllable structure) motivate the existence of word-internal pwords in German.
Evaluating phonological status: significance of paradigm uniformity vs. prosodic grouping effects
(2007)
A central concern of linguistic phonetics is to define criteria for determining the phonological status of sounds or sound properties observed in phonetic surface form. Based on acoustic measurements we show that the occurrence of syllabic sonorants vs. schwa-sonorant sequences in German is determined exclusively by segmental and prosodic structure, with no paradigm uniformity effects. We argue that these findings are consistent with a uniform representation of syllabic sonorants as schwa-sonorant sequences in the lexicon. The stability of schwa in CVC-suffixes (e.g. the German diminutive suffix -chen), as opposed to its phonetic absence in a segmentally comparable underived context, is argued to be conditioned by the prosodic organisation of such suffixes external to the phonological word of the stem.
Trubetzkoy's recognition of a delimitative function of phonology, serving to signal boundaries between morphological units, is expressed in terms of alignment constraints in Optimality Theory, where the relevant constraints require specific morphological boundaries to coincide with phonological structure (Trubetzkoy 1936, 1939, McCarthy & Prince 1993). The approach pursued in the present article is to investigate the distribution of phonological boundary signals to gain insight into the criteria underlying morphological analysis. The evidence from English and Swedish suggests that necessary and sufficient conditions for word-internal morphological analysis concern the recognizability of head constituents, which include the rightmost members of compounds and head affixes. The claim is that the stability of word-internal boundary effects in historical perspective cannot in general be sufficiently explained in terms of memorization and imitation of phonological word form. Rather, these effects indicate a morphological parsing mechanism based on the recognition of word-internal head constituents. Head affixes can be shown to contrast systematically with modifying affixes with respect to syntactic function, semantic content, and prosodic properties. That is, head affixes, which cannot be omitted, often lack inherent meaning and have relatively unmarked boundaries, which can be obscured entirely under specific phonological conditions. By contrast, modifying affixes, which can be omitted, consistently have inherent meaning and have stronger boundaries, which resist prosodic fusion in all phonological contexts. While these correlations are hardly specific to English and Swedish, it remains to be investigated to what extent they hold cross-linguistically. The observation that some of the constituents identified on the basis of prosodic evidence lack inherent meaning raises the issue of compositionality.
I will argue that certain systematic aspects of word meaning cannot be captured with reference to the syntagmatic level, but require reference to the paradigmatic level instead. The assumption is then that there are two dimensions of morphological analysis: syntagmatic analysis, which centers on the criteria for decomposing words in terms of labelled constituents, and paradigmatic analysis, which centers on the criteria for establishing relations among (whole) words in the mental lexicon. While meaning is intrinsically connected with paradigmatic analysis (e.g. base relations, oppositeness) it is not essential to syntagmatic analysis.
The term word prosody here refers to the organization of segments into the hierarchically ordered constituents syllable, foot, and phonological word. Evidence for such an organization and the rules underlying it is found in certain distributional and phonetic peculiarities of segments. In this contribution I attempt to present the essential features of German word prosody as an interaction of conflicting constraints in the sense of Optimality Theory. The focus is on identifying unmarked prosodic structures at the phonological-lexical level, since unmarked structures form an important point of reference for the evaluation of variants. At the same time, a new perspective emerges on the relationship between norm and rule.
One was a distinguished natural scientist and engineer, the other a self-taught scientist and vilified as a conman: Christian Gottlieb Kratzenstein (1723–1795) and Wolfgang von Kempelen (1734–1804). Some of the former’s postulations on human physiology and articulation of speech proved wrong in later years. Most of the latter’s theories are considered applicable even today. The perhaps most contrasting approaches to speech synthesis during the 18th century are linked to their names. There are many essential differences between their approaches which show that these two researchers were not only representatives of different schools of thought, but also representatives of two different scientific eras. A speculative and philosophical approach on the one hand versus an empirical and logical approach on the other hand. Both Kratzenstein and Kempelen published books on their research. But while the “Tentamen” [4] of the physician Kratzenstein remains rather vague and imprecise in its descriptions of vowel production and synthesis, the “Mechanismus” [8] of the engineer Kempelen shows much more precision and correctness in almost every respect of human speech and language. The goal of this paper is to discuss the differences between these two contemporaneous researchers on speech synthesis and to compare their theories with present-day findings.