Refine
Document Type
- Article (4) (remove)
Language
- English (4)
Has Fulltext
- yes (4)
Is part of the Bibliography
- yes (4)
Keywords
- Sprachproduktion (2)
- speech production (2)
- 1/3 power law (1)
- Aerodynamik (1)
- Artikulation (1)
- Bayesian inference (1)
- Hierarchical modeling (1)
- Individual differences (1)
- Korpus <Linguistik> (1)
- Labial (1)
Publicationstate
- Veröffentlichungsversion (4) (remove)
Reviewstate
- Peer-Review (4)
Publisher
- Elsevier (1)
- Springer (1)
- Springer Nature (1)
Sound units play a pivotal role in cognitive models of auditory comprehension. The general consensus is that during perception listeners break down speech into auditory words and subsequently phones. Indeed, cognitive speech recognition is typically taken to be computationally intractable without phones. Here we present a computational model trained on 20 hours of conversational speech that recognizes word meanings within the range of human performance (model 25%, native speakers 20–44%), without making use of phone or word form representations. Our model also generates successfully predictions about the speed and accuracy of human auditory comprehension. At the heart of the model is a ‘wide’ yet sparse two-layer artificial neural network with some hundred thousand input units representing summaries of changes in acoustic frequency bands, and proxies for lexical meanings as output units. We believe that our model holds promise for resolving longstanding theoretical problems surrounding the notion of the phone in linguistic theory.
Multinomial processing tree (MPT) models are a class of measurement models that account for categorical data by assuming a finite number of underlying cognitive processes. Traditionally, data are aggregated across participants and analyzed under the assumption of independently and identically distributed observations. Hierarchical Bayesian extensions of MPT models explicitly account for participant heterogeneity by assuming that the individual parameters follow a continuous hierarchical distribution.We provide an accessible introduction to hierarchical MPT modeling and present the user-friendly and comprehensive R package TreeBUGS, which implements the two most important hierarchical MPT approaches for participant heterogeneity—the beta-MPT approach (Smith & Batchelder, Journal of Mathematical Psychology 54:167-183, 2010) and the latent-trait MPT approach (Klauer, Psychometrika 75:70-98, 2010). TreeBUGS reads standard MPT model files and obtains Markov-chain Monte Carlo samples that approximate the posterior distribution. The functionality and output are tailored to the specific needs of MPT modelers and provide tests for the homogeneity of items and participants, individual and group parameter estimates, fit statistics, and within- and between-subjects comparisons, as well as goodness-of-fit and summary plots. We also propose and implement novel statistical extensions to include continuous and discrete predictors (as either fixed or random effects) in the latent-trait MPT model.
The relation between speed and curvature provides a characterization of the spatio-temporal orchestration of kinematic movements. For hand movements, this relation has been reported to follow a power law with exponent -1/3. The same power law has been claimed to govern articulatory movements. We studied the functional form of speed as predicted by curvature using electromagnetic articulography, focusing on three sensors: the tongue tip, the tongue body, and the lower lip. Of specific interest to us was the question of whether the speed-curvature relation is modified by articulatory practice, gauged with words’ frequencies of occurrence. Although analyses imposing linearity a priori indeed supported a power law, relaxation of this linearity assumption revealed that the effect of curvature on speed levels off substantially for lower values of curvature. A modification of the power law is proposed that takes this curvature into account. Furthermore, controlling statistically for number of phones and word duration, we observed that the speed-curvature function was further modulated by an interaction of lexical frequency by curvature, such that for increasing frequency, speed decreased slightly for low curvatures while it increased slightly for high curvatures. The modulation of the balance between speed and curvature by lexical frequency provides further evidence that the skill of articulation improves with practice on a word-to-word basis, and challenges theories of speech production.
This report presents a corpus of articulations recorded with Schlieren photography, a recording technique to visualize aeroflow dynamics for two purposes. First, as a means to investigate aerodynamic processes during speech production without any obstruction of the lips and the nose. Second, to provide material for lecturers of phonetics to illustrates these aerodynamic processes. Speech production was recorded with 10 kHz frame rate for statistical video analyses. Downsampled videos (500 Hz) were uplodad to a youtube channel for illustrative purposes. Preliminary analyses demonstrate potential in applying Schlieren photography in research.