This study investigates the interrelations between bilingual development (German/Russian), immigration and integration in the host society. Participants are Russian-Germans, that is, ethnic Germans who have repatriated to Germany from the former Soviet Union. They were part of a longitudinal study dedicated to the integration of multi-generation Russian-German families in Germany. The paper focuses on eight Russian-Germans who moved to Germany between the ages of five and eight and are now young adults. The analysis is based on interviews conducted in German and Russian in the twentieth year of their life in Germany. A semi-structured questionnaire was used to elicit information on the main stages of integration, the use of the languages, the attitudes towards German and Russian, and an assessment of the current situation. The obtained data were used to make an initial assessment of the oral language competencies of the participants and as sources of information about the objective facts and subjective attitudes that determined linguistic and social integration.
Sound units play a pivotal role in cognitive models of auditory comprehension. The general consensus is that during perception listeners break down speech into auditory words and subsequently phones. Indeed, cognitive speech recognition is typically taken to be computationally intractable without phones. Here we present a computational model trained on 20 hours of conversational speech that recognizes word meanings within the range of human performance (model 25%, native speakers 20–44%), without making use of phone or word form representations. Our model also successfully generates predictions about the speed and accuracy of human auditory comprehension. At the heart of the model is a ‘wide’ yet sparse two-layer artificial neural network with some hundred thousand input units representing summaries of changes in acoustic frequency bands, and proxies for lexical meanings as output units. We believe that our model holds promise for resolving longstanding theoretical problems surrounding the notion of the phone in linguistic theory.
Recently, a claim was made, on the basis of the German Google Books 1-gram corpus (Michel et al., Quantitative Analysis of Culture Using Millions of Digitized Books. Science 2010; 331: 176–82), that there was a linear relationship between six non-technical non-Nazi words and three ‘explicitly Nazi words’ in times of World War II (Caruana-Galizia. 2015. Politics and the German language: Testing Orwell’s hypothesis using the Google N-Gram corpus. Digital Scholarship in the Humanities [Online]. http://dsh.oxfordjournals.org/cgi/doi/10.1093/llc/fqv011 (accessed 15 April 2015)). Here, I try to show that apparent relationships like this are the result of misspecified models that do not take into account the temporal aspect of time-series data. The main point of this article is to demonstrate why such analyses run the risk of incorrect statistical inference, in which apparent effects are meaningless and can lead to wrong conclusions.
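The general concern about ignoring the temporal structure of time-series data can be illustrated with a toy example. The sketch below is not the article's analysis and uses no Google Books data: the two series, their "wiggle" patterns, and all names are hypothetical and constructed by hand. Both series share nothing but an upward trend, yet their levels correlate almost perfectly, while their year-to-year changes do not correlate at all.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def first_differences(xs):
    """Change from each time step to the next (removes a linear trend)."""
    return [later - earlier for earlier, later in zip(xs, xs[1:])]


# Hypothetical data: two series that both trend upward over 41 time steps
# but fluctuate around the trend in unrelated, hand-picked patterns.
N_STEPS = 41
WIGGLE_A = [1, -1]          # period-2 fluctuation (assumption)
WIGGLE_B = [1, 1, -1, -1]   # period-4 fluctuation (assumption)
series_a = [t + WIGGLE_A[t % 2] for t in range(N_STEPS)]
series_b = [t + WIGGLE_B[t % 4] for t in range(N_STEPS)]

# Correlating the raw levels mostly measures the shared trend.
r_levels = pearson(series_a, series_b)

# Correlating the changes asks whether the series actually move together.
r_changes = pearson(first_differences(series_a), first_differences(series_b))

print(f"correlation of levels:  {r_levels:.3f}")
print(f"correlation of changes: {r_changes:.3f}")
```

Differencing is only one simple way to account for a trend; it is used here because it makes the contrast easy to see, not because the source prescribes it.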
When appearance does not match accent: neural correlates of ethnicity-related expectancy violations
(2017)
Most research on ethnicity in neuroscience and social psychology has focused on visual cues. However, accents are central social markers of ethnicity and strongly influence evaluations of others. Here, we examine how varying auditory (vocal accent) and visual (facial appearance) information about others affects neural correlates of ethnicity-related expectancy violations. Participants listened to standard German and Turkish-accented speakers and were subsequently presented with faces whose ethnic appearance was either congruent or incongruent to these voices. We expected that incongruent targets (e.g. German accent/Turkish face) would be paralleled by a more negative N2 event-related brain potential (ERP) component. Results confirmed this, suggesting that incongruence was related to more effortful processing of both Turkish and German target faces. These targets were also subjectively judged as surprising. Additionally, varying lateralization of ERP responses for Turkish and German faces suggests that the underlying neural generators differ, potentially reflecting different emotional reactions to these targets. Behavioral responses showed an effect of violated expectations: German-accented Turkish-looking targets were evaluated as most competent of all targets. We suggest that bringing together neural and behavioral measures of expectancy violations, and using both visual and auditory information, yields a more complete picture of the processes underlying impression formation.
We propose a new type of subword embedding designed to provide more information about unknown compounds, a major source for OOV words in German. We present an extrinsic evaluation where we use the compound embeddings as input to a neural dependency parser and compare the results to the ones obtained with other types of embeddings. Our evaluation shows that adding compound embeddings yields a significant improvement of 2% LAS over using word embeddings when no POS information is available. When adding POS embeddings to the input, however, the effect levels out. This suggests that it is not the missing information about the semantics of the unknown words that causes problems for parsing German, but the lack of morphological information for unknown words. To augment our evaluation, we also test the new embeddings in a language modelling task that requires both syntactic and semantic information.
Complex linguistic phenomena, such as Clitic Climbing in Bosnian, Croatian and Serbian, are often described intuitively, only from the perspective of the main tendency. In this paper, we argue that web corpora currently offer the best source of empirical material for studying Clitic Climbing in BCS. They thus allow the most accurate description of this phenomenon, as less frequent constructions can be tracked only in big, well-annotated data sources. We compare the properties of web corpora for BCS with traditional sources and give examples of studies on CC based on web corpora. Furthermore, we discuss problems related to web corpora and suggest some improvements for the future.