Refine
Year of publication
Document Type
- Part of a Book (145)
- Article (129)
- Conference Proceeding (118)
- Book (6)
- Working Paper (5)
- Doctoral Thesis (4)
- Preprint (3)
Language
- English (410) (remove)
Keywords
- Deutsch (410) (remove)
Publicationstate
- Veröffentlichungsversion (209)
- Postprint (66)
- Zweitveröffentlichung (65)
- Ahead of Print (2)
- Preprint (1)
Reviewstate
- Peer-Review (180)
- (Verlags)-Lektorat (90)
- Peer-review (6)
- Qualifikationsarbeit (Dissertation, Habilitationsschrift) (4)
- Review-Status-unbekannt (4)
- Peer-Revied (3)
- Verlags-Lektorat (2)
- (Verlags-)Lektorat (1)
- (Verlags-)lektorat (1)
Publisher
- Benjamins (27)
- de Gruyter (25)
- IDS-Verlag (19)
- European Language Resources Association (ELRA) (16)
- Niemeyer (12)
- Springer (12)
- Elsevier (7)
- Oxford University Press (7)
- European Language Resources Association (6)
- German Society for Computational Linguistics & Language Technology und Friedrich-Alexander-Universität Erlangen-Nürnberg (6)
The annual microcensus provides Germany’s most important official statistics. Unlike a census it does not cover the whole population, but a representative 1%-sample of it. In 2017, the German microcensus asked a question on the language of the population, i.e. ‘Which language is mainly spoken in your household?’ Unfortunately, the question, its design and its position within the whole microcensus’ questionnaire feature several shortcomings. The main shortcoming is that multilingual repertoires cannot be captured by it. Recommendations for the improvement of the microcensus’ language question: first and foremost the question (i.e. its wording, design, and answer options) should make it possible to count multilingual repertoires.
Language attitudes matter; they influence people’s behaviour and decisions. Therefore, it is crucial to learn more about patterns in the way that languages are evaluated. One means of doing so is using a quantitative approach with data representative of a whole population, so that results mirror dispositions at a societal level. This kind of approach is adopted here, with a focus on the situation in Germany. The article consists of two parts. First, I will present some results of a new representative survey on language attitudes in Germany (the Germany Survey 2017). Second, I will show how language attitudes penetrate even seemingly objective data collection processes by examining the German Microcensus. In 2017, for the first time in eighty years, the German Microcensus included a question on language use ‘at home’. Unfortunately, however, the question was clearly tainted by language attitudes instead of being objective. As a result, the Microcensus significantly misrepresents the linguistic reality of different migrant languages spoken in Germany.
In this paper we examine the composition and interactional deployment of suspended assessments in ordinary German conversation. We define suspended assessments as lexicosyntactically incomplete assessing TCUs that share a distinct cluster of prosodic-phonetic features which auditorily makes them come off as 'left hanging' rather than cut-off (e.g., Schegloff/Jefferson/Sacks 1977; Jasperson 2002) or trailing-off (e.g., Local/Kelly 1986; Walker 2012). Using CA/IL methodology (Couper-Kuhlen/Selting 2018) and drawing on a large body of video-recorded face-to-face conversations, we highlight the verbal, vocal and bodily-visual resources participants use to render such unfinished assessing TCUs recognizably incomplete and identify six recurrent usage types. Overall, the suspension of assessing TCUs appears to either serve as a practice for circumventing the production of assessments that are interactionally inapposite, or as a practice for coping with local contingencies that render the very doing of an assessment problematic for the speaker. Data are in German with English translations.
The automatic recognition of idioms poses a challenging problem for NLP applications. Whereas native speakers can intuitively handle multiword expressions whose compositional meanings are hard to trace back to individual word semantics, there is still ample scope for improvement regarding computational approaches. We assume that idiomatic constructions can be characterized by gradual intensities of semantic non-compositionality, formal fixedness, and unusual usage context, and introduce a number of measures for these characteristics, comprising count-based and predictive collocation measures together with measures of context (un)similarity. We evaluate our approach on a manually labelled gold standard, derived from a corpus of German pop lyrics. To this end, we apply a Random Forest classifier to analyze the individual contribution of features for automatically detecting idioms, and study the trade-off between recall and precision. Finally, we evaluate the classifier on an independent dataset of idioms extracted from a list of Wikipedia idioms, achieving state-of-the art accuracy.
In order to differentiate between figurative and literal usage of verb-noun combinations for the shared task on the disambiguation of German Verbal Idioms issued for KONVENS 2021, we apply and extend an approach originally developed for detecting idioms in a dataset consisting of random ngram samples. The classification is done by implementing a rather shallow, statistics-based pipeline without intensive preprocessing and examinations on the morphosyntactic and semantic level. We describe the overall approach, the differences between the original dataset and the dataset of the KONVENS task, provide experimental classification results, and analyse the individual contributions of our feature sets.
This study investigates cross-language differences in pitch range and variation in four languages from two language groups: English and German (Germanic) and Bulgarian and Polish (Slavic). The analysis is based on large multi-speaker corpora (48 speakers for Polish, 60 for each of the other three languages). Linear mixed models were computed that include various distributional measures of pitch level, span and variation, revealing characteristic differences across languages and between language groups. A classification experiment based on the relevant parameter measures (span, kurtosis and skewness values for pitch distributions for each speaker) succeeded in separating the language groups.
This study presents the results of a large-scale comparison of various measures of pitch range and pitch variation in two Slavic (Bulgarian and Polish) and two Germanic (German and British English) languages. The productions of twenty-two speakers per language (eleven male and eleven female) in two different tasks (read passages and number sets) are compared. Significant differences between the language groups are found: German and English speakers use lower pitch maxima, narrower pitch span, and generally less variable pitch than Bulgarian and Polish speakers. These findings support the hypothesis that inguistic communities tend to be characterized by particular pitch profiles.
Based on specific linguistic landmarks in the speech signal, this study investigates pitch level and pitch span differences in English, German, Bulgarian and Polish. The analysis is based on 22 speakers per language (11 males and 11 females). Linear mixed models were computed that include various linguistic measures of pitch level and span, revealing characteristic differences across languages and between language groups. Pitch level appeared to have significantly higher values for the female speakers in the Slavic than the Germanic group. The male speakers showed slightly different results, with only the Polish speakers displaying significantly higher mean values for pitch level than the German males. Overall, the results show that the Slavic speakers tend to have a wider pitch span than the German speakers. But for the linguistic measure, namely for span between the initial peaks and the non-prominent valleys, we only find the difference between Polish and German speakers. We found a flatter intonation contour in German than in Polish, Bulgarian and English male and female speakers and differences in the frequency of the landmarks between languages. Concerning “speaker liveliness” we found that the speakers from the Slavic group are significantly livelier than the speakers from the Germanic group.
In many European languages, propositional arguments (PAs) can be realized as different types of structures. Cross-linguistically, complex structures with PAs show a systematic correlation between the strength of the semantic bond and the syntactic union (cf. Givón 2001; Wurmbrand/Lohninger 2023). Also, different languages show similarities with respect to the (lexical) licensing of different PAs (cf. Noonan 1985; Givón 2001; Cristofaro 2003 on different predicate types). However, on a more fine-grained level, a variation across languages can be observed both with respect to the syntactic-semantic properties of PAs as well as to their licensing and usage. This presentation takes a multi-contrastive view of different types of PAs as syntactic subjects and objects by looking at five European languages: EN, DE, IT, PL and HU. Our goal is to identify the parameters of variation in the clausal domain with PAs and by this to contribute to a better understanding of the individual language systems on the one hand and the nature of the linguistic variation in the clausal domain on the other hand. Phenomena and Methodology: We investigate the following types of PAs: direct object (DO) clauses (1), prepositional object (PO) clauses (2), subject clauses (3), and nominalizations (4, 5). Additionally, we discuss clause union phenomena (6, 7). The analyzed parameters include among others finiteness, linear position of the PA, (non) presence of a correlative element, (non) presence of a complementizer, lexical-semantic class of the embedding verb. The phenomena are analyzed based on corpus data (using mono- and multilingual corpora), experimental data (acceptability judgement surveys) or introspective data.
A trainable prosodic model called SFC (Superposition of Functional Contours), proposed by Holm and Bailly, is here confronted to German intonation. Training material is the publicly available Siemens Synthesis Corpus that provides spoken utterances for high-quality speech synthesis. We describe the labeling framework and first evaluation results that compares the original prosody of test sentences of this corpus with their prosodic rendering by the proposed model and state-of-the-art systems available on-line on the web.