Refine
Year of publication
Document Type
- Conference Proceeding (84)
- Part of a Book (66)
- Article (48)
- Book (4)
- Doctoral Thesis (4)
- Preprint (2)
- Working Paper (1)
Language
- English (209) (remove)
Has Fulltext
- yes (209)
Keywords
- Deutsch (209) (remove)
Publicationstate
- Veröffentlichungsversion (209) (remove)
Reviewstate
- Peer-Review (126)
- (Verlags)-Lektorat (51)
- Qualifikationsarbeit (Dissertation, Habilitationsschrift) (4)
- Review-Status-unbekannt (4)
- (Verlags-)Lektorat (1)
- Peer-Revied (1)
- Peer-review (1)
- Verlags-Lektorat (1)
Publisher
- IDS-Verlag (19)
- European Language Resources Association (ELRA) (11)
- de Gruyter (11)
- German Society for Computational Linguistics & Language Technology und Friedrich-Alexander-Universität Erlangen-Nürnberg (6)
- International Speech Communication Association (6)
- Association for Computational Linguistics (5)
- Lexical Computing CZ s.r.o. (5)
- European Language Resources Association (4)
- The Association for Computational Linguistics (4)
- University of Birmingham (4)
The annual microcensus provides Germany’s most important official statistics. Unlike a census it does not cover the whole population, but a representative 1%-sample of it. In 2017, the German microcensus asked a question on the language of the population, i.e. ‘Which language is mainly spoken in your household?’ Unfortunately, the question, its design and its position within the whole microcensus’ questionnaire feature several shortcomings. The main shortcoming is that multilingual repertoires cannot be captured by it. Recommendations for the improvement of the microcensus’ language question: first and foremost the question (i.e. its wording, design, and answer options) should make it possible to count multilingual repertoires.
The automatic recognition of idioms poses a challenging problem for NLP applications. Whereas native speakers can intuitively handle multiword expressions whose compositional meanings are hard to trace back to individual word semantics, there is still ample scope for improvement regarding computational approaches. We assume that idiomatic constructions can be characterized by gradual intensities of semantic non-compositionality, formal fixedness, and unusual usage context, and introduce a number of measures for these characteristics, comprising count-based and predictive collocation measures together with measures of context (un)similarity. We evaluate our approach on a manually labelled gold standard, derived from a corpus of German pop lyrics. To this end, we apply a Random Forest classifier to analyze the individual contribution of features for automatically detecting idioms, and study the trade-off between recall and precision. Finally, we evaluate the classifier on an independent dataset of idioms extracted from a list of Wikipedia idioms, achieving state-of-the art accuracy.
In order to differentiate between figurative and literal usage of verb-noun combinations for the shared task on the disambiguation of German Verbal Idioms issued for KONVENS 2021, we apply and extend an approach originally developed for detecting idioms in a dataset consisting of random ngram samples. The classification is done by implementing a rather shallow, statistics-based pipeline without intensive preprocessing and examinations on the morphosyntactic and semantic level. We describe the overall approach, the differences between the original dataset and the dataset of the KONVENS task, provide experimental classification results, and analyse the individual contributions of our feature sets.
This study investigates cross-language differences in pitch range and variation in four languages from two language groups: English and German (Germanic) and Bulgarian and Polish (Slavic). The analysis is based on large multi-speaker corpora (48 speakers for Polish, 60 for each of the other three languages). Linear mixed models were computed that include various distributional measures of pitch level, span and variation, revealing characteristic differences across languages and between language groups. A classification experiment based on the relevant parameter measures (span, kurtosis and skewness values for pitch distributions for each speaker) succeeded in separating the language groups.
This study presents the results of a large-scale comparison of various measures of pitch range and pitch variation in two Slavic (Bulgarian and Polish) and two Germanic (German and British English) languages. The productions of twenty-two speakers per language (eleven male and eleven female) in two different tasks (read passages and number sets) are compared. Significant differences between the language groups are found: German and English speakers use lower pitch maxima, narrower pitch span, and generally less variable pitch than Bulgarian and Polish speakers. These findings support the hypothesis that inguistic communities tend to be characterized by particular pitch profiles.
Based on specific linguistic landmarks in the speech signal, this study investigates pitch level and pitch span differences in English, German, Bulgarian and Polish. The analysis is based on 22 speakers per language (11 males and 11 females). Linear mixed models were computed that include various linguistic measures of pitch level and span, revealing characteristic differences across languages and between language groups. Pitch level appeared to have significantly higher values for the female speakers in the Slavic than the Germanic group. The male speakers showed slightly different results, with only the Polish speakers displaying significantly higher mean values for pitch level than the German males. Overall, the results show that the Slavic speakers tend to have a wider pitch span than the German speakers. But for the linguistic measure, namely for span between the initial peaks and the non-prominent valleys, we only find the difference between Polish and German speakers. We found a flatter intonation contour in German than in Polish, Bulgarian and English male and female speakers and differences in the frequency of the landmarks between languages. Concerning “speaker liveliness” we found that the speakers from the Slavic group are significantly livelier than the speakers from the Germanic group.
In many European languages, propositional arguments (PAs) can be realized as different types of structures. Cross-linguistically, complex structures with PAs show a systematic correlation between the strength of the semantic bond and the syntactic union (cf. Givón 2001; Wurmbrand/Lohninger 2023). Also, different languages show similarities with respect to the (lexical) licensing of different PAs (cf. Noonan 1985; Givón 2001; Cristofaro 2003 on different predicate types). However, on a more fine-grained level, a variation across languages can be observed both with respect to the syntactic-semantic properties of PAs as well as to their licensing and usage. This presentation takes a multi-contrastive view of different types of PAs as syntactic subjects and objects by looking at five European languages: EN, DE, IT, PL and HU. Our goal is to identify the parameters of variation in the clausal domain with PAs and by this to contribute to a better understanding of the individual language systems on the one hand and the nature of the linguistic variation in the clausal domain on the other hand. Phenomena and Methodology: We investigate the following types of PAs: direct object (DO) clauses (1), prepositional object (PO) clauses (2), subject clauses (3), and nominalizations (4, 5). Additionally, we discuss clause union phenomena (6, 7). The analyzed parameters include among others finiteness, linear position of the PA, (non) presence of a correlative element, (non) presence of a complementizer, lexical-semantic class of the embedding verb. The phenomena are analyzed based on corpus data (using mono- and multilingual corpora), experimental data (acceptability judgement surveys) or introspective data.
In conversation, interlocutors rarely leave long gaps between turns, suggesting that next speakers begin to plan their turns while listening to the previous speaker. The present experiment used analyses of speech onset latencies and eye-movements in a task-oriented dialogue paradigm to investigate when speakers start planning their responses. German speakers heard a confederate describe sets of objects in utterances that either ended in a noun [e.g., Ich habe eine Tür und ein Fahrrad (“I have a door and a bicycle”)] or a verb form [e.g., Ich habe eine Tür und ein Fahrrad besorgt (“I have gotten a door and a bicycle”)], while the presence or absence of the final verb either was or was not predictable from the preceding sentence structure. In response, participants had to name any unnamed objects they could see in their own displays with utterances such as Ich habe ein Ei (“I have an egg”). The results show that speakers begin to plan their turns as soon as sufficient information is available to do so, irrespective of further incoming words.
Comprehending conditional statements is fundamental for hypothetical reasoning about situations. However, the online comprehension of conditional statements containing different conditional connectives is still debated. We report two self-paced reading experiments on German conditionals presenting the conditional connectives wenn (‘if’) and nur wenn (‘only if’) in identical discourse contexts. In Experiment 1, participants read a conditional sentence followed by the confirmed antecedent p and the confirmed or negated consequent q. The final, critical sentence was presented word by word and contained a positive or negative quantifier (ein/kein ‘one/no’). Reading times of the two quantifiers did not differ between the two conditional connectives. In Experiment 2, presenting a negated antecedent, reading times for the critical positive quantifier (ein) did not differ between conditional connectives, while reading times for the negative quantifier (kein) were shorter for nur wenn than for wenn. The results show that comprehenders form distinct predictions about discourse continuations due to differences in the lexical semantics of the tested conditional connectives, shedding light on the role of conditional connectives in the online interpretation of conditionals in general.
We present a method for detecting and reconstructing separated particle verbs in a corpus of spoken German by following an approach suggested for written language. Our study shows that the method can be applied successfully to spoken language, compares different ways of dealing with structures that are specific to spoken language corpora, analyses some remaining problems, and discusses ways of optimising precision or recall for the method. The outlook sketches some possibilities for further work in related areas.