Refine
Year of publication
Document Type
- Conference Proceeding (387)
- Part of a Book (268)
- Article (204)
- Book (37)
- Working Paper (17)
- Doctoral Thesis (16)
- Other (14)
- Preprint (7)
- Part of Periodical (5)
- Report (2)
Language
- English (961) (remove)
Keywords
- Korpus <Linguistik> (279)
- Deutsch (209)
- Computerlinguistik (98)
- Annotation (70)
- Interaktion (65)
- Konversationsanalyse (60)
- Automatische Sprachanalyse (54)
- Gesprochene Sprache (53)
- Englisch (52)
- Wörterbuch (42)
Publicationstate
- Veröffentlichungsversion (961) (remove)
Reviewstate
Publisher
- IDS-Verlag (80)
- de Gruyter (55)
- Association for Computational Linguistics (44)
- European Language Resources Association (ELRA) (43)
- European Language Resources Association (24)
- Institut für Deutsche Sprache (24)
- Springer (19)
- Linköping University Electronic Press (15)
- The Association for Computational Linguistics (15)
- Leibniz-Institut für Deutsche Sprache (IDS) (14)
Physicists look at language
(2006)
This paper aims at verifying if the most important online Brazilian Portuguese dictionaries include some of the neologisms identified in texts published in the 1990s to 2000s, formed with the elements ciber-, e-, bio-, eco- and narco, which we refer to as fractomorphemes / fracto-morphèmes. Three online dictionaries were analyzed (Aulete, Houaiss and Michaelis), as well as Vocabulário Ortográfico da Língua Portuguesa (VOLP). We were able to conclude that all three dictionaries and VOLP include neologisms with these elements; Michaelis and VOLP do not include separate entries for bound morphemes, whereas Houaiss includes entries for all of them and Aulete includes entries for bio-, eco- and narco-. Aulete also describes the neological meaning of eco- and narco-, whereas Houaiss does not.
This White Paper sets out commonly agreed definitions on activities of consortia within NFDI. It aims to provide a common basis for reporting and reference regarding selected questions of cross-consortial relevance in DFG’s template for the Interim Reports. The questions were prioritised by an NFDI Task Force on Evaluation and Reporting (formerly Task Force Monitoring) as a result of discussing possible answers to the DFG template. In this process the need to agree on a generalizable meaning of terms commonly used in the context of NFDI, and reporting in particular, were identified from cross-consortial perspectives. Questions that showed the highest requirement on clarification are discussed in this White Paper. As NFDI evolves, the Task Force will likely propose further joint approaches for reporting in information infrastructures.
While each of broad relevance, the questions addressed relate to substantially different aspects of consortia’s work. They are thus also structured slightly different.
Collaborative work in NFDI
(2023)
The non-profit association National Research Data Infrastructure (NFDI) promotes science and research through a National Research Data Infrastructure. Its aim is to develop and establish an overarching research data management (RDM) for Germany and to increase the efficiency of the entire German science system. After a two-and-a-half year build up phase, the process of adding new consortia, each representing a different data domain, has ended in March 2023. NFDI now has 26 disciplinary consortia (and one additional basic service collaboration). Now the full extent of cross-consortial interaction is beginning to show.
The automatic recognition of idioms poses a challenging problem for NLP applications. Whereas native speakers can intuitively handle multiword expressions whose compositional meanings are hard to trace back to individual word semantics, there is still ample scope for improvement regarding computational approaches. We assume that idiomatic constructions can be characterized by gradual intensities of semantic non-compositionality, formal fixedness, and unusual usage context, and introduce a number of measures for these characteristics, comprising count-based and predictive collocation measures together with measures of context (un)similarity. We evaluate our approach on a manually labelled gold standard, derived from a corpus of German pop lyrics. To this end, we apply a Random Forest classifier to analyze the individual contribution of features for automatically detecting idioms, and study the trade-off between recall and precision. Finally, we evaluate the classifier on an independent dataset of idioms extracted from a list of Wikipedia idioms, achieving state-of-the art accuracy.
In order to differentiate between figurative and literal usage of verb-noun combinations for the shared task on the disambiguation of German Verbal Idioms issued for KONVENS 2021, we apply and extend an approach originally developed for detecting idioms in a dataset consisting of random ngram samples. The classification is done by implementing a rather shallow, statistics-based pipeline without intensive preprocessing and examinations on the morphosyntactic and semantic level. We describe the overall approach, the differences between the original dataset and the dataset of the KONVENS task, provide experimental classification results, and analyse the individual contributions of our feature sets.
This study investigates cross-language differences in pitch range and variation in four languages from two language groups: English and German (Germanic) and Bulgarian and Polish (Slavic). The analysis is based on large multi-speaker corpora (48 speakers for Polish, 60 for each of the other three languages). Linear mixed models were computed that include various distributional measures of pitch level, span and variation, revealing characteristic differences across languages and between language groups. A classification experiment based on the relevant parameter measures (span, kurtosis and skewness values for pitch distributions for each speaker) succeeded in separating the language groups.
This study presents the results of a large-scale comparison of various measures of pitch range and pitch variation in two Slavic (Bulgarian and Polish) and two Germanic (German and British English) languages. The productions of twenty-two speakers per language (eleven male and eleven female) in two different tasks (read passages and number sets) are compared. Significant differences between the language groups are found: German and English speakers use lower pitch maxima, narrower pitch span, and generally less variable pitch than Bulgarian and Polish speakers. These findings support the hypothesis that inguistic communities tend to be characterized by particular pitch profiles.
Based on specific linguistic landmarks in the speech signal, this study investigates pitch level and pitch span differences in English, German, Bulgarian and Polish. The analysis is based on 22 speakers per language (11 males and 11 females). Linear mixed models were computed that include various linguistic measures of pitch level and span, revealing characteristic differences across languages and between language groups. Pitch level appeared to have significantly higher values for the female speakers in the Slavic than the Germanic group. The male speakers showed slightly different results, with only the Polish speakers displaying significantly higher mean values for pitch level than the German males. Overall, the results show that the Slavic speakers tend to have a wider pitch span than the German speakers. But for the linguistic measure, namely for span between the initial peaks and the non-prominent valleys, we only find the difference between Polish and German speakers. We found a flatter intonation contour in German than in Polish, Bulgarian and English male and female speakers and differences in the frequency of the landmarks between languages. Concerning “speaker liveliness” we found that the speakers from the Slavic group are significantly livelier than the speakers from the Germanic group.
New KARL (Knowledge Acquisition and Representation Language) allows to specify all parts of a problem-solving method (PSM). It is a formal language with a well-defined semantics and thus allows to represent PSMs precisely and unambiguously yet abstracting from implementation detail. In this paper it is shown how the language KARL has been modified and extended to New KARL to better meet the needs for the representation of PSMs. Based on a conceptual structure of PSMs new language primitives are introduced for KARL to specify such a conceptual structure and to support the configuration of methods. An important goal for this extension was to preserve three important properties of KARL: to be (i) a conceptual, (ii) a formal, and (iii) an executable language.