In this paper, we discuss to what extent the German-based contact language Unserdeutsch (Rabaul Creole German, cf. Volker 1982) matches the category ‘creole language’ from both a socio-historical and a structural perspective. As a point of reference, we use typological criteria that are widely assumed to be typical of creole languages. We show that Unserdeutsch fits fairly well into the pattern of an ‘average creole’, as suggested by data in the Atlas of Pidgin and Creole Language Structures (Michaelis et al. 2013). This is despite a series of atypical conditions in its development that might lead us to expect close structural proximity to the lexifier language, i.e. a relatively acrolectal creole. A possible explanation for this striking discrepancy can be found in the primary function of Unserdeutsch as a marker of identity as well as in the linguistic structure of its substrate language Tok Pisin.
This paper provides insights into the ongoing international research project Unserdeutsch (Rabaul Creole German): Documentation of a highly endangered creole language in Papua New Guinea, based at the University of Augsburg, Germany. It describes the different stages of the project, ranging from fieldwork to corpus development, and outlines the methods and software used at each stage. We also present approaches to solving specific problems that have arisen in the practical work so far.
Basic grammatical categories may carry social meanings irrespective of their semantic content. In a set of four studies, we demonstrate that verbs—a basic linguistic category present and distinguishable in most languages—are related to the perception of agency, a fundamental dimension of social perception. In an archival analysis of actual language use in Polish and German, we found that targets stereotypically associated with high agency (men and young people) are presented in the immediate neighborhood of a verb more often than non-agentic social targets (women and older people). Moreover, in three experiments using a pseudo-word paradigm, verbs (but not adjectives and nouns) were consistently associated with agency (but not with communion). These results provide consistent evidence that verbs, as grammatical vehicles of action, are linguistic markers of agency. In demonstrating meta-semantic effects of language, these studies corroborate the view of language as a social tool and an integral part of social perception.
This paper reports on current practice in a staged approach to introducing NLP principles and techniques to students of information science (IIM) and of international communication and translation (ICT) as part of their curricula. As most of these students are largely unfamiliar with computer science or, in the case of IIM students, with linguistics, we see them as comparable to students of the humanities. We follow a blended learning strategy with lectures, online materials, tutorials, and screencasts. In the first two terms, we focus on linguistics and its formalisation; NLP tools and applications are then introduced from the third term on. The lectures are combined with tutorials and, since the summer term 2017, with a set of screencasts.
In this paper, we present work on developing a computerized grammar for the Latin language. The paper demonstrates the principles and challenges of developing a grammar for a natural language in a modern grammar formalism. The grammar presented here provides a useful resource for natural language processing applications in different fields. It can easily be adopted for language learning and for use in language technology for cultural heritage, such as translation applications or support for the post-correction of digitized documents.
We present a supervised machine learning system for author name disambiguation (AND) which tackles semantic similarity between publication titles by means of word embeddings. Word embeddings are integrated as external components, which keeps the model small and efficient while allowing for easy extensibility and domain adaptation. Initial experiments show that word embeddings can improve the Recall and F score of the binary classification sub-task of AND. Results for the clustering sub-task are less clear, but also promising, and overall show the feasibility of the approach.
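As an illustration of how semantic similarity between publication titles can be computed from word embeddings, the following minimal sketch averages the vectors of a title's words and compares two titles by cosine similarity. It is not the system described in the abstract: the toy vectors, the averaging strategy, and the names `TOY_EMBEDDINGS`, `title_vector`, `cosine`, and `title_similarity` are assumptions made for illustration only. A real system would load pretrained embeddings as an external component.

```python
import math

# Hypothetical 3-dimensional toy embeddings (assumption for illustration).
# A real system would load pretrained vectors from an external resource.
TOY_EMBEDDINGS = {
    "neural": [0.9, 0.1, 0.0],
    "network": [0.8, 0.2, 0.1],
    "networks": [0.8, 0.2, 0.1],
    "deep": [0.85, 0.15, 0.05],
    "learning": [0.7, 0.3, 0.1],
    "algebraic": [0.0, 0.9, 0.3],
    "topology": [0.1, 0.8, 0.4],
}


def title_vector(title, embeddings):
    """Average the vectors of all in-vocabulary tokens; None if none is known."""
    vectors = [embeddings[tok] for tok in title.lower().split() if tok in embeddings]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(vec[i] for vec in vectors) / len(vectors) for i in range(dim)]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def title_similarity(title_a, title_b, embeddings=TOY_EMBEDDINGS):
    """Similarity of two titles; 0.0 if either has no known token."""
    vec_a = title_vector(title_a, embeddings)
    vec_b = title_vector(title_b, embeddings)
    if vec_a is None or vec_b is None:
        return 0.0
    return cosine(vec_a, vec_b)


if __name__ == "__main__":
    print(title_similarity("Deep neural networks", "Neural network learning"))
    print(title_similarity("Deep neural networks", "Algebraic topology"))
```

In a setup like this, the similarity score could serve as one feature for a binary classifier that decides whether two publications belong to the same author. Because the embeddings are passed in as a parameter, they can be swapped for domain-specific vectors without changing the model code.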
Data sets of publication metadata with manually disambiguated author names play an important role in current author name disambiguation (AND) research. We review the most important data sets used so far and compare their respective advantages and shortcomings. From the results of this review, we derive a set of general requirements for future AND data sets. These include both trivial requirements, like absence of errors and preservation of author order, and more substantial ones, like full disambiguation and adequate representation of publications with a small number of authors and highly variable author names. On the basis of these requirements, we create and make publicly available a new AND data set, SCAD-zbMATH. Both the quantitative analysis of this data set and the results of our initial AND experiments with a naive baseline algorithm show the SCAD-zbMATH data set to be considerably different from existing ones. We consider it a useful new resource that will challenge the state of the art in AND and benefit the AND research community.
In conversation, turn-taking is usually fluid, with next speakers taking their turn right after the end of the previous turn. Most, but not all, previous studies show that next speakers start to plan their turn early, if possible already during the incoming turn. The present study makes use of the list-completion paradigm (Barthel et al., 2016), analyzing speech onset latencies and eye-movements of participants in a task-oriented dialogue with a confederate. The measures are used to disentangle the contributions to the timing of turn-taking of early planning of content on the one hand and initiation of articulation as a reaction to the upcoming turn-end on the other hand. Participants named objects visible on their computer screen in response to utterances that did, or did not, contain lexical and prosodic cues to the end of the incoming turn. In the presence of an early lexical cue, participants showed earlier gaze shifts toward the target objects and responded faster than in its absence, whereas the presence of a late intonational cue only led to faster response times and did not affect the timing of participants' eye movements. The results show that with a combination of eye-movement and turn-transition time measures it is possible to tease apart the effects of early planning and response initiation on turn timing. They are consistent with models of turn-taking that assume that next speakers (a) start planning their response as soon as the incoming turn's message can be understood and (b) monitor the incoming turn for cues to turn-completion so as to initiate their response when turn-transition becomes relevant.
In current usage, genau occurs not only in its classic meaning as an adjective or adverb, but is also used as a focus or degree particle and as a conversational particle. Previous descriptions have dealt with its interactional use only to a limited extent and with heterogeneous terminology. Using a sequential and multimodal approach, this contribution examines various interactional uses of genau in video recordings of everyday German conversations. Building on its function as a degree particle, genau is used both as a turn-internal confirmation particle in word searches and as a responsive confirmation particle. Since genau frequently marks the end of a process of understanding or of a negotiation of knowledge, the more general label of intersubjectivity marker could be considered. The responsive, confirming use gives rise to a more strongly sequence-closing and sequence-structuring function of genau, which could also explain the increasing use of this lexeme as a purely discourse-structuring particle within a turn.
For language-based research in the humanities and social sciences, CLARIN provides a research infrastructure tailored to the highly heterogeneous research data in these fields. With tools for finding data, preparing them in conformance with standards, and preserving them sustainably, as well as with virtual research environments for the collaborative creation and analysis of research data, CLARIN supports all essential aspects of data management and data archiving. These CLARIN services are accompanied by advisory and training measures.