Refine
Year of publication
- 2020 (169) (remove)
Document Type
- Article (91)
- Part of a Book (23)
- Conference Proceeding (20)
- Book (10)
- Other (9)
- Review (7)
- Part of Periodical (4)
- Doctoral Thesis (2)
- Master's Thesis (1)
- Report (1)
Language
- German (110)
- English (57)
- Multiple languages (2)
Keywords
- COVID-19 (41)
- Korpus <Linguistik> (36)
- Deutsch (30)
- Neologismus (25)
- Sprachgebrauch (25)
- Forschungsdaten (16)
- Computerlinguistik (14)
- Gesprochene Sprache (13)
- Wortschatz (13)
- Lexikostatistik (12)
Publicationstate
- Veröffentlichungsversion (169) (remove)
Reviewstate
Publisher
- Leibniz-Institut für Deutsche Sprache (IDS) (66)
- CLARIN (6)
- Heidelberg University Publishing (6)
- Erich Schmidt (5)
- European Language Resources Association (5)
- Spektrum der Wissenschaft Verlagsgesellschaft (5)
- Association for Computational Linguistics (4)
- Verlag für Gesprächsforschung (4)
- de Gruyter (4)
- Linköping University Electronic Press (3)
This study examines asymmetries between so-called inherent and contextual categories in relation to the morphological complexity of the nominal and verbal inflectional domain of languages. The observations are traced back to the influence of adult L2 learning in scenarios of intense language contact. A method for a simple comparison of the amount of inherent versus contextual categories is proposed and applied to the German-based creole language Unserdeutsch (Rabaul Creole German) in comparison to its lexifier language. The same procedure will be applied to two further language pairs. The grammatical systems of Unserdeutsch and other contact languages display a noticeable asymmetry regarding their structural complexity. Analysing different kinds of evidence, the explanatory key factor seems to be the role of (adult) L2 acquisition in the history of a language, whereby languages with periods of widespread L2 acquisition tend to lose contextual features. This impression is reinforced by general tendencies in pidgin and creole languages. Beyond that, there seems to be a tendency for inherent categories to be more strongly associated with the verb, while contextual categories seem to be more strongly associated with the noun. This leads to an asymmetry in categorical complexity between the noun phrase and the verb phrase in languages that experienced periods of intense L2 learning.
This paper reports on recent developments within the European Reference Corpus EuReCo, an open initiative that aims at providing and using virtual and dynamically definable comparable corpora based on existing national, reference or other large corpora. Given the well-known shortcomings of other types of multilingual corpora such as parallel/translation corpora (shining-through effects, over-normalization, simplification, etc.) or web-based comparable corpora (covering only web material), EuReCo provides a unique linguistic resource offering new perspectives for fine-grained contrastive research on authentic cross-linguistic data, applications in translation studies and foreign language teaching and learning.
The 12th Web as Corpus workshop (WAC-XII) looks at the past, present, and future of web corpora given the fact that large web corpora are nowadays provided mostly by a few major initiatives and companies, and the diversity of the early years appears to have faded slightly. Also, we acknowledge the fact that alternative sources of data (such as data from Twitter and similar platforms) have emerged, some of them only available to large companies and their affiliates, such as linguistic data from social media and other forms of the deep web. At the same time, gathering interesting and relevant web data (web crawling) is becoming an ever more intricate task as the nature of the data offered on the web changes (for example the death of forums in favour of more closed platforms).
In this article, we examine the current situation of data dissemination and provision for CMC corpora. By that we aim to give a guiding grid for future projects that will improve the transparency and replicability of research results as well as the reusability of the created resources. Based on the FAIR guiding principles for research data management, we evaluate the 20 European CMC corpora listed in the CLARIN CMC Resource family, individuate successful strategies among the existing corpora and establish best practices for future projects. We give an overview of existing approaches to data referencing, dissemination and provision in European CMC corpora, and discuss the methods, formats and strategies used. Furthermore, we discuss the need for community standards and offer recommendations for best practices when creating a new CMC corpus.
In this Paper, we describe a schema and models which have been developed for the representation of corpora of computer-mediated communicatin (CMC corpora) using the representation framework provided by the Text Encoding Initiative (TEI). We characterise CMC discourse as dialogic, sequentially organised interchange between humans and point out that many features of CMC are not adequately handled by current corpus encoding schemas and tools. We formulate desiderata for a representation of CMC in encoding schemes and argue why the TEI is a suitable framework for the encoding of CMC corpora. We propose a model of basic CMC units (utterances, posts, and nonverbal activities) and the macro- and micro-level structures of interactions in CMC environments. Based on these models, we introduce CMC-core, a TEI customisation for the encoding of CMC corpora, which defines CMC-specific encoding features on the four levels of elements, model classes, attribute classes, and modules of the TEI infrastructure. The description of our customisation is illustrated by encoding examples from corpora by researchers of the TEI SIG CMC, representing a variety of CMC genres, i.e. chat, wiki talk, twitter, blog, and Second Life interactions. The material described, i.e. schemata, encoding examples, and documentation, is available from the of the TEI CMC SIG Wiki and will accompany a feature request to the TEI council in late 2019.
We present recognizers for four very different types of speech, thought and writing representation (STWR) for German texts. The implementation is based on deep learning with two different customized contextual embeddings, namely FLAIR embeddings and BERT embeddings. This paper gives an evaluation of our recognizers with a particular focus on the differences in performance we observed between those two embeddings. FLAIR performed best for direct STWR (F1=0.85), BERT for indirect (F1=0.76) and free indirect (F1=0.59) STWR. For reported STWR, the comparison was inconclusive, but BERT gave the best average results and best individual model (F1=0.60). Our best recognizers, our customized language embeddings and most of our test and training data are freely available and can be found via www.redewiedergabe.de or at github.com/redewiedergabe.
Die vorgestellte Studie untersucht die Anteile unterschiedlicher Redewiedergabeformen im Vergleich zwischen zwei Literaturtypen von gegensätzlichen Enden des Spektrums: Hochliteratur – definiert als Werke, die auf der Auswahlliste von Literaturpreisen standen – und Heftromanen, massenproduzierten Erzählwerken, die zumeist über den Zeitschriftenhandel vertrieben werden und früher abwertend als „Romane der Unterschicht” (Nusser 1981) bezeichnet wurden. Unsere These ist, dass sich diese Literaturtypen hinsichtlich ihrer Erzählweise unterscheiden, und sich dies in den verwendeten Wiedergabeformen niederschlägt. Der Fokus der Untersuchung liegt auf der Dichotomie zwischen direkter und nicht-direkter Wiedergabe, die schon in der klassischen Rhetorik aufgemacht wurde.
Individuals with Autism Spectrum Disorder (ASD) experience a variety of symptoms sometimes including atypicalities in language use. The study explored diferences in semantic network organisation of adults with ASD without intellectual impairment. We assessed clusters and switches in verbal fuency tasks (‘animals’, ‘human feature’, ‘verbs’, ‘r-words’) via curve ftting in combination with corpus-driven analysis of semantic relatedness and evaluated socio-emotional and motor action related content. Compared to participants without ASD (n=39), participants with ASD (n=32) tended to produce smaller clusters, longer switches, and fewer words in semantic conditions (no p values survived Bonferroni-correction), whereas relatedness and content were similar. In ASD, semantic networks underlying cluster formation appeared comparably small without afecting strength of associations or content.
In diesem Beitrag stellen wir die Ergebnisse einer Studie über die Intonation von Frageaktivitäten in deutschen Alltagsgesprächen vor. Unsere Untersuchung erforscht, inwieweit die Intonation zur Kontextualisierung von konversationellen Fragen beiträgt. In der Analyse stützen wir uns auf das autosegmental-metrische Modell von Peters und das taxonomische Modell der interaktionalen Prosodieforschung von Selting. Diese Modelle beschreiben jeweils phonologische oder pragmatische Aspekte der Frageintonation, zwei Dimensionen, die für sich genommen, keine vollständige Beschreibung liefern können. Auf der Grundlage authentischer Gesprächsdaten aus dem Korpus FOLK argumentieren wir für die Kompatibilität des autosegmental-metrischen Modells von Peters und des taxonomischen Modells der Frageintonation von Selting. Die Merkmale aus beiden Modellen lassen sich zu Bündeln kombinieren, die es erlauben, die Intonation von Fragen zu erfassen.
Preface
(2020)
Content
1 Substituto - A Synchronous Educational Language Game for Simultaneous Teaching and Crowdsourcing
Marianne Grace Araneta, Gülsen Eryigit, Alexander König, Ji-Ung Lee, Ana Luís, Verena Lyding, Lionel Nicolas, Christos Rodosthenous and Federico Sangati
2 The Teacher-Student Chatroom Corpus
Andrew Caines, Helen Yannakoudakis, Helena Edmondson, Helen Allen, Pascual Pérez-Paredes, Bill Byrne and Paula Buttery
3 Polygloss - A conversational agent for language practice
Etiene da Cruz Dalcol and Massimo Poesio
4 Show, Don’t Tell: Visualising Finnish Word Formation in a Browser-Based Reading Assistant
Frankie Robertson
In this paper we investigate the problem of grammar inference from a different perspective. The common approach is to try to infer a grammar directly from example sentences, which either requires a large training set or suffers from bad accuracy. We instead view it as a problem of grammar restriction or sub-grammar extraction. We start from a large-scale resource grammar and a small number of examples, and find a sub-grammar that still covers all the examples. To do this we formulate the problem as a constraint satisfaction problem, and use an existing constraint solver to find the optimal grammar. We have made experiments with English, Finnish, German, Swedish and Spanish, which show that 10–20 examples are often sufficient to learn an interesting domain grammar. Possible applications include computer-assisted language learning, domain-specific dialogue systems, computer games, Q/A-systems, and others.
This thesis describes work in three areas: grammar engineering, computer-assisted language learning and grammar learning. These three parts are connected by the concept of a grammar-based language learning application. Two types of grammars are of concern. The first we call resource grammars, extensive descriptions a natural languages. Part I focuses on this kind of grammars. The other are domain-specific or application-specific grammars. These grammars only describe a fragment of natural language that is determined by the domain of a certain application. Domain-specific grammars are relevant for Part II and Part III. Another important distinction is between humans learning a new natural language using computational grammars (Part II) and computers learning grammars from example sentences (Part III). Part I of this thesis focuses on grammar engineering and grammar testing. It describes the development and evaluation of a computational resource grammar for Latin. Latin is known for its rich morphology and free word order, both have to be handled in a computationally efficient way. A special focus is on methods how computational grammars can be evaluated using corpus data. Such an evaluation is presented for the Latin resource grammar. Part II, the central part, describes a computer-assisted language learning application based on domain-specific grammars. The language learning application demonstrates how computational grammars can be used to guide the user input and how language learning exercises can be modeled as grammars. This allows us to put computational grammars in the center of the design of language learning exercises used to help humans learn new languages. Part III, the final part, is dedicated to a method to learn domain- or application-specific grammars based on a wide-coverage grammar and small sets of example sentences. Here a computer is learning a grammar for a fragment of a natural language from example sentences, potentially without any additional human intervention. These learned grammars can be based e.g. on the Latin resource grammar described in Part II and used as domain-specific lesson grammars in the language learning application described Part II.
pyMMAX2 is an API for processing MMAX2 stand-off annotation data in Python. It provides a lightweight basis for the development of code which opens up the Java- and XML-based ecosystem of MMAX2 for more recent, Python-based NLP and data science methods. While pyMMAX2 is pure Python, and most functionality is implemented from scratch, the API re-uses the complex implementation of the essential business logic for MMAX2 annotation schemes by interfacing with the original MMAX2 Java libraries. pyMMAX2 is available for download at http://github.com/nlpAThits/pyMMAX2.
We introduce a novel scientific document processing task for making previously inaccessible information in printed paper documents available to automatic processing. We describe our data set of scanned documents and data records from the biological database SABIO-RK, provide a definition of the task, and report findings from preliminary experiments. Rigorous evaluation proved challenging due to lack of gold-standard data and a difficult notion of correctness. Qualitative inspection of results, however, showed the feasibility and usefulness of the task.
Having the necessary skills for staying in contact with friends and relatives through digital devices is crucial in today’s world. As the current COVID-19 pandemic shows, this holds especially true for the elderly. Being quarantined and restricted from physically meeting people, various communication technologies are more important than ever for staying social and informed on current events. In nursing homes, staff members are now finding new ways for staying in touch with family members by assisting residents in making video calls with mobile devices.
But what if elderly people cannot rely on personal assistance for accessing these alternative means of communication? This raises the general question of how older people can and do learn to use such technologies. Although the internet is full of guides and instructional videos on how to use smartphones or tablets, they are a cold comfort to someone who may not even know what an internet browser is.
Especially for digital newcomers, the tried and true method of face-to-face instruction is invaluable. While many older people turn to their children or grandchildren for help in all things digital, courses specifically tailored for elderly users are also increasingly popular.
More and more governmental initiatives and associations indeed acknowledge the already existing interest of elderly citizens in digital tools and their growing need to receive customized training (e.g. “SeniorSurf” and “Kansalaisen digitaidot” in Finland or “Silver Tipps” in Germany). For a researcher of social interaction, these courses can also provide a valuable window for discovering what it looks and sounds like to learn to use essential but sometimes alien technologies.
To ensure short gaps between turns in conversation, next speakers regularly start planning their utterance in overlap with the incoming turn. Three experiments investigate which stages of utterance planning are executed in overlap. E1 establishes effects of associative and phonological relatedness of pictures and words in a switch-task from picture naming to lexical decision. E2 focuses on effects of phonological relatedness and investigates potential shifts in the time-course of production planning during background speech. E3 required participants to verbally answer questions as a base task. In critical trials, however, participants switched to visual lexical decision just after they began planning their answer. The task-switch was time-locked to participants' gaze for response planning. Results show that word form encoding is done as early as possible and not postponed until the end of the incoming turn. Hence, planning a response during the incoming turn is executed at least until word form activation.
Esipuhe/Preface
(2020)
When humans have a conversation with one-another, they generally take turns speaking one after the other without overlapping each others talk or leaving silence between turns for long stretches of time. Previous research has shown that conversation is a structured practice following rules that help interlocutors to manage the flow of conversation interactively. While at the beginning of a conversation it remains open who will speak when about what and for how long, interlocutors regulate the flow of conversation as it unfolds. One basic set of rules that interlocutors operate with governs the allocation of speaking turns, with the central rule stating that whoever starts speaking first at a point in time when speaker change becomes relevant has the rights and obligations to produce the next turn. The organization of turn allocation, therefore, is one reason for conversational turn taking to be so remarkably fast, with the beginnings of turns most often being quite accurately aligned with the ends of the previous turns. Observations of this outstanding speed of turn taking gave rise to a number of questions concerning language processing in conversational situations. The studies presented in this thesis investigate some of these questions from the perspective of the current listener preparing to be the next speaker who will respond to the current turn.
The study presented in Chapter 2 investigates when next speakers begin to plan their own turn with respect to two points in time, (i) the moment when the incoming turn’s message becomes clear enough to make response planning possible and (ii) the moment when the incoming turn terminates. Results of previous studies were inconclusive about the timing of language planning in conversation, with evidence in favour of both late and early response planning. Furthermore, previous studies presented both evidence as well as counter evidence indicating that response planning depends or does not depend on an accurate prediction of the timing of the incoming turn’s end. The study presented here makes use of a novel experimental paradigm which includes a dialogic task that participants need to fulfil in response to critical utterances by a confederate. These critical utterances were structured, on the one hand, so that their message became clear either only at the end of the turn or before the end of the turn, and, on the other hand, so that it was either predictable or not predictable when exactly the turn would end. Participant’s eye-movements as well as their response latencies indicated that they always planned their next turn as early as possible, irrespective of the predictability of the incoming turn’s end. The presented results provide evidence in favour of models of turn taking that predict speech planning to happen in overlap with the incoming turn.
Having established that next speakers begin to plan their turn in overlap, the study presented in Chapter 3 goes more into detail investigating to which depth language planning progresses while the incoming turn is still unfolding. To this end, a number of psycholinguistic paradigms were combined. In the study’s main experiment, participants had to fulfil a switch-task in which they switched from picture naming in response to an auditorily presented question to making a lexical decision. By manipulating the relatedness of the word for lexical decision with the picture that was prepared to be named before the task-switch it was possible to draw inferences on which processing stages were entered during the speech production process in overlap with the incoming turn. Participants’ behavioural responses in the lexical decision task revealed that they entered the stage of phonological encoding while the incoming turn was still unfolding, showing that planning in overlap is not limited to conceptual preparation but includes all sub-processes of formulation.
Given that speech production regularly enters the stages of formulation in overlap with the incoming turn, as shown in Chapters 2 and 3, the question arises whether planning the next turn in overlap is cognitively more demanding than during the gap between turns. This question is approached in the study presented in Chapter 4 by measuring pupillometric responses of participants in a dialogic task. An increase in pupil diameter during a cognitive task is indicative of increased processing load, and pupillometric responses to planning in overlap with the incoming turn were found to be greater than responses to planning in the gap between turns. These results show that planning in overlap is more demanding than planning during the gap, even though it is highly practiced by speakers.
After Chapters 2 to 4 investigated the timing and mechanisms of speech planning in conversation, Chapter 5 turns towards the timing of articulation of a planned turn, asking the question what sources of information next speakers use to time the articulation of a planned utterance to start closely after the incoming turn comes to an end. In this Chapter’s study, participants taking turns with a confederate responded to utterances containing or not containing different cues to the location of the incoming turn’s end. Participants made use of lexical and turn-final intonational cues, but not of turn-initial intonational cues, responding faster when the relevant cues were present than when they were not present. These results show that the timing of turn initiation in next speakers depends on the recognition of the incoming turn’s point of completion and not merely on the progress in planning the next turn.
All evidence presented in Chapters 2 to 5 is summed up and bundled together in a cognitive model of turn taking, which is being presented in Chapter 6. This model assumes, centrally, that the planning of a turn and the timing of its articulation are separate cognitive processes that run in parallel in any next speaker during conversation. Planning generally starts as early as possible, often in overlap with the incoming turn, while the timing of articulation depends on the next speaker’s level of certainty that speaker change has become relevant at a particular moment, with a number of cues to the end of the incoming turn leading to an increase of certainty. Next turns are assumed to often be planned down to fully formulated utterance plans including their phonological form as early as possible on the basis of anticipations of the incoming turn’s message, which are created with the help of the general and situational knowledge about the world, the current speaker and her intentions, as well as the input that has been received so far. The level of certainty that speaker change becomes relevant rises or decreases as lexico-syntactic, prosodic, and pragmatic projections about the development of the current turn are fulfilled or not fulfilled. As the incoming turn progresses towards its end as was projected by the current listener, he becomes certain that speaker change becomes relevant and will initiate articulation of the prepared next turn. Viewing these two processes, planning a next turn and timing of its articulation, as separate makes it possible to explain the observable fast timing of turn taking while still modelling the allocation of turns as interactionally managed by interlocutors — a considerable advantage of the presented model compared to more traditional perspectives on turn taking and conversation.
The theme of the AFinLA 2020 Yearbook Methodological turns in applied language studies is discussed in this introductory article from three interrelated perspectives, variously addressed in the three plenary presentations at the AFinLA Autumn Symposium 2019 as well as in the thirteen contributions to the yearbook. In the first set of articles presented, the authors examine the role and impact of technological development on the study of multimodal digital and non-digital contexts and discourses and ensuing new methods. The second set of studies in the yearbook revisits issues of language proficiency, critically discussing relevant concepts and approaches. The third set of articles explores participation and participatory research approaches, reflecting on the roles of the researcher and the researched community.
Die zentrale Aufgabenstellung des Verbundprojektes TextTransfer (Pilot) war eine Machbarkeitsprüfung für die Entwicklung eines Text-Mining-Verfahrens, mit dem Forschungsergebnisse automatisiert auf Hinweise zu Transfer- und Impactpotenzialen untersucht werden können. Das vom Projektkoordinator IDS verantwortete Teilprojekt konzentrierte sich dabei auf die Entwicklung der methodischen Grundlagen, während der Projektpartner TIB vornehmlich für die Bereitstellung eines geeigneten Datensatzes verantwortlich war. Solchen automatisierten Verfahren liegen zumeist textbasierte Daten als physisches Manifest wissenschaftlicher Erkenntnisse zugrunde, die im Falle von TextTransfer (Pilot) als empirische Grundlage herangezogen wurden. Das im Verbund zur Anwendung gebrachte maschinelle Lernverfahren stützte sich ausschließlich auf deutschsprachige Projektendberichte öffentlich geförderter Forschung. Diese Textgattung eignet sich insbesondere hinsichtlich ihrer öffentlichen Verfügbarkeit bei zuständigen Gedächtnisorganisationen und aufgrund ihrer im Vergleich zu anderen Formaten wissenschaftlicher Publikation relativen strukturellen wie sprachlichen Homogenität. TextTransfer (Pilot) ging daher grundsätzlich von der Annahme struktureller bzw. sprachlicher Ähnlichkeit in Berichtstexten aus, bei denen der Nachweis tatsächlich erfolgten Transfers zu erbringen war. Im Folgenden wird in diesen Fällen von Texten bzw. textgebundenen Forschungsergebnissen mit Transfer- und Impactpotenzial gesprochen werden. Es wurde ferner postuliert, dass sich diese Indizien von sprachlichen Eigenschaften in Texten zu Projekten ohne nachzuweisenden bzw. ggf. auch niemals erfolgtem, aber potenziell möglichem Transfer oder Impact unterscheiden lassen. Mit einer Verifizierung dieser Annahmen war es möglich, Transfer- oder Impactwahrscheinlichkeiten in großen Mengen von Berichtsdaten ohne eingehende Lektüre zu prognostizieren.
Sogenannte „Pragmatikalisierte Mehrworteinheiten“ sind im Deutschen hochfrequent und unterliegen bisweilen tiefgreifenden phonetischen Reduktionsprozessen. Diese können Realisierungsvarianten hervorbringen, die in der Rückschau auf mehr als eine lexematische Ursprungsform zurückführbar sind. Die vorliegende Studie untersucht mit [ˈzɐmɐ] einen besonders prägnanten Fall dieser Art anhand eines Perzeptionsexperimentes.
EFNIL, the European Federation of National Institutions for Language, promotes the standard languages and the linguistic diversity of the European countries as an essential characteristic of their cultural diversity and wealth. The 17th annual conference of EFNIL in Tallinn dealt with the relation between language and economy.
• Language politics often have economic intentions, the language use of the individual is embedded in economic conditions, languages seem to differ in their economic value. In recent years, economists and sociolinguists have developed models of describing these interdependencies.
• The interaction in multilingual settings needs professional handling. There are traditional instances such as language teaching or translation and new professional fields of the digital age such as multilingual databases. Lots of economic needs and opportunities appear in this field.
• Digitization and societal diversity are two elements leading to more successful interaction, assisted by the use of automatic everyday translation, the development of plain language etc.
This volume presents an extensive overview of the interplay of language and economy.
In diesem Beitrag werden exemplarisch verschiedene potenzielle Gebrauchsmuster mit dem deutschen Lemma wissen gesammelt und ihre in der Fachliteratur vorgelegten interaktionslinguistisch-funktionalen Beschreibungen für einen Strukturierungsversuch genutzt. Im Zentrum steht ein multifunktionaler handlungsorientierter Ansatz zur Beschreibung von Interaktion im Gespräch. Der Beitrag greift dabei Überlegungen auf, die im Rahmen des Forschungsprojekts Lexik des gesprochenen Deutsch (= LeGeDe) zur Erstellung einer korpusbasierten lexikogra- fischen Ressource lexikalischer Besonderheiten des gesprochenen Deutsch in der Interaktion thematisiert wurden.
Schlüsselwörter: Muster, Lexik des gesprochenen Deutsch, Interaktion, Internetlexikografie
Even though the use of several languages has become more common in modern societies, it is important to find a common language in order to communicate economically (by the way, also with regard to economic success). So, of course, it is an advantage and a basic request in our national societies to be able to communicate by means of the national language(s). But looking a bit closer at the communicative demands of today one sees that there is a growing need to react to internal variation, and that a modern linguistic identity not only covers that fact, but also the fact, that English – in different forms – is part of a linguistic spectrum fitting a modern European communicative life. In the last years a communicative pattern is developing within an elite group of young academically educated people that is based on the use of English only, more or less ignoring the connection to the national linguistic surroundings, somehow kind of an alternative monolingualism. But looking at the communicative needs in our complex societies losing the ability to cope with different linguistic options in different communicative situations and to integrate this possibility into your linguistic identity is a rather restricted option – also in economic terms. And this even holds not taking into account the linguistic effect of modern migration.
Despite the importance of the agent role for language grammar and processing, its definition and features are still controversially discussed in the literature on semantic roles. Moreover, diagnostic tests to dissociate agentive from non-agentive roles are typically applied with qualitative introspection data. We investigated whether quantitative acceptability ratings obtained with a well-established agentivity test, the DO-cleft, provide evidence for the feature-based prototype account of (Dowty, David R. 1991. Thematic protoroles and argument selction. Language 67(3). 547-619) postulating that agentivity increases with the number of agentive features that a role subsumes. We used four different intransitive verb classes in German and collected acceptability judgements from non-expert native speakers of German. Our results show that sentence acceptability increases linearly with the number of agentive features and, hence, agentivity. Moreover, our findings confirm that sentience belongs to the group of proto-agent features. In summary, this suggests that a multidimensional account including a specific mechanism for role prototypicality (feature accumulation) successfully captures gradient acceptability clines. Quantitative acceptability estimates are a meaningful addition to linguistic theorizing.
In Theaterproben entwickeln Beteiligte gemeinsam eine Inszenierung, die zur Aufführung gebracht wird. Ein wesentliches Mittel dazu ist das Vorspielen von Teilen des Stücks und das anschließende Besprechen. Dies geschieht üblicherweise in Rollenteilung: Die Schauspielenden führen Teile des Stücks vor, während die Regie zuschaut und gegebenenfalls interveniert, woran sich Besprechungen anschließen können. Dieser Teil von Theaterproben, in dem abwechselnd vorgespielt und besprochen wird, haben wir Spielprobe genannt (siehe Einleitung zu diesem Themenheft). Eine wesentliche interaktionsorganisatorische Aufgabe von Spielproben besteht für die Beteiligten darin, Schauspielaktivitäten und Besprechungsaktivitäten miteinander zu verzahnen. Dies geschieht durch Transitionspraktiken, die das Spiel entweder unterbrechen oder wieder eröffnen. Der vorliegende Beitrag untersucht Transitionspraktiken in Spielproben als ein konstitutives Moment ihrer interaktiven Organisation. Fokussiert werden Praktiken, die das Spiel unterbrechen, so genannte Interventionen. Nach einer detaillierten Fallanalyse, die eine prototypische Transition vom Spiel ins Besprechen und zurück ins Spiel veranschaulicht (Kap. 4.1/4.2), widmet sich der Rest des Beitrags der Analyse einer Kollektion von Interventionen. Es zeigt sich, dass Interventionen normativen Orientierungen unterliegen und verwendete Praktiken hinsichtlich verschiedener Dimensionen (etwa Ursache/Grund der Intervention) systematisch variieren.
In the present article we argue that all communication is medial in the sense that every human sign-based interaction is shaped by medial aspects from the outset. We propose a dynamic, semiotic concept of media that focuses on the process-related aspect of mediality, and we test the applicability of this concept using as an example the second presidential debate between Clinton and Trump in 2016. The analysis shows in detail how the sign processing during the debate is continuously shaped by structural aspects of television and specific traits of political communication in television. This includes how the camerawork creates meaning and how the protagonists both use the affordances of this special mediality. Therefore, it is not adequate in our view to separate the technical aspects of the medium, the ‘hardware’, from the processual aspects and the structural conditions of communication. While some aspects of the interaction are directly constituted by the medium, others are more indirectly shaped and influenced by it, especially by its institutional dimension – we understand them as second-order media effects. The whole medial procedure with its specific mediality is a necessary, but not a sufficient condition of meaning-making. We distinguish the medial procedure from the semiotic modes employed, the language games played and the competence of the players involved.
Zwischen den Jahren oder eine Zeit zwischen den Zeiten. Sprachliche Betrachtungen zur "Normalität"
(2020)
Twenty-two historical encyclopedias encoded in TEI: a new resource for the Digital Humanities
(2020)
This paper accompanies the corpus publication of EncycNet, a novel XML/TEI annotated corpus of 22 historical German encyclopedias from the early 18th to early 20th century. We describe the creation and annotation of the corpus, including the rationale for its development, suggested methodology for TEI annotation, possible use cases and future work. While many well-developed annotation standards for lexical resources exist, none can adequately model the encyclopedias at hand, and we therefore suggest how the TEI Lex-0 standard may be modified with additional guidelines for the annotation of historical encyclopedias. As the digitization and annotation of historical encyclopedias are settling on TEI as the de facto standard, our methodology may inform similar projects.
I’ve got a construction looks funny – representing and recovering non-standard constructions in UD
(2020)
The UD framework defines guidelines for a crosslingual syntactic analysis in the framework of dependency grammar, with the aim of providing a consistent treatment across languages that not only supports multilingual NLP applications but also facilitates typological studies. Until now, the UD framework has mostly focussed on bilexical grammatical relations. In the paper, we propose to add a constructional perspective and discuss several examples of spoken-language constructions that occur in multiple languages and challenge the current use of basic and enhanced UD relations. The examples include cases where the surface relations are deceptive, and syntactic amalgams that either involve unconnected subtrees or structures with multiply-headed dependents. We argue that a unified treatment of constructions across languages will increase the consistency of the UD annotations and thus the quality of the treebanks for linguistic analysis.
The annual microcensus provides Germany’s most important official statistics. Unlike a census it does not cover the whole population, but a representative 1%-sample of it. In 2017, the German microcensus asked a question on the language of the population, i.e. ‘Which language is mainly spoken in your household?’ Unfortunately, the question, its design and its position within the whole microcensus’ questionnaire feature several shortcomings. The main shortcoming is that multilingual repertoires cannot be captured by it. Recommendations for the improvement of the microcensus’ language question: first and foremost the question (i.e. its wording, design, and answer options) should make it possible to count multilingual repertoires.
Von Nichtstun und Erholung (an Weihnachten und zu anderen Zeiten) (aus der Rubrik Neuer Wortschatz)
(2020)
„Bausteine einer Korpusgrammatik des Deutschen“ ist eine neue Schriftenreihe, die am Leibniz-Institut für Deutsche Sprache in Mannheim (IDS) entsteht. Sie setzt sich zum Ziel, mit korpuslinguistischen Methoden die Vielfalt und Variabilität der deutschen Grammatik in großer Detailschärfe zu erfassen und gleichzeitig für die Validierbarkeit der Ergebnisse zu sorgen. Die erste Ausgabe enthält eine Einführung in die Reihe sowie vier als Kapitel einer neuen Grammatik gestaltete Texte: 1. Grundlegende Aspekte der Wortbildung, 2. Bau von und Umbau zu Adverbien, 3. Starke vs. schwache Flexion aufeinanderfolgender attributiver Adjektive und 4. Reihenfolge attributiver Adjektive. Die Ausgabe ist mit einer interaktiven Datenbank zu attributiven Adjektiven verknüpft.
The present chapter investigates the relative order of attributive adjectives in German. Based on corpus data, our results corroborate previous findings that semantics is the most important factor in accounting for adjective order. Going beyond previous studies, we also consider coordinated structures (such as mit [[großem, verwildertem] Garten] ‘with (a) large, overgrown garden’), where both adjectives are of equal rank. While adjective order in embedded structures (mit [ schwierigem [ familiärem Hintergrund ]] ‘with (a) difficult domestic background’) can be predicted rather accurately on semantic grounds, we show that predictions can also be made for coordinated structures, albeit with lower accuracy. Using regression analysis, we examine how semantic factors interact with a number of other explanatory variables.
Einleitung
(2020)
A corpus-based academic grammar of German is an enormous undertaking, especially if it aims at using state-of-the-art methodology while ensuring that its study results are verifiable. The Bausteine-series, which is being developed at the Leibniz Institute for the German Language (IDS), presents individual “building blocks” for such a grammar. In addition to the peer-reviewed texts, the series publishes the results of statistical analyses and, for selected topics, the underlying data sets.
This article explores a sequence organizational phenomenon that results from the use of a loosely specifiable turn format (viz., That’s + wh-clause) for launching (next) sequences while at the same time connecting back to a prior turn. Using this practice creates a sequential juncture, i.e., a pivot-like nexus between one sequence and a next. In third position, such junctures serve to accomplish seamless sequential transitions from one sequence into a next by presenting the latter as locally occasioned. The practice may, however, also be deployed in second position to launch actions that have not been made relevant or provided for by the preceding action and exhibit response relevance themselves. The sequential junctures then become retro-sequential in character: They transform the projected trajectory of the sequence in progress and create interlocking sequential structures. These findings highlight that sequence is practice, while pointing to understudied interconnections between tying and sequentiality. Data are in English.
This chapter focuses on the formation of adverbs from a corpuslinguistic perspective, providing an overview of adverb formation patterns in German that includes frequencies and hints to productivity as well as combining quantitative methods and theoretically founded hypotheses to address questions that concern possible grammaticalization paths in domains that are formally marked by prepositional elements or inflectional morphology (in particular, superlative or superlative-derived forms). Within our collection of adverb types from the project corpus, special attention is paid to adverbs built from primary prepositions. The data suggest that generally, such adverb formation involves the saturation of the internal argument slot of the relation-denoting preposition. In morphologically regular formations with the preposition in final position, pronominal forms like da ‘there’, hier ‘here’, wo ‘where’ as well as hin ‘hither’ and her ‘thither’ serve to derive adverbs. On the other hand, morphologically irregular formations with the preposition – in particular: zu ‘to’ or vor ‘before, in front of’ – in initial posi-tion show traits of syntactic origin such as (remnants of) inflectional morphology. The pertaining adverb type dominantly saturates the internal argument slot by means of universal quantification that is part and parcel as well of the derivation of superlatives and demonstrably fuels the productivity of the pertaining formation pattern.
This chapter begins with a sketch of the specifics of our approach, an overview of the contents of the chapters on word formation and some methodological notes. It then discusses the general characteristics of word formations and of their overall inventory, comparing word formations to primary words. Furthermore, the chapter explores the relative frequencies of word formations in different vocabulary areas and traces the word formation profiles of individual parts of speech. Finally, it compiles the characteristic word formation rules for different parts of speech.
Usually, weak inflection of an attributive or nominalized adjective occurs if the adjective is preceded by an inflected determiner: mit diesem technischen Aufwand (‘at great technical expense’). Otherwise, the inflection of the adjective is strong: mit technischem Aufwand. Following this rule of thumb, we would expect strong inflection of an adjective following another adjective whenever the determiner is missing: mit hohem technischem Aufwand. But many German speakers opt for a weak dative singular ending -en following the strong ending -em on the first adjective: mit hohem technischen Aufwand. This chapter shows which explanatory variables play a role in this variation within standard German.
Politiker und Parteien sehen sich heutzutage oft mit dem Vorwurf konfrontiert, sie heben sich kaum mehr voneinander ab, seien gar „austauschbar“. Umso größer scheint das Bedürfnis nach Abgrenzung. Diese wird kommunikativ hergestellt und ist am besten von den diskursiven Zusammenhängen und Akteurskonstellationen her, in denen sie sich aktualisiert, nachzuvollziehen.
Das Vorgehen in dieser Arbeit gliedert sich im Wesentlichen in drei Schritte: Zunächst wird eine Theorieskizze der Abgrenzung als Sprechhandlung entworfen. Hierbei geht es vor allem darum, verschiedene Lesarten zu erschließen und die Abgrenzung in einem Panorama verwandter Konzepte wie etwa Ausgrenzung, Distinktion und Distanzierung zu verorten (Teil 1). Daraufhin wird die Plenardebatte als Textsorte erschlossen und in ihren kommunikativen Spezifika erfasst, wobei besonders die Stichworte Inszeniertheit, Mehrfachadressierung und die Frage nach dem Verhältnis zwischen Mündlichkeit und Schriftlichkeit in den Blickpunkt rücken (Teil 2). Sodann wird mithilfe der pragma-semiotischen Textarbeit als Methode ganz konkret sprachliches Datenmaterial aus Plenardebatten analysiert und interpretativ ausgewertet (Teile 3 und 4). Dabei kommen auch korpuslinguistische Verfahren zum Einsatz, die jedoch letztlich im Dienste einer qualitativ orientierten Analyse stehen.
Die Analyse berücksichtigt sowohl explizite als auch implizite Formen sprachlicher Abgrenzung. Sie zeigt unter anderem, dass politische Abgrenzungshandlungen keineswegs parteispezifisch sind, sondern von allen Parteien und Akteuren mehr oder weniger konstant praktiziert werden. Dabei wird Abgrenzung hauptsächlich als Selbstpositionierung realisiert; bisweilen finden sich aber durchaus auch Fremdpositionierungen – etwa als Aufforderungen an andere Akteure, sich gegenüber Dritten abzugrenzen. Auf der Ebene der sprachlichen Formen lässt sich schließlich durch eine Art experimentelle Annäherung mit korpuslinguistischen Verfahren eine Reihe von Mehrworteinheiten ausmachen, die als Indikatoren für implizite Abgrenzung gelten können.
Using video-recordings from one day of a theater project for young adults, this paper investigates how the meaning of novel verbal expressions is interactionally constituted and elaborated over the interactional history of a series of activities. We examine how the theater director introduces and instructs the group in the Chekhovian technique of acting, which is based on “imagining with the body,” and how the imaginary elements of the technique are “brought into existence” in the language of the instructions. By tracking shifts in the instructor’s use of the key expressions invisible/imaginary/inner body or movement through a series of exercises, we demonstrate how they are increasingly treated as real and perceivable bodily conduct. The analyses focus on the instructor’s attribution of factual and agentive properties to these expressions, and the changes that these properties undergo over the series of instructions. This case demonstrates the significance of longitudinal processes for the establishment of shared meaning in social interaction. The study thereby contributes to the field of interactional semantics and to longitudinal studies of social interaction.
This paper presents the corpus-based lexicographical prototype that was developed within the framework of the project Lexik des gesprochenen Deutsch (=LeGeDe) as a thirdparty funded project. Research results regarding the information offered in dictionaries have shown that there is a necessity for information on spoken lexis and its interactional functions. The resulting LeGeDe-prototype is based on these needs and desiderata and is thus an innovative example for the adequate representation of spoken language in online dictionaries. It is available online since September 2019 (https://www.owid.de/legede/). In the following sections, after first focusing on the presentation of the project’s goals, the data basis, the intended end user, and the applied methods, we will illustrate the microstructure of the prototype and the information provided in a dictionary entry based on the lemma eben. Finally, we will summarize innovative aspects that are important for the implementation of such a resource.
Dieser Beitrag beschreibt, welche Schritte nötig sind, um die Daten des Archivs der Grafen v. Platen (AGP) für Forschungsdateninfrastrukturen (FDI) zugänglich zu machen: die Daten konvertieren, die Metadaten extrahieren, Daten und Metadaten indizieren sowie die Datenmodelle für Daten und Metadaten so ergänzen, dass sie die Bestände des Archivs sinnvoll erfassen. Zugleich wird begründet, weshalb man überhaupt solchen Aufwand treiben sollte: nämlich, damit die Daten einem größeren Publikum zur Verfügung stehen und überdies mit Werkzeugen bearbeitet werden können, die in den Infrastrukturen zur Verfügung stehen, und damit eine weitere Verlinkung und Kombination mit externen Ressourcen erfolgen kann, sodass ein deutlicher Mehrwert entstehen kann.
Im Beitrag steht das LeGeDe-Drittmittelprojekt und der im Laufe der Projektzeit entwickelte korpusbasierte lexikografische Prototyp zu Besonderheiten des gesprochenen Deutsch in der Interaktion im Zentrum der Betrachtung. Die Entwicklung einer lexikografischen Ressource dieser Art knüpft an die vielfältigen Erfahrungen in der Erstellung von korpusbasierten Onlinewörterbüchern (insbesondere am Leibniz-Institut für Deutsche Sprache, Mannheim) und an aktuelle Methoden der korpusbasierten Lexikologie sowie der Interaktionsanalyse an und nimmt als multimedialer Prototyp für die korpusbasierte lexikografische Behandlung von gesprochensprachlichen Phänomenen eine innovative Position in der modernen Onlinelexikografie ein. Der Beitrag befasst sich im Abschnitt zur LeGeDe-Projektpräsentation ausführlich mit projektrelevanten Forschungsfragen, Projektzielen, der empirischen Datengrundlage und empirisch erhobenen Erwartungshaltungen an eine Ressource zum gesprochenen Deutsch. Die Darstellung der komplexen Struktur des LeGeDe-Prototyps wird mit zahlreichen Beispielen illustriert. In Verbindung mit der zentralen Information zur Makro- und Mikrostruktur und den lexikografischen Umtexten werden die vielfältigen Vernetzungs- und Zugriffsstrukturen aufgezeigt. Ergänzend zum abschließenden Fazit liefert der Beitrag in einem Ausblick umfangreiche Vorschläge für die zukünftige lexikografische Arbeit mit gesprochensprachlichen Korpusdaten.
According to Positioning Theory, participants in narrative interaction can position themselves on a representational level concerning the autobiographical, told self, and a performative level concerning the interactive and emotional self of the tellers. The performative self is usually much harder to pin down, because it is a non-propositional, enacted self. In contrast to everyday interaction, psychotherapists regularly topicalize the performative self explicitly. In our paper, we study how therapists respond to clients' narratives by interpretations of the client's conduct, shifting from the autobiographical identity of the told self, which is the focus of the client's story, to the present performative self of the client. Drawing on video recordings from three psychodynamic therapies (tiefenpsychologisch fundierte Psychotherapie) with 25 sessions each, we will analyze in detail five extracts of therapists' shifts from the representational to the performative self. We highlight four findings:
• Whereas, clients' narratives often serve to support identity claims in terms of personal psychological and moral characteristics, therapists rather tend to focus on clients' feelings, motives, current behavior, and ways of interacting.
• In response to clients' stories, therapists first show empathy and confirm clients' accounts, before shifting to clients' performative self.
• Therapists ground the shift to clients' performative self by references to clients' observable behavior.
• Therapists do not simply expect affiliation with their views on clients' performative self. Rather, they use such shifts to promote the clients' self-exploration. Yet, if clients resist to explore their selves in more detail, therapists more explicitly ascribe motives and feelings that clients do not seem to be aware of. The shift in positioning levels thus seems to have a preparatory function for engendering therapeutic insights.
The majority of new words in dictionaries are included following a certain period of time during which they have become more frequent in use and established morphosyntactic and orthographic features consistent with the language system they are borrowed into. In case of borrowed new words, inclusion often takes place at a transitional state of assimilation to the language system, where delayed orthographic or phonetic change cannot be ruled out and the differentiation between standard-conforming and non-standard orthographic word forms of a lemma oftentimes depends on the proximity between the writing systems of the donor and the recipient language. Following a brief overview of loan words and their lexicographical description in the Neologismenwörterbuch, a specialized online dictionary for neologisms in contemporary German, this paper presents findings of an investigative case study on dictionary entries for a neologism borrowed from a logographic language system and discusses the potential of a corpus-based description of new loan words.
This paper analyses the variation we find in the realization of finite clausal complements in the position of prepositional objects in a set of Germanic languages. The Germanic languages differ with respect to whether prepositions can directly select a clause (North Germanic) or not and instead need a prepositional proform (Continental West Germanic). Within the Continental West Germanic languages, we find further differences with respect to the constituent structures. We propose that German strong vs. weak prepositional proforms (e.g. drauf vs. darauf) differ with respect to their syntax, while this is not the case for the Dutch forms (ervan vs. daarvan). What the Germanic languages under consideration share is that the prepositional element can be covert, except in English. English shows only limited evidence for the presence of P with finite clauses in the position of prepositional objects generally, but only with a selected set of verbs. This investigation is a first step towards a broader study of the nature of clauses in prepositional object positions and the implications for the syntax of clausal complementation.
Der Einfluss extremistischer Gewaltereignisse auf das Framing von Extremismus in Online-Medien
(2020)
In diesem Beitrag untersuchen wir die Darstellung von Rechtsextremismus, Linksextremismus und Islamismus im medialen Diskurs am Beispiel von SPIEGEL Online, einem der deutschen Leitmedien. Wir leiten vier zentrale Dimensionen für die Konzeptualisierung von Extremismen ab: Ideologie und Organisation, Herkunft der Akteure, Stellung zur Gesellschaft und Typische Handlungen. Wir beobachten die Entwicklung der Darstellung der drei Extremismen an möglichen Bruchpunkten: Wir untersuchen das assoziative Framing der drei Extremismen vor und nach prominenten extremismusbezogenen Gewaltereignissen, namentlich die Anschläge des 11. September, die Veröffentlichung des NSU-Skandals und linksextremistische Aktivitäten während des G20-Gipfels in Hamburg. Mittels einer Kollokationsanalyse identifizieren wir mit den Extremismen assoziierte Aspekte und ordnen diese den Konzeptualisierungsdimensionen zu. Wir beobachten Veränderungen im Framing, die durch die ausgewählten Ereignisse bedingt sind, und vergleichen das resultierende Framing mit den Kerndefinitionen des Verfassungsschutzes aus dem Bericht des Jahres 2017, um mögliche Unterschiede in der Konzeptualisierung von Extremismen mit möglicherweise unterschiedlichen Handlungslogiken als Resultat divergierender Konzeptualisierungen herauszuarbeiten.
This chapter describes the resources that speakers of Polish use when recruiting assistance and collaboration from others in everyday social interaction. The chapter draws on data from video recordings of informal conversation in Polish, and reports language-specific findings generated within a large-scale comparative project involving eight languages from five continents (see other chapters of this volume). The resources for recruitment described in this chapter include linguistic structures from across the levels of grammatical organization, as well as gestural and other visible and contextual resources of relevance to the interpretation of action in interaction. The presentation of categories of recruitment, and elements of recruitment sequences, follows the coding scheme used in the comparative project (see Chapter 2 of the volume). This chapter extends our knowledge of the structure and usage of Polish with detailed attention to the properties of sequential structure in conversational interaction. The chapter is a contribution to an emerging field of pragmatic typology.
Linguistic Variation and Change in 250 Years of English Scientific Writing: A Data-Driven Approach
(2020)
We trace the evolution of Scientific English through the Late Modern period to modern time on the basis of a comprehensive corpus composed of the Transactions and Proceedings of the Royal Society of London, the first and longest-running English scientific journal established in 1665. Specifically, we explore the linguistic imprints of specialization and diversification in the science domain which accumulate in the formation of “scientific language” and field-specific sublanguages/registers (chemistry, biology etc.). We pursue an exploratory, data-driven approach using state-of-the-art computational language models and combine them with selected information-theoretic measures (entropy, relative entropy) for comparing models along relevant dimensions of variation (time, register). Focusing on selected linguistic variables (lexis, grammar), we show how we deploy computational language models for capturing linguistic variation and change and discuss benefits and limitations.
CLARIN contractual framework for sharing language data: the perspective of personal data protection
(2020)
The article analyses the responsibility for ensuring compliance with the General Data Protection Regulation (GDPR) in research settings. As a general rule, organisations are considered the data controller (responsible party for the GDPR compliance). Research constitutes a unique setting influenced by academic freedom. This raises the question of whether academics could be considered the controller as well. However, there are some court cases and policy documents on this issue. It is not settled yet. The analysis serves a preliminary analytical background for redesigning CLARIN contractual framework for sharing data.
N-grams are of utmost importance for modern linguistics and language theory. The legal status of n-grams, however, raises many practical questions. Traditionally, text snippets are considered copyrightable if they meet the originality criterion, but no clear indicators as to the minimum length of original snippets exist; moreover, the solutions adopted in some EU Member States (the paper cites German and French law as examples) are considerably different. Furthermore, recent developments in EU law (the CJEU's Pelham decision and the new right of newspaper publishers) also provide interesting arguments in this debate. The proposed paper presents the existing approaches to the legal protection of n-grams and tries to formulate some clear guidelines as to the length of n-grams that can be freely used and shared.
The CMDI Explorer
(2020)
We present the CMDI Explorer, a tool that empowers users to easily explore the contents of complex CMDI records and to process selected parts of them with little effort. The tool allows users, for instance, to analyse virtual collections represented by CMDI records, and to send collection items to other CLARIN services such as the Switchboard for subsequent processing. The CMDI Explorer hence adds functionality that many users felt was lacking from the CLARIN tool space.
Signposts for CLARIN
(2020)
An implementation of CMDI-based signposts and its use is presented in this paper. Arnold et al. 2020 present Signposts as a solution to challenges in long-term preservation of corpora, especially corpora that are continuously extended and subject to modification, e.g., due to legal injunctions, but also may overlap with respect to constituents, and may be subject to migrations to new data formats. We describe the contribution Signposts can make to the CLARIN infrastructure and document the design for the CMDI profile.
Towards Comprehensive Definitions of Data Quality for Audiovisual Annotated Language Resources
(2020)
Though digital infrastructures such as CLARIN have been successfully established and now provide large collections of digital resources, the lack of widely accepted standards for data quality and documentation still makes re-use of research data a difficult endeavour, especially for more complex resource types. The article gives a detailed overview over relevant characteristics of audiovisual annotated language resources and reviews possible approaches to data quality in terms of their suitability for the current context. Conclusively, various strategies are suggested in order to arrive at comprehensive and adequate definitions of data quality for this particular resource type.
This paper presents the QUEST project and describes concepts and tools that are being developed within its framework. The goal of the project is to establish quality criteria and curation criteria for annotated audiovisual language data. Building on existing resources developed by the participating institutions earlier, QUEST develops tools that could be used to facilitate and verify adherence to these criteria. An important focus of the project is making these tools accessible for researchers without substantial technical background and helping them produce high-quality data. The main tools we intend to provide are the depositors’ questionnaire and automatic quality assurance, both developed as web applications. They are accompanied by a Knowledge base, which will contain recommendations and descriptions of best practices established in the course of the project. Conceptually, we split linguistic data into three resource classes (data deposits, collections and corpora). The class of a resource defines the strictness of the quality assurance it should undergo. This division is introduced so that too strict quality criteria do not prevent researchers from depositing their data.
Vorwort
(2020)
This article examines the language contact situation as well as the language attitudes of the Caucasian Germans, descendants of German-born inhabitants of the Russian Empire and the Soviet Union who emigrated in 1816/17 to areas of Transcaucasia. After deportations and migrations, the group of Caucasian Germans now consists of those who have since emigrated to Germany and those who still live in the South Caucasus. It’s the first time that sociolinguistic methods have been used to record data from the generation who experienced living in the South Caucasus and in Germany as well as from two succeeding generations. Initial results will be presented below with a focus on the language contact constellations of German varieties as well as on consequences of language contact and language repression, which both affect language attitudes.
Editorial
(2020)
Journal for language technology and computational linguistics. Special Issue on offensive language
(2020)
Recent years have seen a sharp increase in studies of offensive language (and related notions such as abusive language, hate speech, verbal aggression etc.) as well as of patterns of online behavior such as cyberbullying and trolling. Multiple efforts have been launched for the exploration of computational approaches and the establishment of benchmark datasets for various languages (Basile et al. (2019), Wiegand et al. (2018), Zampieri et al. (2019)).
The present article shows an experimental subject investigation on elements of video telephony in relation to experiencing and feeling connectedness and intimacy within private interpersonal communication. Particular interests are questions about possible relationships between image detail, angle of view or perspective as well as image format or the foreign and personal perception of the communicators. Central to this is the question of whether the practices and interactions of users in dealing with communication technology can be used to derive possible conclusions on negotiation measures or even adaptation services. The obtained results are presented on the basis of an introductory theoretical discussion. It is followed by a summary and analysis as well as an outlook on the further use and significance of the results.
Nachruf auf Helmut Frosch
(2020)