The article addresses communicative deviations in Ukrainian and German interviews. It examines deviations both in press interviews and in the most popular video interviews on YouTube. The deviations are classified by perspective: that of the addresser, the addressee, and the viewer. Attention is drawn to the linguistic and communicative competence of the communicators as the main cause of deviations in interviews. Deviations are identified as one of the prerequisites for successful communication.
The article is devoted to communicative deviations (failures), based on Ukrainian and German television interviews with P. Poroshenko and A. Merkel. It establishes that communication between persons with different communicative goals and strategies is the main cause of deviations. Communicative failures are analysed with regard to the positions of the addresser and the addressee as well as the viewer of these interviews, and the shared and differing strategies used in cases of communicative deviation in Ukrainian and German linguistic cultures are identified.
The article investigates communicative failures in the speech genre of the video interview through the lens of Ukrainian national identity. It defines the topics, types, and genre-specific linguistic features of the Ukrainian video interview as a model of dialogic speech. It establishes the specifics of communicative failures in this genre (in interviews with athletes, politicians, and cultural figures) with regard to the positions of the communicants, the structural levels of the genre under study, and the maxims of communication.
This paper discusses contemporary societal roles of German in the Baltic states (Latvia, Estonia, Lithuania). Speaker and learner statistics and a summary of sociolinguistic research (Linguistic Landscapes, language learning motivation, language policies, international roles of languages) suggest that German has far fewer speakers and functions than the national languages, English, and Russian, and that it is no longer a dominant language in the contemporary Baltics. However, German is ahead of ‘any other language’ in terms of users and societal roles as a frequent language in education, of economic relations, as a historical lingua franca, and a language of traditional and new minorities. Highly diverse groups of users and language policy actors form a ‘coalition of interested parties’ which creates niches that guarantee frequent use of German. In the light of the abundance of its functions, the paper suggests the concept ‘additional language of society’ for a variety such as German in the Baltics – since there seems to be no adequate alternative labelling which would do justice to all societal roles. The paper argues that this concept may also be used for languages in similar societal situations and, not least, be useful in language marketing and the promotion of multilingualism.
This paper examines multi-unit turns that allow speakers to retrospectively close the prior sequence while prospectively launching a new sequence, which Schegloff (1986) referred to as interlocking organization. Using English telephone conversations as data, we focus on how multi-unit turns are used for topic shifts, and show that interlocking organization operates in conjunction with other phonetic and lexical features, such as increased pitch and overt markers of disjunction (e.g., “listen”). In addition, speakers utilize an audible inbreath that is placed between the first and the second units as a central interactional resource to project further talk, thereby suppressing speaker transition and possibly highlighting the action delivered in the second unit as being distinctly new. We propose that interlocking multi-unit turns, when used to make topically disjunctive moves, promote progressivity by avoiding a possible lapse in turn transition.
This contribution summarizes the lessons learned from the organization of a joint conference on text analytics research by the Business, Economic, and Related Data (BERD@NFDI) and Text+ consortia within the National Research Data Infrastructure (NFDI) in Germany. The collaboration aimed to identify common ground and foster interdisciplinary dialogue between scholars in the humanities and in the business domain. The lessons learned include the importance of presenting research questions using textual data to establish common ground, similarities in methodology for processing textual data between the consortia, similarities in research data management, and the need for regular interconsortial discussions on textual analysis methods and data. The collaboration proved valuable for interdisciplinary dialogue within the NFDI, and further collaboration between the consortia is planned.
"Reproducibility crisis" and "empirical turn" are only two of the keywords invoked when giving reasons for research data management. Research data are omnipresent, and with increasingly automated data processing procedures they become even more important. However, just because new methods require and produce data, this does not mean that data are easily accessible or reusable, or that they even make a difference in the CV of a researcher, even though a large portion of research goes into data creation, acquisition, preparation, and analysis. In this talk I will show where data occur in the research process and where appropriate support for data management may be found, and I will advocate for a procedure for including data in research publications and resumes.
This presentation relies on work within the BMBF-funded project CLARIN-D. It also builds on work within the German National Research Data Infrastructure (NFDI) consortium Text+, DFG project number 460033370.
Prediction is a central mechanism in the human language processing architecture. The psycholinguistic and neurolinguistic literature has seen a lively debate about what form prediction may take and what status it has for language processing in the human mind and brain. While predictions are a ubiquitous finding, the implications of these results for models of language processing differ. For instance, eyetracking data suggest that predictions may rely on sublexical orthographic information in natural reading, while electrophysiological data provide mixed evidence for form-based predictions during reading. Other research has revealed that humans rapidly adapt to text specifics and that their predictive capacity varies, broadly speaking, in accordance with inter- and intra-individual language proficiency, which cuts across the speaker groups (e.g. L1 vs. L2 speakers, skilled vs. untrained readers) traditionally used for experimental contrasts. There is therefore evidence that the kind and strength of linguistic predictions depend on (at least) three sources of variability in language processing: speaker, text genre and experimental method.
The aim of this Research Topic is to develop a better understanding of prediction in light of the three sources of variability in language processing, by providing an overview of state-of-the-art research on predictive language processing and by bringing together research from various disciplines.
First, intra- and inter-individual differences and their influence on predictive processes remain underrepresented in experimental research on predictive processing. How do language users differ in their predictive abilities and strategies, and how are these differences shaped by e.g. biological, social and cultural factors?
Second, while language users experience great stylistic diversity in their daily language exposure and use, the majority of language processing research still focuses on a very constrained register of well-controlled sentences composed in the standard language. How are predictions shaped by extra- and meta-linguistic context, such as register/genre or accent/speaker identity, and how may this influence the processing of experimental items in another language or text variety?
Third, the Research Topic invites contributions that make use of a multi-method approach, such as combined behavioral and electrophysiological measures or experimental methods combined with measures extracted from corpus data. What opportunities and challenges do we face when integrating multiple approaches to examine linguistic, experimental and individual differences in human predictive capacity?
We welcome contributions from all areas of empirical psycho- and neurolinguistics, but contributions must explicitly address variability and variation in language and language processing. Relevant topics include individual differences and the impact of genre, modality, register and language variety. Contributions that go beyond single word and single sentence paradigms are especially desirable. Experimental, corpus-based, meta-analytic and review papers, as well as theoretical/opinion pieces are welcome; however, papers of the latter type should support their arguments with substantial empirical evidence from the literature. Particularly desirable are contributions which combine topics and/or methods, such as the impact of an individual's native dialect on processing of constructions that show variability in the standard language (e.g. choice of auxiliary, agreement of mass nouns, etc.) or experimental methods combined with measures extracted from corpus data such as information-theoretic surprisal.
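Information-theoretic surprisal, named above as a corpus-derived measure, is conventionally defined as the negative log probability of a word given its context, surprisal(w) = -log2 P(w | context). The following is a minimal sketch of that definition, assuming a toy bigram model with add-one smoothing. The function names, the `<s>`/`</s>` boundary markers, and the example sentences are illustrative assumptions and do not reflect the method of any particular contribution.

```python
import math
from collections import Counter

def train_bigram(sentences):
    """Count bigram contexts and bigrams over tokenized sentences (toy model)."""
    context_counts = Counter()
    bigram_counts = Counter()
    vocab = set()
    for sent in sentences:
        # Hypothetical boundary markers for sentence start and end.
        tokens = ["<s>"] + list(sent) + ["</s>"]
        context_counts.update(tokens[:-1])           # every token that has a successor
        bigram_counts.update(zip(tokens, tokens[1:]))
        vocab.update(tokens[1:])                     # every token that can follow a context
    return context_counts, bigram_counts, vocab

def surprisal(prev, word, context_counts, bigram_counts, vocab):
    """Surprisal in bits: -log2 P(word | prev), with add-one smoothing."""
    p = (bigram_counts[(prev, word)] + 1) / (context_counts[prev] + len(vocab))
    return -math.log2(p)

# Toy corpus (assumption): three short tokenized sentences.
corpus = [
    ["the", "dog", "barks"],
    ["the", "cat", "sleeps"],
    ["the", "dog", "sleeps"],
]
model = train_bigram(corpus)
s_dog = surprisal("the", "dog", *model)
s_cat = surprisal("the", "cat", *model)
print(f"surprisal(dog | the) = {s_dog:.3f} bits")
print(f"surprisal(cat | the) = {s_cat:.3f} bits")
```

In this toy corpus the more frequent continuation receives the lower surprisal. Estimates used in actual research would typically come from far larger corpora and stronger language models.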
Simultaneous interpreting is a complex cognitive activity in which various processes run concurrently. Besides monolingual text processing, it also requires interpreting-specific strategies that must be acquired. Emergency strategies are applied only once the interpreter's capacity limit has been reached.
We introduce DeReKoGram, a novel frequency dataset containing lemma and part-of-speech (POS) information for 1-, 2-, and 3-grams from the German Reference Corpus. The dataset contains information based on a corpus of 43.2 billion tokens and is divided into 16 parts based on 16 corpus folds. We describe how the dataset was created and structured. By evaluating the distribution over the 16 folds, we show that it is possible to work with a subset of the folds in many use cases (e.g., to save computational resources). In a case study, we investigate the growth of vocabulary (as well as the number of hapax legomena) as an increasing number of folds are included in the analysis. We cross-combine this with the various cleaning stages of the dataset. We also give some guidance in the form of Python, R, and Stata markdown scripts on how to work with the resource.
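The case study on vocabulary growth can be illustrated with a minimal sketch. It assumes each fold has already been loaded as a mapping from lemma to frequency; the loading step, the function name, and the toy data are hypothetical and do not reflect the actual DeReKoGram file format or the scripts distributed with the dataset.

```python
from collections import Counter

def vocabulary_growth(folds):
    """Return (n_folds, vocabulary_size, n_hapax) after adding each fold in turn.

    `folds` is an iterable of Counter-like mappings from lemma to frequency
    (an assumed in-memory representation, one mapping per corpus fold).
    """
    cumulative = Counter()
    growth = []
    for n_folds, fold in enumerate(folds, start=1):
        cumulative.update(fold)  # add this fold's frequencies to the running totals
        vocab_size = len(cumulative)
        # Hapax legomena: lemmas whose cumulative frequency is exactly 1.
        n_hapax = sum(1 for freq in cumulative.values() if freq == 1)
        growth.append((n_folds, vocab_size, n_hapax))
    return growth

# Toy data (assumption): three tiny "folds" standing in for the 16 real ones.
toy_folds = [
    Counter({"der": 5, "Hund": 1, "läuft": 1}),
    Counter({"der": 4, "Hund": 2, "Katze": 1}),
    Counter({"die": 3, "Katze": 1, "schläft": 1}),
]
for row in vocabulary_growth(toy_folds):
    print(row)
```

Because a lemma that is a hapax in one fold may recur in a later fold, the hapax count is recomputed from the cumulative frequencies at every step rather than summed across folds.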
Computational language models (LMs), most notably exemplified by the widespread success of OpenAI's ChatGPT chatbot, show impressive performance on a wide range of linguistic tasks, thus providing cognitive science and linguistics with a computational working model to empirically study different aspects of human language. Here, we use LMs to test the hypothesis that languages with more speakers tend to be easier to learn. In two experiments, we train several LMs—ranging from very simple n-gram models to state-of-the-art deep neural networks—on written cross-linguistic corpus data covering 1293 different languages and statistically estimate learning difficulty. Using a variety of quantitative methods and machine learning techniques to account for phylogenetic relatedness and geographical proximity of languages, we show that there is robust evidence for a relationship between learning difficulty and speaker population size. However, contrary to expectations derived from previous research, our results suggest that languages with more speakers tend to be harder to learn.
Allusion
(2023)
Collaborative work in NFDI
(2023)
The non-profit association National Research Data Infrastructure (NFDI) promotes science and research through a National Research Data Infrastructure. Its aim is to develop and establish overarching research data management (RDM) for Germany and to increase the efficiency of the entire German science system. After a two-and-a-half-year build-up phase, the process of adding new consortia, each representing a different data domain, ended in March 2023. NFDI now has 26 disciplinary consortia (and one additional basic service collaboration). The full extent of cross-consortial interaction is now beginning to show.
KoMuX, the Kompositamuster-Explorer (compound pattern explorer; www.owid.de/plus/komux), is a web application that makes it possible to search more than 50,000 German nominal compounds for abstract or lexically partially specified patterns. Various visualizations help users grasp structures and relationships within the result set.
The Data Governance Act was proposed in late 2020 as part of the European Strategy for Data, and adopted on 30 May 2022 (as Regulation 2022/868). It will enter into application on 24 September 2023. The Data Governance Act is a major development in the legal framework affecting CLARIN and the whole language community. With its new rules on the re-use of data held by public sector bodies and on the provision of data sharing services, and especially its encouragement of data altruism, the Data Governance Act creates new opportunities and new challenges for CLARIN ERIC. This paper analyses the provisions of the Data Governance Act, and aims at initiating the debate on how they will impact CLARIN and the whole language community.
For many reasons, Mennonite Low German is a language whose documentation and investigation is of great importance for linguistics. To date, most research projects that deal with this language and/or its speakers have had a relatively narrow focus, with many of the data cited being of limited relevance beyond the projects for which they were collected. In order to create a resource for a broad range of researchers, especially those working on Mennonite Low German, the dataset presented here has been transformed into a structured and searchable corpus that is accessible online. The translations of 46 English, Spanish, or Portuguese stimulus sentences into Mennonite Low German by 321 consultants form the core of the MEND-corpus (Mennonite Low German in North and South America) in the Archive for Spoken German. In addition to describing the origin of this corpus and discussing possibilities and limitations for further research, we discuss the technical structure and search possibilities of the Database for Spoken German. Among other things, this database allows for a structured search of metadata, a context-sensitive token search, and the generation of virtual corpora that can be shared with others. Moreover, thanks to its text-sound alignment, one can easily switch from a particular text section of the corpus to the corresponding audio section. Aside from the desire to equip the reader with the technical knowledge necessary to use this corpus, a further goal of this paper is to demonstrate that the corpus still offers many possibilities for future research.
Conventional terminology resources reach their limits when it comes to automatic content classification of texts in the domain of expert-layperson communication. This can be attributed to the fact that (non-normalized) language usage does not necessarily reflect the terminological elements stored in such resources. We present several strategies to extend a terminological resource with term-related elements in order to optimize automatic content classification of expert-layperson texts.
We present a collection of (currently) about 5,500 commands directed to voice-controlled virtual assistants (VAs) by sixteen initial users of a VA system in their homes. The collection comprises recordings captured by the VA itself and with a conditional voice recorder (CVR) selectively capturing recordings including the VA-directed commands plus some surrounding context. In addition to a description of the collection, we present initial findings on the patterns of use of the VA systems during the first weeks after installation, including usage timing, the development of usage frequency, distributions of sentence structures across commands, and (the development of) command success rates. We discuss the advantages and disadvantages of the applied collection-specific recording approach and describe potential research questions that can be investigated in the future, based on the collection, as well as the merit of combining quantitative corpus linguistic approaches with qualitative in-depth analyses of single cases.
Linguistic studies often distinguish between spoken and written language, or between communication of proximity and of distance. Assuming a continuum between these poles lends itself to locating a wide variety of utterance forms, including unconventional text types such as pop songs. We design, implement, and evaluate an automated procedure that uses uncorrelated decision trees to make corresponding predictions at the text level. To identify the poles, we define a catalogue of features consisting of linguistic phenomena that are discussed as markers of proximity/orality or distance/literacy, and apply it to prototypical proximity/orality texts and prototypical distance/written texts. Building on the very good classification quality, we then locate a range of further text types using the trained classifiers. Pop songs emerge as a “middle text type” that combines linguistically motivated features from different levels of the continuum. Furthermore, we demonstrate that our models locate orally communicated utterances that were written down beforehand or afterwards, such as speeches or interviews, entirely differently from prototypical conversational data, and we uncover classification differences for social media variants. The aim is not a systematic, binding placement on the continuum, but an empirical approach to the question of which features that are comparatively easy to determine by machine (“shallow features”) demonstrably influence the placement.
"The project "Sprachanfragen" (https://www.ids-mannheim.de/gra/projekte2/sprachanfragen/), launched in January 2022, is the first to pursue the goal of collecting and preparing language inquiry data and building from them a monitor corpus open to the research community. In addition, a search interface is being developed that makes the language inquiries systematically available for scientific analysis. The poster gives an overview of the project, presents initial results, and offers an outlook on considerations for designing a chatbot for the automated answering of language inquiries." A contribution to the 9th conference of the association "Digital Humanities im deutschsprachigen Raum" - DHd 2023 Open Humanities Open Culture.
This article investigates mundane photo taking practices with personal mobile devices in the co-presence of others, as well as “divergent” self-initiated smartphone use, thereby exploring the impact of everyday technologies on social interaction. Utilizing multimodal conversation analysis, we examined sequences in which young adults take pictures of food and drinks in restaurants and cafés. Although everyday interactions are abundant in opportunities for accomplishing food photography as a side activity, our data show that taking pictures is also often prioritized over other activities. Through a detailed sequential analysis of video recordings and dynamic screen captures of mobile devices, we illustrate how photographers orient to the momentary opportunities for and relevance of photo taking, that is, how they systematically organize their photographing with respect to the ongoing social encounter and the (projected) changes in the material environment. We investigate how the participants multimodally negotiate the “mainness” and “sideness” (Mondada, 2014) of situated food photography and describe some particular features of participants’ conduct in moments of mundane multiactivity.
Since the mid-1990s, the Institut für deutsche Sprache (IDS) in Mannheim has been researching how the highly complex subject area of “grammar” can be conveyed in a scientifically sound and accessible way by exploiting hypertextual navigation structures. Consistent, theory-spanning linking of all text content is therefore of central importance. To promote automatable cross-referencing between content that is formulated in different terminological vocabulary but describes the same linguistic phenomenon, an onomasiologically designed terminology database forms the backbone of the online system. The article describes the design and structure of the linguistic terminology outlined here.
The aim of the article is to investigate silence and its linguistic realization with regard to the macro- and microstructure of the literary text. The theoretical background is provided by linguistic and literary studies that address communicative, pragmatic, semantic, cultural, and literary-historical aspects of silence and emphasize its distinction from stillness, which is to be understood as a natural phenomenon. Starting from the model of literary communication, the article points to the role of silence in the triad author-text-reader and to the ways it can be realized in the structure and language of narrative text. Attention is directed not only to silence as not-speaking but also to empty talk, which actualizes the semantics of silence within the communicative situation. These two opposing forms of silence come to the fore in the Berlin novels of Robert Walser (1878-1956) and are analysed closely from the perspective of macro- and microstylistics. The analysis covers the narrative principle of garrulousness in Geschwister Tanner (1907), irony in Der Gehülfe (1908), and the fragmentary mode of narration in Jakob von Gunten (1909), through which silence is realized both at the thematic level and in the structure and language of the text. As a narrative strategy, silence shapes the form and content of Walser's Berlin novels and thus achieves the effect on the reader intended by the author.
OWID and OWIDplus: lexicographic-lexicological online information systems of the IDS Mannheim
(2023)
Lexicographic and lexical resources for German are developed at many different institutions, e.g. at academies of sciences or in commercial publishing houses. Such materials are also produced at the Leibniz-Institut für Deutsche Sprache (IDS) in Mannheim and presented to the (specialist) public under the umbrella of OWID, the “Online-Wortschatz-Informationssystem Deutsch” (owid.de).
The special issue opens up a construction-grammatical perspective on (German) word formation phenomena and goes back to a DFG-funded conference of the same name, which we held at the University of Düsseldorf in December 2020. The aim is to bundle up for the first time research from the field of German linguistics that is oriented towards construction grammar, and thus to lay the foundation for a 'Construction Word Formation' (cf. Booij 2010) also in the German-speaking world. Furthermore, ‘Construction Word Formation’ as a discipline shall hereby be sharpened. In this context, construction grammar should not be seen as a radical alternative to traditional word formation approaches that completely reinvents the wheel, but rather as a further development that builds on traditional concepts such as the pattern term with prominent consideration of usage-based aspects.
The Encyclopedia of Terminology for Conversation Analysis and Interactional Linguistics is an online resource for students and scholars of CA/IL, publicly available on the EMCA Wiki page. Encyclopedias and glossaries are widespread across various fields and methods, and serve as immensely valuable resources. Given the extent to which the EMCA/IL community has expanded over the years—both terminologically as well as geographically—we hope that this encyclopedia of terminology will be well received by students and practitioners of CA and IL across the globe.
This paper presents an extended annotation and analysis of interpretative reply relations focusing on a comparison of reply relation types and targets between conflictual pages and neutral pages of German Wikipedia (WP) talk pages. We briefly present the different categories identified for interpretative reply relations to analyze the relationship between WP postings as well as linguistic cues for each category. We investigate referencing strategies of WP authors in discussion page postings, illustrated by means of reply relation types and targets taking into account the degree of disagreement displayed on a WP talk page. We provide richly annotated data that can be used for further analyses such as the identification of interactional relations on higher levels, or for training tasks in machine learning algorithms.
Telephone-based remote interpreting has come into widespread use in multilingual encounters, all the more so in times of refugee crises and the large influx of asylum-seekers into Europe. Nevertheless, the linguistic practices in this mode of communication have not yet been examined comprehensively. This article therefore investigates selected aspects of turn-taking and clarification sequences during semi-authentic telephone-interpreted counselling sessions for refugees (Arabic–German). A quantitative analysis reveals that limited audibility makes it more difficult for interpreters to claim their turn successfully; in most cases, however, turn-taking occurs smoothly. The trouble sources that trigger queries are mainly content-related and interpreters vary greatly in the ways they deal with such difficulties. Contrary to what one might expect, the study shows that coordination fails only rarely during telephone-based remote interpreting.
The proposed contribution will shed light on current and future challenges on legal and ethical questions in research data infrastructures. The authors of the proposal will present the work of NFDI’s section on Ethical, Legal and Social Aspects (hereinafter: ELSA), whose aim is to facilitate cross-disciplinary cooperation between the NFDI consortia in the relevant areas of management and re-use of research data.
This article describes an English Zulu learners’ dictionary that is part of a larger set of information tools, namely an online Zulu course, an e-dictionary of possessives (which was implemented earlier) accompanied by training software offering translation tasks on several levels, and an ontology of morphemic items categorizing and describing all parts of speech of Zulu. The underlying lexicographic database contains the usual type of lexicographic data, such as translation equivalents and their respective morphosyntactic data, but its entries have been extended with data related to the lessons of the online course in order to enable the learner to link both tools autonomously. The ‘outer matter’ is integrated into the website in the form of several texts on additional web pages (how-to-use, typical outputs, grammar tables, information on morphosyntactic rules, etc.). The dictionary comprises a modular system, where each module fulfils one of the necessary functions.
This paper describes general requirements for evaluating and documenting NLP tools with a focus on morphological analysers and the design of a Gold Standard. It is argued that any evaluation must be measurable and documentation thereof must be made accessible for any user of the tool. The documentation must be of a kind that it enables the user to compare different tools offering the same service, hence the descriptions must contain measurable values. A Gold Standard presents a vital part of any measurable evaluation process, therefore, the corpus-based design of a Gold Standard, its creation and problems that occur are reported upon here. Our project concentrates on SMOR, a morphological analyser for German that is to be offered as a web-service. We not only utilize this analyser for designing the Gold Standard, but also evaluate the tool itself at the same time. Note that the project is ongoing, therefore, we cannot present final results.
This volume collects the talks given at the 9th Hildesheimer Evaluierungs- und Retrieval-Workshop (HIER), which took place on 9 and 10 July 2015 at the University of Hildesheim. The HIER workshop series began in 2001 with the aim of presenting and discussing the research results of information science at Hildesheim. Cooperation partners from other institutions now regularly take part, which we very much welcome. HIER also provides a forum for system presentations and practice-oriented contributions.
Open Science and language data: Expectations vs. reality. The role of research data infrastructures
(2023)
Language data are essential for any scientific endeavor. However, unlike numerical data, language data are often protected by copyright, as they easily meet the threshold of originality. The role of research infrastructures (such as CLARIN, DARIAH, and Text+) is to bridge the gap between uses allowed by statutory exceptions and the requirements of Open Science. This is achieved on the one hand by sharing language data produced by research organisations with the widest possible circle of persons, and on the other by mutualizing efforts towards copyright clearance and appropriate licensing of datasets.
Corpus-based identification and disambiguation of reading indicators for German nominalizations
(2010)
Corpus data is often structurally and lexically ambiguous; corpus extraction methodologies thus must be made aware of ambiguities. Therefore, given an extraction task, all relevant ambiguities must be identified. To resolve these ambiguities, contextual data responsible for one or another reading is to be considered. In the context of our present work, German -ung-nominalizations and their sortal readings are under examination. A number of these nominalizations may be read as an event or a result, depending on the semantic group they belong to. Here, we concentrate on nominalizations of verbs of saying (henceforth: "verba dicendi"), identify their context partners and their influence on the sortal reading of the nominalizations in question. We present a tool which calculates the sortal reading of such nominalizations and thus may improve not only corpus extraction, but also e.g. machine translation. Lastly, we describe successful attempts to identify the correct sortal reading, conclusions and future work.
This paper analyses intensification in German digitally-mediated communication (DMC) using a corpus of YouTube comments written by young people (the NottDeuYTSch corpus). Research on intensification in written language has traditionally focused on two grammatical aspects: syntactic intensification, i.e. the use of particles and other lexical items, and morphological intensification, i.e. the use of compounding. Using a wide variety of examples from the corpus, the paper identifies novel ways that have been used for intensification in DMC, and suggests a new taxonomy of classification for future analysis of intensification.
Orality is historically prior to literacy, and the transition to literacy is a decisive step from the perspective of both linguistics and cultural studies. Unserdeutsch (Rabaul Creole German), a creole language barely more than 100 years old that was originally used exclusively in spoken form, is currently on the threshold of being put into writing. A collection of around 180 spontaneously written utterances in this language, which is still unstandardized at all levels, shows the grapheme-phoneme correspondences that Unserdeutsch writers intuitively apply. The written evidence allows conclusions to be drawn about graphematic contact influences and about the speakers' mental representation of words. Beyond their relevance for linguistic theory, these findings are above all important for the still outstanding development of an orthography for Unserdeutsch.
The article describes a specifically discourse-linguistic approach to the language-historical question of upheavals caused by socio-political factors. Guided by the Foucauldian categories of seriality and discontinuity, these methodological implications are applied to the upheavals of 1918/19 and of 1945 onwards. The methodological model essentially consists of two aspects: first, the sociopragmatic reference to discourse actors is established as a factor of high relevance to upheaval. Second, acts of institutionalization in the history of democracy that characterize these epochs are described, by way of example, in Searle's sense. The article thus contributes to methodological reflection in discourse linguistics.
This article addresses two categories of traditional (German) grammar: the directive sentence (Aufforderungssatz), one of the five classical sentence types, and the imperative, a verb form considered typical of directive sentences. It takes up observations from the recent literature that reveal a growing unease with both categories. Morphologically, it turns out that only a few German verbs have an unambiguous imperative form. Some verbs have no imperative form. For the majority of verbs, imperative forms are homonymous with third-person singular subjunctive forms. Imperative forms are being displaced by subjunctive forms. Syntactically, it is argued that imperative syntagms do not have sentence form. Sentence-form expressions with subjunctive forms that stand for directive acts can be categorized as optative sentences (Wunschsätze). What remains as directive sentences are two classes of syntagms in the border area between non-sentential and sentential expressions, which show particular properties with regard to subject realization and subject-verb agreement.
Orthography is a topic that, at the latest since the spelling reform of 1996, has decisively shaped not only academic research but also public discourse. To mark "20 years of the Rat für deutsche Rechtschreibung", this topic was also the subject of the 59th annual conference of the Leibniz-Institut für Deutsche Sprache.
As part of the NFDI, Text+ connects a wide variety of distributed data and services for humanities research and makes them available to the scholarly community in a FAIR manner. In this contribution, we describe the implementation by way of example in the Text+ data domain Collections, using corpora that are employed in various disciplines. The infrastructure is designed for extensibility, so that further resources can also be made available via Text+. An outlook on further expected developments is also included. A contribution to the 9th conference of the association "Digital Humanities im deutschsprachigen Raum" - DHd 2023 Open Humanities Open Culture.
Background: The digital transformation is shaping societal systems worldwide. Digital health encompasses various areas, such as the availability and analysis of data, the possibility of networking within one's own professional group or group of affected persons, and the way in which patients, relatives, and practitioners communicate with one another.
Objective: Digital health is examined with regard to its effects on the relationship and communication between patients, relatives, and practitioners. Changes that are already apparent are described, and perspectives are outlined.
Methods: The topic is explored from the perspectives of social philosophy, linguistics, and medicine in the following areas: digital vs. analogue communication, narration vs. data collection, the internet and social media as a source of information, space for identity formation, and change in the interaction of patients, relatives, and practitioners.
Results: Extending the interaction between patients and physicians to digital and in-person formats, together with asynchronous and synchronous communication, increases complexity but also flexibility. The focus on "objective" data can impair the view of the person with their individual biography, while digital spaces considerably expand the possibilities for identity formation on the part of patients and for interaction.
Discussion: Advantages of digitalization (e.g. better self-management) and disadvantages (focus on data instead of on the person) are already evident. In paediatric and adolescent medicine, there is a need to expand professional communicative skills and professional health literacy and to further develop the organization of its care facilities.
One of the fundamental questions about human language is whether all languages are equally complex. Here, we approach this question from an information-theoretic perspective. We present a large-scale quantitative cross-linguistic analysis of written language by training a language model on more than 6500 different documents as represented in 41 multilingual text collections consisting of ~ 3.5 billion words or ~ 9.0 billion characters and covering 2069 different languages that are spoken as a native language by more than 90% of the world population. We statistically infer the entropy of each language model as an index of what we call average prediction complexity. We compare complexity rankings across corpora and show that a language that tends to be more complex than another language in one corpus also tends to be more complex in another corpus. In addition, we show that speaker population size predicts entropy. We argue that both results constitute evidence against the equi-complexity hypothesis from an information-theoretic perspective.
In the context of a Nordic Conference on Bilingualism, it can be a rewarding task to look at issues such as language planning, policy and legislation from the perspective of the southern neighbours of the Nordic world. This paper therefore intends to draw attention to a case of societal multilingualism at the periphery of the Nordic world by dealing with recent developments in language policy and legislation with regard to the North Frisian speech community in the German Land of Schleswig-Holstein. As I will show, there are striking differences in the discourse on minority protection and language legislation between the Nordic countries and a cultural area which may arguably be considered to be part of the Nordic fringe - and which itself occasionally takes Scandinavia as a reference point, e.g. in the recent adoption of a pan-Frisian flag modelled on the Nordic cross (Falkena 2006).
The main focus of the paper will be on the Frisian Act which was passed in the Parliament of Schleswig-Holstein in late 2004. It provides a certain legal basis for some political activities with regard to Frisian, but falls short of creating a true spirit of minority language protection and/or revitalisation. In contrast to the traditions of the German and Danish minorities along the German-Danish border and to minority protection in Northern Scandinavia (in particular to Sámi language rights), the approach chosen in the Frisian Act is extremely weak and has no connotation of long-term oriented language-planning, let alone a rights-based perspective.
The paper will then look at policy developments in the time since the Act was passed, e.g. in the Schleswig-Holstein election campaign in 2005, and at the latest perceptions of the Frisian language situation in the discourse on North Frisian policy in Schleswig-Holstein majority society. In the final part of the paper, I will discuss reasons for the differences in minority language policy discourse between Germany and the Nordic countries, and try to provide an outlook on how Frisian could benefit from its geographic proximity to the Nordic world.
Tollpatschig interviewen oder interviewt werden – Kurzvideos im ukrainischen und deutschen Fernsehen
(2016)
Short interviews on television are an instructive object of study not only for contrastive media linguistics but also for conversation analysis, text-type linguistics, and pragmatics, especially where communicative deviations are concerned. The article presents a classification of the deviations in television interviews with respect to communication and language. Communicative deviations are considered from the standpoint of the addresser, the communication process, mutual understanding, and the addressee, alongside linguistic deviations. The article identifies shared and differing features of the deviations in Ukrainian and German short television interviews, which helps in developing a model of deviations and in a deeper contrastive study of the two languages.
What is the subject of German linguistics? This seemingly simple question has no obvious answer. In the ZGL’s first issue, the editors required contributions to cover the whole of the German language and to be theoretically sound but application-orientated, whereas the current ZGL homepage defines the German language, past and present, in all its differentiations as its subject matter.
Looking through the fifty volumes of ZGL, three relationships can be identified that presumably shed light on the role of language, in particular the German language: language and mind; language and language use; language and culture. Though of a different systematic type, language and data should be added as an increasingly important pairing for conceptualizing language. On this basis, I also discuss the position of linguistic studies of the German language, as mirrored in the ZGL volumes, between social, cultural and natural sciences, as well as the corresponding epistemic approaches – such as explaining vs. understanding.