"Wie Schule Sprache macht"
(2019)
Using the polyfunctional multi-word unit <was weiß ich> as an example, this study examines the interplay of pragmatic and phonetic differentiation in pragmaticalization processes. To this end, spontaneous-speech attestations from the corpus "Deutsch heute" are analysed. The observed range of phonetic variation points to a complex relationship with the respective pragmatic functions.
A syntax-based scheme for the annotation and segmentation of German spoken language interactions
(2018)
Unlike corpora of written language, where segmentation can mainly be derived from orthographic punctuation marks, the basis for segmenting spoken language corpora is not predetermined by the primary data but has to be established by the corpus compilers. This impedes consistent querying and visualization of such data. Several ways of segmenting have been proposed, some of which are based on syntax. In this study, we developed and evaluated annotation and segmentation guidelines with reference to the topological field model for German. We show that these guidelines are used consistently across annotators. We also investigated the influence of various interactional settings with a rather simple measure, the word count per segment and unit type. We observed that the word count and the distribution of each unit type differ across interactional settings. In conclusion, our syntax-based segmentations reflect interactional properties that are intrinsic to the social interactions that participants are involved in. This can be used for further analysis of social interaction and opens up the possibility of automatic segmentation of transcripts.
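The "word count per segment and unit type" measure mentioned in this abstract can be illustrated with a minimal sketch. It assumes that each segment is available as a (unit type, text) pair and that words are separated by whitespace. The unit-type labels and example utterances are hypothetical and are not taken from the study's annotation scheme.

```python
from collections import defaultdict
from statistics import mean


def mean_words_per_unit_type(segments):
    """Return the mean whitespace-token count per unit type.

    `segments` is an iterable of (unit_type, text) pairs; both the
    pair layout and the labels are assumptions made for illustration.
    """
    counts = defaultdict(list)
    for unit_type, text in segments:
        counts[unit_type].append(len(text.split()))
    return {unit_type: mean(values) for unit_type, values in counts.items()}


# Hypothetical segments with invented unit-type labels.
example_segments = [
    ("simple_sentence", "ich komme morgen"),
    ("simple_sentence", "das weiß ich nicht so genau"),
    ("non_sentential", "ja genau"),
]

result = mean_words_per_unit_type(example_segments)
print(result)
```

Comparing such per-type means across recordings from different interactional settings would be one simple way to reproduce the kind of contrast the abstract describes.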
The National Socialist space of interaction and communication was thus populated by communicatively constructed social figures. Some of them carried positive connotations (e.g. Volksgenosse, Nationalsozialist, Parteigenosse, SA-Mann, Alter Kämpfer), others negative ones (e.g. Asozialer, Judenfreund, Schwarzer, Roter, Freimaurer). These stereotyped social figures, to which manifold positive and negative attributions were in turn attached, constituted discourse positions, as it were, that were assigned to others or could be adopted, provided individual circumstances allowed, and that went hand in hand with varying degrees of inclusion or exclusion. The following remarks concentrate on two of these figures, which can be understood more specifically as boundary figures: Meckerer and Märzgefallene. The analysis examines how these two boundary figures were constructed linguistically and in which contexts and communicative situations they were appropriated and used. In both cases, the focus is extended beyond the literal expression to contemporary designations that are similar or closely related.
The grammatical information system grammis combines descriptive texts on German grammar with dictionaries of specific word classes and grammatical terminology. In this paper, we describe the first attempts at analyzing user behavior for an online grammar of the German language and the implementation of an analysis and data extraction tool based on Matomo, a web analytics tool. We focus on the analysis of the keywords the users search for, either within grammis or via an external search platform like Google, and the analysis of the interaction between the text components within grammis and the integrated dictionaries. The overall results show that about 50% of the searches are for grammatical terms, and that the users shift from texts to dictionaries, mainly by using the integrated links to the dictionary of terminology within the texts. Based on these findings, we aim to improve grammis by extending its integrated dictionaries.
As a consequence of a recent curation project, the Dortmund Chat Corpus is available in CLARIN-D research infrastructures for download and querying. A legal opinion had recommended that standard anonymisation measures be applied to the corpus before its republication. This paper reports on the anonymisation campaign that was conducted for the corpus. Anonymisation was realised as categorisation; we introduce the taxonomy of anonymisation categories and demonstrate how it is applied to the TEI files. The results of the anonymisation campaign and issues of quality assessment are discussed. Finally, we discuss pseudonymisation as an alternative to categorisation for anonymising CMC data, as well as possibilities for automating the process.
It is well known that the distribution of lexical and grammatical patterns is size- and register-sensitive (Biber 1986, and later publications). This fact alone presents a challenge to many corpus-oriented linguistic studies focusing on a single language. When it comes to cross-linguistic studies using corpora, the challenge becomes even greater due to the lack of high-quality multilingual corpora (Kupietz et al. 2020; Kupietz/Trawiński 2022) that are comparable with respect to size and register. That was the motivation for the creation of the European Reference Corpus EuReCo, an initiative started in 2013 at the Leibniz Institute for the German Language (IDS) together with several European partners (Kupietz et al. 2020). EuReCo is an emerging federated corpus, with large virtual comparable corpora across various languages and with an infrastructure supporting contrastive research. The core of the infrastructure is KorAP (Diewald et al. 2016), a scalable open-source platform supporting the analysis and visualisation of properties of texts annotated by multiple and potentially conflicting information layers, and supporting several corpus query languages. Until recently, EuReCo consisted of three monolingual subparts: the German Reference Corpus DeReKo (Kupietz et al. 2018), the Reference Corpus of Contemporary Romanian Language (Barbu Mititelu/Tufiş/Irimia 2018), and the Hungarian National Corpus (Váradi 2002). The goal of the present submission is twofold. On the one hand, it reports on the new component of EuReCo: a sample of the National Corpus of Polish (Przepiórkowski et al. 2010). On the other hand, it presents the results of a new pilot study using the newly extended EuReCo. This pilot study investigates selected Polish collocations involving light verbs and their prepositional/nominal complements (Fig. 1) and extends the collocation analyses of German, Romanian and Hungarian (Fig. 2) discussed in Kupietz/Trawiński (2022).
This paper presents results of the project "Deutsch im Beruf: Die sprachlich-kommunikative Integration der Flüchtlinge", which is being carried out at the Leibniz-Institut für Deutsche Sprache (IDS). The first part addresses the two-stage language proficiency survey in the general integration courses, which was conducted together with the Goethe-Institut. In the first survey, at the start of the courses, a tablet-based questionnaire was used to collect the participants' social data and language biographies. The second survey, at the end of the same courses, aimed to determine the level of oral competence the participants had reached by analysing speech recordings. In the second part of the paper, we present results of our ethnographic and conversation-analytic field studies, which we carried out in various work contexts such as qualification programmes, dual vocational training and company internships. With regard to the central questions of mutual understanding and language teaching in the workplace, our ethnographies revealed three prototypical practices, which we discuss in more detail: a) "hardly any securing of understanding and language teaching", b) "ad hoc securing of understanding and language teaching" and c) "systematic securing of understanding and language teaching". Furthermore, in the last part of the paper we focus on the results of our long-term ethnographic study of company internships undertaken by refugees who are university students. The analysis of repairs shows the development of an L2 speaker's interactional competence, which goes hand in hand with increasing communicative integration in team meetings.
This paper gives an overview of the methodological starting points of the project MIT. Qualität and presents some key findings on model building, corpus-linguistic analysis and acceptability surveys in the speech community. We show how existing text quality models can be extended on the basis of an analysis of relevant advice literature. Two empirical case studies were conducted, both focusing on the creation of textual coherence by means of the causal connector weil. We first present a corpus-contrastive analysis. We then show how different task designs can be used to test various aspects of acceptability in the speech community.
We use a convolutional neural network to perform authorship identification on a very homogeneous dataset of scientific publications. In order to investigate the effect of domain biases, we obscure words below a certain frequency threshold, retaining only their POS-tags. This procedure improves test performance due to better generalization on unseen data. Using our method, we are able to predict the authors of scientific publications in the same discipline at levels well above chance.
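The masking step described in this abstract, obscuring words below a frequency threshold while retaining only their POS tags, can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: it assumes documents are already tokenised and POS-tagged as (word, tag) pairs, that frequencies are counted case-insensitively over the whole collection, and that the function name, tag labels and threshold value are hypothetical.

```python
from collections import Counter


def mask_rare_words(tagged_docs, min_freq):
    """Replace every word seen fewer than `min_freq` times by its POS tag.

    `tagged_docs` is a list of documents, each a list of (word, pos)
    pairs; this layout is an assumption made for illustration.
    """
    freq = Counter(word.lower() for doc in tagged_docs for word, _ in doc)
    return [
        [word if freq[word.lower()] >= min_freq else pos for word, pos in doc]
        for doc in tagged_docs
    ]


# Hypothetical, pre-tagged toy documents.
docs = [
    [("the", "DET"), ("corpus", "NOUN"), ("contains", "VERB"),
     ("the", "DET"), ("transcripts", "NOUN")],
    [("the", "DET"), ("corpus", "NOUN"), ("grows", "VERB")],
]

masked = mask_rare_words(docs, min_freq=2)
print(masked)
```

Frequent words survive unchanged, while rare, typically topic-specific words are reduced to their grammatical category, which is the kind of domain-bias reduction the abstract refers to.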
In this paper we use methods for creating a large lexicon of verbal polarity shifters and apply them to German. Polarity shifters are content words that can move the polarity of a phrase towards its opposite, such as the verb “abandon” in “abandon all hope”. This is similar to how negation words like “not” can influence polarity. Both shifters and negation are required for high precision sentiment analysis. Lists of negation words are available for many languages, but the only language for which a sizable lexicon of verbal polarity shifters exists is English. This lexicon was created by bootstrapping a sample of annotated verbs with a supervised classifier that uses a set of data- and resource-driven features. We reproduce and adapt this approach to create a German lexicon of verbal polarity shifters. Thereby, we confirm that the approach works for multiple languages. We further improve classification by leveraging cross-lingual information from the English shifter lexicon. Using this improved approach, we bootstrap a large number of German verbal polarity shifters, reducing the annotation effort drastically. The resulting German lexicon of verbal polarity shifters is made publicly available.
This chapter focuses on the formation of adverbs from a corpus-linguistic perspective. It provides an overview of adverb formation patterns in German, including frequencies and indications of productivity, and combines quantitative methods with theoretically founded hypotheses to address questions that concern possible grammaticalization paths in domains that are formally marked by prepositional elements or inflectional morphology (in particular, superlative or superlative-derived forms). Within our collection of adverb types from the project corpus, special attention is paid to adverbs built from primary prepositions. The data suggest that generally, such adverb formation involves the saturation of the internal argument slot of the relation-denoting preposition. In morphologically regular formations with the preposition in final position, pronominal forms like da ‘there’, hier ‘here’, wo ‘where’ as well as hin ‘thither’ and her ‘hither’ serve to derive adverbs. On the other hand, morphologically irregular formations with the preposition (in particular zu ‘to’ or vor ‘before, in front of’) in initial position show traits of syntactic origin such as (remnants of) inflectional morphology. The pertaining adverb type predominantly saturates the internal argument slot by means of universal quantification, which is also part and parcel of the derivation of superlatives and demonstrably fuels the productivity of the pertaining formation pattern.
In this paper, we present the concept and the results of two studies addressing (potential) users of monolingual German online dictionaries, such as www.elexiko.de. Drawing on the example of elexiko, the aim of those studies was to collect empirical data on possible extensions of the content of monolingual online dictionaries, e.g. the search function, to evaluate how users comprehend the terminology of the user interface, to find out which types of information are expected to be included in each specific lexicographic module and to investigate general questions regarding the function and reception of examples illustrating the use of a word. The design and distribution of the surveys are comparable to the studies described in chapters 5-8 of this volume. We also explain how the data obtained in our studies were used for further improvement of the elexiko dictionary.
As has been customary for several years now, this year's IDS annual conference was again accompanied by a methods fair at which, in keeping with the conference theme, application-oriented projects related to lexicon research presented themselves. The range of topics on offer was very broad: innovative methodological approaches in translation studies, tools for analysing and describing lexical patterns or for detecting neologisms, new lexicographic resources, as well as infrastructure activities and a cooperation project between school students and researchers on vocabulary analysis. In what follows, the individual projects that presented themselves at the fair are briefly introduced on the basis of the abstracts submitted by the participants.
The present submission reports on a pilot project conducted at the Institute for the German Language (IDS), aiming at strengthening the connection between ISO TC37SC4 “Language Resource Management” and the CLARIN infrastructure. In terminology management, attempts have recently been made to use graph-theoretical analyses to get a better understanding of the structure of terminology resources. The project described here aims at applying some of these methods to potentially incomplete concept fields produced over years by numerous researchers serving as experts and editors of ISO standards. The main results of the project are twofold. On the one hand, they comprise concept networks dynamically generated from a relational database and browsable by the user. On the other, the project has yielded significant qualitative feedback that will be offered to ISO. We provide the institutional context of this endeavour, its theoretical background, and an overview of data preparation and tools used. Finally, we discuss the results and illustrate some of them.
Brief
(2022)
This paper examines letters from the period of National Socialism that were written by different actors in different participant roles. They comprise military field post written by soldiers and their relatives, prison letters written by opponents of National Socialism, and petitions to state and party authorities, which are part of institutional correspondence. All of these forms of letter writing have a longer tradition. Their use during the Nazi era, however, is marked by specific characteristics, which are examined in the respective sections.
German is a language with complex morphological processes. Its long and often ambiguous word forms present a bottleneck problem in natural language processing. As a step towards morphological analyses of high quality, this paper introduces a morphological treebank for German. It is derived from the linguistic database CELEX, which is a standard resource for German morphology. We build on its refurbished, modernized and partially revised version. The derivation of the morphological trees is not trivial, especially for cases of conversion which are morpho-semantically opaque and merely of diachronic interest. We develop solutions and present exemplary analyses. The resulting database comprises about 40,000 morphological trees of a German base vocabulary whose format and level of detail can be chosen according to the requirements of the applications. The Perl scripts for the generation of the treebank are publicly available on GitHub. In our discussion, we show some future directions for morphological treebanks. In particular, we aim at the combination with other reliable lexical resources such as GermaNet.
Enabling appropriate access to linguistic research data, both for many researchers and for innovative research applications, is a challenging task. In this chapter, we describe how we address this challenge in the context of the German Reference Corpus DeReKo and the corpus analysis platform KorAP. The core of our approach, which is based on and tightly integrated into the CLARIN infrastructure, is to offer access at different levels. The graduated access levels make it possible to find a low-loss compromise between the possibilities opened up and the costs incurred by users and providers for each individual use case, so that, viewed over many applications, the ratio between effort and results achieved can be effectively optimized. We also report on experiences with the current state of this approach.
Formal learning in higher education creates its own challenges for didactics, teaching, technology, and organization. The growing need for well-educated employees requires new ideas and tools in education. Within the ROLE project, three personal learning environments based on ROLE technology were used to accompany “traditional” teaching and learning activities at universities. The test beds at the RWTH Aachen University in Germany, the School of Continuing Education of Shanghai Jiao Tong University in China, and the Uppsala University in Sweden differ in learning culture, the number of students and their individual background, synchronous versus distant learning, etc. The wide range of test beds underlines the flexibility of ROLE technology. For each test bed, the learning scenario and the particular ROLE learning environment are presented and analyzed. The evaluation methods are described and the research results discussed in detail. The lessons learned provide an easy way to benefit from the ROLE research work, which demonstrates the potential for new ideas based on flexible e-learning concepts and tools in “traditional” education.
We present web services which implement a workflow for transcripts of spoken language following the TEI guidelines, in particular ISO 24624:2016 “Language resource management – Transcription of spoken language”. The web services are available at our website and will be available via the CLARIN infrastructure, including the Virtual Language Observatory and WebLicht.
CMC Corpora in DeReKo
(2017)
We introduce three types of corpora of computer-mediated communication that have recently been compiled at the Institute for the German Language or curated from an external project and included in DeReKo, the German Reference Corpus, namely Wikipedia (discussion) corpora, the Usenet news corpus, and the Dortmund Chat Corpus. The data and corpora have been converted to I5, the TEI customization to represent texts in DeReKo, and are researchable via the web-based IDS corpus research interfaces and in the case of Wikipedia and chat also downloadable from the IDS repository and download server, respectively.
Many studies on dictionary use presuppose that users do indeed consult lexicographic resources. However, little is known about what users actually do when they try to solve language problems on their own. We present an observation study where learners of German were allowed to browse the web freely while correcting erroneous German sentences. In this paper, we are focusing on the multi-methodological approach of the study, especially the interplay between quantitative and qualitative approaches. In one example study, we will show how the analysis of verbal protocols, the correction task and the screen recordings can reveal the effects of intuition, language (learning) awareness, and determination on the accuracy of the corrections. In another example study, we will show how preconceived hypotheses about the problem at hand might hinder participants from arriving at the correct solution.
This paper discusses changes in lexicographic traditions with respect to contrastive dictionary entries and dynamic, on-demand e-lexicographic descriptions. The new German online dictionary Paronyme - Dynamisch im Kontrast is concerned with easily confused words (paronyms), such as effektiv/effizient and sensibel/sensitiv. New approaches to the empirical analysis and lexicographic presentation of words such as these are required, and this dictionary is committed to overcoming the discrepancy between traditional practice and insights from language use. As a corpus-guided reference work, it strives to adequately reflect not only authentic use in situations of actual communication, but also cognitive ideas such as conceptual structure, categorization and knowledge. Looking up easily confused lexical items requires contrastive entries where users can instantly compare meaning, contexts and reference. Adaptable access to lexicographic details and variable search options offer different foci and perspectives on linguistic information, and authentic examples reflect prototypical structures. These are essential in order to meet all the different interests of users. This paper will illustrate the contrastive structure of the new e-dictionary and demonstrate which information can be compared. It also focusses on various dynamic modes of dictionary consultation, which enable users to shift perspectives on paronyms accordingly.
The paper reports on the results of a scientific colloquium dedicated to the creation of standards and best practices which are needed to facilitate the integration of language resources for CMC stemming from different origins and the linguistic analysis of CMC phenomena in different languages and genres. The key issue to be solved is that of interoperability – with respect to the structural representation of CMC genres, linguistic annotations, metadata, and anonymization/pseudonymization schemas. The objective of the paper is to convince more projects to take part in a discussion about standards for CMC corpora and in the creation of a CMC corpus infrastructure across languages and genres. In view of the broad range of corpus projects which are currently underway all over Europe, there is a great window of opportunity for the creation of standards in a bottom-up approach.
Contrastive analysis of climate-related neologisms registered in German and French Wikipedia
(2023)
Neologisms represent new social norms, tendencies, controversies and attitudes. They denote new or changed concepts which are constantly being negotiated between different members of the discourse community (Wodak 2022 and Catalano/Waugh (eds.) 2020). Neologisms help to identify new communicative patterns and narratives which illustrate different strings of discourse in everyday life. In recent years, many neologisms relating to the subject of the environment and climate have been emerging around the world mainly due to dominant discussions on climate change and the movement “Fridays for Future”. In German, for example, neologisms such as Klimakleber, klimaresilient and globaler Streik and in French neologisms such as éco-anxiété, justice climatique and écocitoyen could be observed. These neologisms occur in many domains of life, for example in politics, media and also in advertising, which means that “l’importance croissante des enjeux environnementaux dans les discours politiques, médiatiques et publicitaires” (Balnat/Gérard 2022, p. 22) can be identified. However, it is not only the occurrence of environment- or climate-related topics that is increasing, but also the rising polarisation of the public debate. The polarisation within public discourse is based on the fact that there are opposing positions which are represented by new or recently relevant terms such as activistes du climat (or Klimaaktivisten) and climatosceptiques (or Klimaskeptiker) (Balnat/Gérard 2022, p. 22). Due to different identifications with one or the other side, one can also speak of an “affrontement idéologique” (Balnat/Gérard 2022, p. 23). The explosive nature and the high complexity of the debate on climate and the environmental issues mean that many words are naturally unfamiliar to people. This is especially true with regard to neologisms. In addition, it is often not only the new word itself but also the signified concept that is initially unknown.
When people then look up words, they often do so on the Internet. Wikipedia as a “free encyclopedia” (Wikipedia 2023) is particularly well suited as an object of study with regard to neologisms, since factual knowledge is given special attention there. Furthermore, this reference guide is perceived as a regular source of agreed and common knowledge on all sorts of subjects. Hence, the descriptions found here represent social agreement on controversial terms and discussions to some degree. In this paper, German and French neologisms from the subject area of climate and environment will be examined primarily in Wikipedia, but also in the neighbouring resource Wiktionary, which is “a collaborative project to produce a free-content multilingual dictionary” (Wiktionary 2023). Since Wikipedia and Wiktionary are available in French and in German, both are equally suitable for the contrastive analysis. Thus, Wikipedia articles which are accessible in both languages (e.g. Klimanotstand and État d’urgence climatique) or Wikipedia articles about similar events and phenomena (e.g. Letzte Generation and Dernière Rénovation) will be compared. For example, we will have a closer look at other new terms specifying different thematic aspects of the discourse of climate and environment. We will mainly refer to those lexical items which can be found in the respective articles in both languages. Special emphasis will be on overlaps and differences, thematic foci, speaker’s positions and evaluative terms.
10. International Contrastive Linguistics Conference (ICLC)
In my talk, I present an empirical approach to detecting and describing proverbs as frozen sentences with specific functions in current language use. We have developed this approach in the EU project ‘SprichWort’ (based on the German Reference Corpus). The first chapter illustrates selected aspects of our complex, iterative procedure to validate proverb candidates. Based on our corpus-driven lexpan methodology of slot analysis I then discuss semantic restrictions of proverb patterns. Furthermore, I show different degrees of proverb quality ranging from genuine proverbs to non-proverb realizations of the same abstract pattern. On the one hand, the corpus validation reveals that proverbs are definitely perceived and used as relatively fixed entities and often as sentences. On the other hand, proverbs are not only interpreted as an interesting unique phenomenon but also as part of the whole lexicon, embedded in networks of different lexical items.
Except for some recent advances in spoken language lexicography (cf. Verdonik & Sepesy Maučec 2017, Hansen & Hansen 2012, Siepmann 2015), traditional lexicographic work is mainly oriented towards the written language. In this paper, we describe a method we used to identify relevant headword candidates for a lexicographic resource for spoken language that is currently being developed at the Institute for the German Language (IDS, Mannheim). We describe the challenges of the headword selection for a dictionary of spoken language, and having made considerations regarding our headword concept, we present the corpus-based procedures that we used in order to facilitate the headword selection. After presenting the results regarding the selection of one-word lemmas, we discuss the opportunities and limitations of our approach.
In a direct comparison with the current official set of rules, the orthographic rulebook of 1901 is commonly judged to be deficient. This assessment rests on the assumption that the rules section has primacy. The present paper takes this as its starting point and, on the basis of the stipulations on writing words separately or as one (Getrennt- und Zusammenschreibung), determines the function and relationship of the rules section and the word list of the first all-German rulebook in its historical context of origin. It turns out that the 1901 rulebook takes a different path in codification: while the rules section sets out regularities and provides criteria for finding the correct spelling, orthographically difficult cases are codified via the word list.
The Archiv für Gesprochenes Deutsch (AGD, Stift/Schmidt 2014) at the Leibniz-Institut für Deutsche Sprache is a research data centre for corpora of spoken German. Founded in 1932 as the Deutsches Spracharchiv (DSAv), it has built up a collection of almost 100 variation, interview and conversation corpora through in-house projects, cooperations and the acquisition of data from completed research projects. These corpora document, among other things, dialectal language use, oral forms of communication, or the language use of particular speaker types or on particular topics. Today this collection is almost completely digitised, and a large part of it is offered to the scientific community online via the Datenbank für Gesprochenes Deutsch (DGD) for use in research and teaching.
Das deutsche Wort Frühstück
(2018)
For many reasons, Mennonite Low German is a language whose documentation and investigation is of great importance for linguistics. To date, most research projects that deal with this language and/or its speakers have had a relatively narrow focus, with many of the data cited being of limited relevance beyond the projects for which they were collected. In order to create a resource for a broad range of researchers, especially those working on Mennonite Low German, the dataset presented here has been transformed into a structured and searchable corpus that is accessible online. The translations of 46 English, Spanish, or Portuguese stimulus sentences into Mennonite Low German by 321 consultants form the core of the MEND-corpus (Mennonite Low German in North and South America) in the Archive for Spoken German. In addition to describing the origin of this corpus and discussing possibilities and limitations for further research, we discuss the technical structure and search possibilities of the Database for Spoken German. Among other things, this database allows for a structured search of metadata, a context-sensitive token search, and the generation of virtual corpora that can be shared with others. Moreover, thanks to its text-sound alignment, one can easily switch from a particular text section of the corpus to the corresponding audio section. Aside from the desire to equip the reader with the technical knowledge necessary to use this corpus, a further goal of this paper is to demonstrate that the corpus still offers many possibilities for future research.
The concept of de facto didactics (De-facto-Didaktik) is the theoretical framework within which we analyse classroom communication from a multimodal, interaction-analytic perspective. It integrates recent developments in interaction theory, empirical interaction analysis, and spatial linguistics. From a decidedly interactionist perspective, the concept deliberately focuses first on general requirements of interaction constitution in order to put specific aspects of classroom communication, in this case primarily the teachers' didactic actions, into a new perspective. However one may conceptualize what happens in the classroom, in its basic structure, and beyond its institutional shaping and conditioning, it is and remains an event that is jointly produced, sequentially and simultaneously, in the concrete interactional architecture of the classroom through the multimodal behaviour of all those present. In this process all participants, regardless of their particular participation role, are subject to the conditions of interaction constitution.
In what follows, we outline the interaction-theoretical foundations on which our method of de facto didactic analysis is based and then use a selected example to demonstrate what distinguishes this analytic approach. Finally, after a case-specific summary, we point out the practical relevance of de facto didactic analyses.
Using several examples from nominal morphology and the morphosyntax of the German noun phrase, this contribution shows how, in the changes observable in this area over the course of the 20th century, questions of long-term system change overlap with regularities of language use. The focus is on case marking, in particular in the genitive and dative, which are generally regarded as "critical". The data show that in most cases a far-reaching adaptation to the regularities of monoflexion had already taken place by the beginning of the 20th century, and that this process continued throughout the century. It is notable that, overall, the forms regarded as "old" occur (very) rarely in the corpora of written language examined, but that the resulting markedness is increasingly exploited functionally in one way or another. The genitive is a special case in this context: with strong masculine and neuter nouns it is known to resist the trend towards marking case only once ("Einmalmarkierung"), on the inflected elements accompanying the noun. This leads to the well-known orientation of these forms towards non-object uses and to a striking degree of variation in the use of the corresponding inflectional forms.
We introduce a method for error detection in automatically annotated text, aimed at supporting the creation of high-quality language resources at affordable cost. Our method combines an unsupervised generative model with human supervision from active learning. We test our approach on in-domain and out-of-domain data in two languages, in AL simulations and in a real world setting. For all settings, the results show that our method is able to detect annotation errors with high precision and high recall.
This chapter gives an overview of the inventory of expressions that are counted as determiners or are at least treated as candidates for this category. It examines their grammatical properties and tests their determiner status against relevant morphosyntactic criteria.
This contribution presents three linguistic approaches to diagnoses. Conversation analysis sheds light on how diagnoses are produced in oral doctor-patient interaction: diagnoses emerge collaboratively as participants pass through conversational phases and perform characteristic actions in particular utterance formats. Text and communication history, by contrast, focuses on written action. Here, producing a diagnosis requires the subsequent processing of preceding oral interactions in accordance with an established text type, the data collection form (Erhebungsbogen). As a discourse-linguistic approach shows, these forms of diagnosis production differ from the mass-media production of facticity in discourses such as the vaccination discourse, which are also relevant for a medical lay audience. The contribution aims not only to make clear how closely oral interaction and written recording are connected, but also to emphasize that medical knowledge conveyed through mass media can turn laypeople into relative experts.