L3: Lexik empirisch und digital
Refine
Document Type
- Article (32)
- Part of a Book (25)
- Other (7)
- Preprint (4)
- Conference Proceeding (2)
- Doctoral Thesis (2)
- Book (1)
- Part of Periodical (1)
- Working Paper (1)
Keywords
- Korpus <Linguistik> (32)
- Deutsch (29)
- Wortschatz (16)
- COVID-19 (13)
- Geschlechtergerechte Sprache (10)
- Lexikostatistik (9)
- Online-Medien (9)
- Wörterbuch (9)
- Vielfalt (8)
- Datenanalyse (7)
Publicationstate
- Veröffentlichungsversion (43)
- Zweitveröffentlichung (28)
- Postprint (12)
Reviewstate
Publisher
- Leibniz-Institut für Deutsche Sprache (IDS) (13)
- Wilhelm Fink (6)
- de Gruyter (6)
- IDS-Verlag (4)
- Cornell University (3)
- Erich Schmidt (3)
- Narr Francke Attempto (3)
- De Gruyter (2)
- Friedrich (2)
- LINDAT/CLARIAH-CZ (2)
Das Kommunizieren in Sozialen Medien und der Umgang mit Hypertexten ist im Jahr 2020 kein Randphänomen mehr. Die sprachlichen Besonderheiten internetbasierter Kommunikation und Sozialer Medien sind mittlerweile auch gut erforscht und beschrieben, allerdings werden diese bislang in deutschen Grammatiken, mit Ausnahme von Hoffmann (2014), allenfalls am Rande behandelt. Selbst neuere Ansätze zur Textanalyse, z. B. Ágel (2017), konzentrieren sich auf gestaltstabile, linear organisierte Schrifttexte. Dasselbe gilt für Ansätze, die primär für die Bewertung von Schreibprodukten in Bildungskontexten entwickelt wurden.
Anders als bei Sonntagspredigten haben die katholischen und evangelischen AutorInnen von Kirche in 1live nur 90 Sekunden zur Verfügung, um ihre christliche Botschaft zu vermitteln. Vorliegender Beitrag untersucht, wie die katholischen und evangelischen AutorInnen dies tun. Welche Inhalte erachten sie für relevant? Welche sprachliche Gestaltung wählen sie? Greifen katholische und evangelische AutorInnen zu den gleichen Inhalten und sprachlichen Mitteln oder zeigen sich konfessionelle Präferenzen und Differenzen? Diesen Fragen soll an einem Korpus aus Kirche in 1live-Radiopredigten aus den Jahren 2012 bis 2021 (= 2.755 Texte mit insgesamt 726.570 Token) mit einem quantitativen und qualitativen Methoden-Mix nachgegangen werden. Die Studie wird im Rahmen des DFG-Projekts „Sprache und Konfession 500 Jahre nach der Reformation“ am Germanistischen Institut der Westfälischen Wilhelms-Universität Münster durchgeführt.
Recent typological studies have shown that socio-linguistic factors have a substantial effect on at least certain structures of language. However, we are still far from understanding how such factors should be operationalized and how they interact with other factors in shaping grammar. To address both questions, this study examines the influence of socio-linguistic factors on the number of dedicated conditional constructions in a sample of 374 languages. We test the number of speakers, the degree of multilingualism, the availability of a literature tradition, the use of writing, and the use of the language in the education system. At the same time, we control for genealogical, contact, and bibliographical biases. Our results suggest that the number of speakers is the most informative predictor. However, we find that the association between the number of speakers and the number of dedicated conditional constructions is much weaker than assumed, once genealogical and contact biases are controlled for.
Einführung
(2022)
Fragen der Verdatung sind Bestandteil der digitalen Diskursanalyse und keine Vorarbeiten. Die Analyse digital(isiert)er Diskurse setzt im Unterschied zur Auswertung nicht-digital repräsentierter Sprache und Kommunikation notwendig technische Verfahren und Praktiken, Algorithmen und Software voraus, die den Untersuchungsgegenstand als digitales Datum konstituieren. Die nachfolgenden Abschnitte beschreiben kurz und knapp wiederkehrende Aspekte dieser Verdatungstechniken und -praktiken, insbesondere mit Blick auf Erhebung und Transformation (Abschnitt 2), Korpuskompilierung (Abschnitt 3), Annotation (Abschnitt 4) und Wege der analytischen Datenerschließung (Abschnitt 5). Im Fazit wird die Relevanz der Verdatungsarbeit für den Analyseprozess zusammengefasst (6).
We investigate the optional omission of the infinitival marker in a Swedish future tense construction. During the last two decades the frequency of omission has been rapidly increasing, and this process has received considerable attention in the literature. We test whether the knowledge which has been accumulated can yield accurate predictions of language variation and change. We extracted all occurrences of the construction from a very large collection of corpora. The dataset was automatically annotated with language-internal predictors which have previously been shown or hypothesized to affect the variation. We trained several models in order to make two kinds of predictions: whether the marker will be omitted in a specific utterance and how large the proportion of omissions will be for a given time period. For most of the approaches we tried, we were not able to achieve a better-than-baseline performance. The only exception was predicting the proportion of omissions using autoregressive integrated moving average models for one-step-ahead forecast, and in this case time was the only predictor that mattered. Our data suggest that most of the language-internal predictors do have some effect on the variation, but the effect is not strong enough to yield reliable predictions.
This thesis is a corpus linguistic investigation of the language used by young German speakers online, examining lexical, morphological, orthographic, and syntactic features and changes in language use over time. The study analyses the language in the Nottinghamer Korpus deutscher YouTube‐Sprache ("Nottingham corpus of German YouTube language", or NottDeuYTSch corpus), one of the first large corpora of German‐language comments taken from the videosharing website YouTube, and built specifically for this project. The metadatarich corpus comprises c.33 million tokens from more than 3 million comments posted underneath videos uploaded by mainstream German‐language youthorientated YouTube channels from 2008‐2018.
The NottDeuYTSch corpus was created to enable corpus linguistic approaches to studying digital German youth language (Jugendsprache), having identified the need for more specialised web corpora (see Barbaresi 2019). The methodology for compiling the corpus is described in detail in the thesis to facilitate future construction of web corpora. The thesis is situated at the intersection of Computer‐Mediated Communication (CMC) and youth language, which have been important areas of sociolinguistic scholarship since the 1980s, and explores what we can learn from a corpus‐driven, longitudinal approach to (online) youth language. To do so, the thesis uses corpus linguistic methods to analyse three main areas:
1. Lexical trends and the morphology of polysemous lexical items. For this purpose, the analysis focuses on geil, one of the most iconic and productive words in youth language, and presents a longitudinal analysis, demonstrating that usage of geil has decreased, and identifies lexical items that have emerged as potential replacements. Additionally, geil is used to analyse innovative morphological productiveness, demonstrating how different senses of geil are used as a base lexeme or affixoid in compounding and derivation.
2. Syntactic developments. The novel grammaticalization of several subordinating conjunctions into both coordinating conjunctions and discourse markers is examined. The investigation is supported by statistical analyses that demonstrate an increase in the use of non‐standard syntax over the timeframe of the corpus and compares the results with other corpora of written language.
3. Orthography and the metacommunicative features of digital writing. This analysis identifies orthographic features and strategies in the corpus, e.g. the repetition of certain emoji, and develops a holistic framework to study metacommunicative functions, such as the communication of illocutionary force, information structure, or the expression of identities. The framework unifies previous research that had focused on individual features, integrating a wide range of metacommunicative strategies within a single, robust system of analysis.
By using qualitative and computational analytical frameworks within corpus linguistic methods, the thesis identifies emergent linguistic features in digital youth language in German and sheds further light on lexical and morphosyntactic changes and trends in the language of young people over the period 2008‐2018. The study has also further developed and augmented existing analytical frameworks to widen the scope of their application to orthographic features associated with digital writing.
Developments within the field of Second Language Acquisition (SLA) have meant that scholars are increasingly engaging with corpora and corpus-based resources, providing a source of “‘authentic’ language” to learners and educators (Mitchell 2020: 254), and contributing to “state-of-the-art research methodologies” (Deshors and Gries 2023: 164). However, there are areas in which progress can still be made, particularly in the area of metadata, such as information about the speaker and contexts of the language use, as well as increased variety in the text types and genres of corpora used to develop SLA materials (Paquot 2022: 36). This post discusses one such possibility for increasing the variety of text types and providing a rich source of authentic language that can be used to create engaging SLA materials, particularly for young people learning German, namely the use of the NottDeuYTSch corpus (to download the corpus in a variety of formats, see Cotgrove 2018).
This paper analyses intensification in German digitally-mediated communication (DMC) using a corpus of YouTube comments written by young people (the NottDeuYTSch corpus). Research on intensification in written language has traditionally focused on two grammatical aspects: syntactic intensification, i.e. the use of particles and other lexical items and morphological intensification, i.e. the use of compounding. Using a wide variety og examples from the corpus, the paper identifies novel ways that have been used for intensification in DMC, and suggests a new taxonomy of classification for future analysis of intensification.