430 Deutsch
This thesis is a corpus linguistic investigation of the language used by young German speakers online, examining lexical, morphological, orthographic, and syntactic features and changes in language use over time. The study analyses the language in the Nottinghamer Korpus deutscher YouTube-Sprache ("Nottingham corpus of German YouTube language", or NottDeuYTSch corpus), one of the first large corpora of German-language comments taken from the video-sharing website YouTube, and built specifically for this project. The metadata-rich corpus comprises c. 33 million tokens from more than 3 million comments posted underneath videos uploaded by mainstream German-language youth-orientated YouTube channels from 2008-2018.
The NottDeuYTSch corpus was created to enable corpus linguistic approaches to studying digital German youth language (Jugendsprache), in response to an identified need for more specialised web corpora (see Barbaresi 2019). The methodology for compiling the corpus is described in detail in the thesis to facilitate future construction of web corpora. The thesis is situated at the intersection of Computer-Mediated Communication (CMC) and youth language, which have been important areas of sociolinguistic scholarship since the 1980s, and explores what we can learn from a corpus-driven, longitudinal approach to (online) youth language. To do so, the thesis uses corpus linguistic methods to analyse three main areas:
1. Lexical trends and the morphology of polysemous lexical items. For this purpose, the analysis focuses on geil, one of the most iconic and productive words in youth language, and presents a longitudinal analysis, demonstrating that usage of geil has decreased, and identifies lexical items that have emerged as potential replacements. Additionally, geil is used to analyse innovative morphological productiveness, demonstrating how different senses of geil are used as a base lexeme or affixoid in compounding and derivation.
2. Syntactic developments. The novel grammaticalization of several subordinating conjunctions into both coordinating conjunctions and discourse markers is examined. The investigation is supported by statistical analyses that demonstrate an increase in the use of non-standard syntax over the timeframe of the corpus, and the results are compared with other corpora of written language.
3. Orthography and the metacommunicative features of digital writing. This analysis identifies orthographic features and strategies in the corpus, e.g. the repetition of certain emoji, and develops a holistic framework to study metacommunicative functions, such as the communication of illocutionary force, information structure, or the expression of identities. The framework unifies previous research that had focused on individual features, integrating a wide range of metacommunicative strategies within a single, robust system of analysis.
By using qualitative and computational analytical frameworks within corpus linguistic methods, the thesis identifies emergent linguistic features in digital youth language in German and sheds further light on lexical and morphosyntactic changes and trends in the language of young people over the period 2008‐2018. The study has also further developed and augmented existing analytical frameworks to widen the scope of their application to orthographic features associated with digital writing.
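A longitudinal lexical analysis of the kind described above (for instance, tracking geil across the years 2008-2018) typically rests on frequencies normalised by the number of tokens per year. The following is a minimal sketch of that normalisation, not the thesis's own procedure: the function name `per_million_by_year`, the simple regex tokenizer, and the toy comments are all assumptions made for illustration.

```python
import re
from collections import Counter

# Assumption: a crude word tokenizer; a real study would use a proper
# tokenizer suited to German web text.
TOKEN = re.compile(r"\w+")


def per_million_by_year(comments, target):
    """Return {year: occurrences of `target` per million tokens}.

    `comments` is an iterable of (year, text) pairs (hypothetical format).
    """
    hits = Counter()
    totals = Counter()
    for year, text in comments:
        tokens = TOKEN.findall(text.lower())
        totals[year] += len(tokens)
        hits[year] += tokens.count(target)
    # Skip years without any tokens to avoid division by zero.
    return {
        year: 1_000_000 * hits[year] / totals[year]
        for year in sorted(totals)
        if totals[year] > 0
    }


# Toy data, invented for this example only.
toy_comments = [
    (2008, "Das ist so geil geil"),
    (2018, "Das ist nice"),
]
rates = per_million_by_year(toy_comments, "geil")
```

Normalising per million tokens makes years with very different comment volumes comparable, which a raw count would not.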
The main aim of this contribution is to present the range of lexicographic information from LeGeDe, an electronic prototype for lexical and interactional features of spoken German. The focus lies on the detailed description of the different lexicographical information classes using illustrative examples and figures from the resource. In addition to highlighting the lexicographic microstructure and providing an overview of the outer texts and the multimedia content on offer, the contribution also presents detailed background data on the conception of the LeGeDe resource. Innovative aspects and possible applications are outlined and forward-looking desiderata are offered.
This contribution centres on the externally funded LeGeDe project and the corpus-based lexicographic prototype, developed over the course of the project, for distinctive features of spoken German in interaction. Developing a lexicographic resource of this kind builds on extensive experience in creating corpus-based online dictionaries (in particular at the Leibniz-Institut für Deutsche Sprache, Mannheim) and on current methods of corpus-based lexicology and interaction analysis. As a multimedia prototype for the corpus-based lexicographic treatment of spoken-language phenomena, it occupies an innovative position in modern online lexicography. In the section presenting the LeGeDe project, the contribution deals in detail with project-relevant research questions, project goals, the empirical data basis, and empirically surveyed expectations of a resource on spoken German. The presentation of the complex structure of the LeGeDe prototype is illustrated with numerous examples. Together with the central information on macro- and microstructure and the lexicographic outer texts, the various linking and access structures are shown. In addition to the concluding summary, the contribution offers, in an outlook, extensive suggestions for future lexicographic work with spoken-language corpus data.
Corpus-based lexicography is an interesting and varied field of scholarly application that should also play a greater role in teaching German as a first language and as a foreign language. In our contribution we therefore present suitable corpora and corpus analysis tools with which users can not only follow how individual information categories in a dictionary come about, but also work them out independently. Alongside existing approaches, this is illustrated with the Denktionary, a wiki-based dictionary for which school students themselves wrote corpus-based entries in first-language German lessons as part of the project Schüler machen Wörterbücher – Wörterbücher machen Schule.
We present a method to identify and document a phenomenon on which there is very little empirical data: German phrasal compounds occurring as a single token (without punctuation between their components). Relying on linguistic criteria, our approach requires an operational notion of compounds which can be systematically applied, as well as (web) corpora which are large and diverse enough to contain rarely seen phenomena. The method is based on word segmentation and morphological analysis, and it takes advantage of a data-driven learning process. Our results show that coarse-grained identification of phrasal compounds is best performed with empirical data, whereas fine-grained detection could be improved with a combination of rule-based and frequency-based word lists. Along with the characteristics of web texts, the orthographic realizations seem to be linked to the degree of expressivity.
This paper gives an insight into the basic concepts for a corpus-based lexical resource of spoken German, which is being developed by the project "The Lexicon of Spoken German" (Lexik des gesprochenen Deutsch, LeGeDe) at the "Institute for the German Language" (Institut für Deutsche Sprache, IDS) in Mannheim. The focus of the paper is on initial ideas for semi-automatic and automatic resources that assist the quantitative analysis of the corpus data for the creation of dictionary content. The work is based on the "Research and Teaching Corpus of Spoken German" (Forschungs- und Lehrkorpus Gesprochenes Deutsch, FOLK).
Lexicographic meaning descriptions of German lexical items which are formally and semantically similar and therefore easily confused (so-called paronyms) often do not reflect the current usage of these items. They can even contradict one’s personal intuition or disagree with lexical usage as observed in public discourse. The reasons are manifold. Either the language data used for compiling dictionaries is outdated, or lexicographic practice is rather conventional and does not take advantage of corpus-assisted approaches to semantic analysis. Despite various modern electronic or online reference works, speakers face uncertainties when dealing with easily confusable words, for example sensibel/sensitiv (sensitive) or kindisch/kindlich (childish/childlike). Existing dictionaries often do not provide satisfactory answers as to how to use these sets correctly. Numerous questions posted in online forums show where uncertainties with paronyms lie and why users demand further assistance concerning proper contextual usage (cf. Storjohann 2015). There are different reasons why users misuse certain items or mix up words which are similar in form and meaning. As data from written and more spontaneous language resources suggest, some confusions arise due to ongoing semantic change in the current use of some paronyms. This paper identifies shortcomings of contemporary German dictionaries and discusses innovative ways of empirical lexicographic work that might pave the way for a new data-driven, descriptive reference work of confusable German terms. Currently, such a guide is being developed at the Institute for German Language in Mannheim using corpora and diverse corpus-analytical methods. Its objective is to compile a dictionary with contrastive entries that serves as a useful reference tool in situations of language doubt. At the same time, it aims at sensitizing users to context dependency and language change.
Corpus-assisted analyses of public discourse often focus on the level of the lexicon. This article argues in favour of corpus-assisted analyses of discourse, but also in favour of conceptualising salient lexical items in public discourse in a more determined way. It draws partly on non-Anglophone academic traditions in order to promote a conceptualisation of discourse keywords, thereby highlighting how their meaning is determined by their use in discourse contexts. It also argues in favour of emphasising the cognitive and epistemic dimensions of discourse-determined semantic structures. These points will be exemplified by means of a corpus-assisted, as well as a frame-based analysis of the discourse keyword financial crisis in British newspaper articles from 2009. Collocations of financial crisis are assigned to a generic matrix frame for ‘event’ which contains slots that specify possible statements about events. By looking at which slots are filled with more or fewer collocates of financial crisis, we will trace semantic presence as well as absence, and thereby highlight the pragmatic dimensions of lexical semantics in public discourse. The article also advocates the suitability of discourse keyword analyses for systematic contrastive analyses of public/political discourse and for lexicographical projects that could serve to extend the insights drawn from corpus-guided approaches to discourse analysis.
This contribution presents the Forschungs- und Lehrkorpus Gesprochenes Deutsch (FOLK, Research and Teaching Corpus of Spoken German) and the Datenbank für Gesprochenes Deutsch (DGD, Database for Spoken German) as instruments for conversation-analytic work. Following a general introduction to FOLK and DGD in the second section, the third section briefly outlines the methodological relations between corpus linguistics and conversation research and the challenges that arise when these two approaches to authentic language material meet. The fourth section then illustrates, using the formula ich sag mal as an example, how a corpus- and database-driven analysis can contribute to the study of conversational phenomena.
Among the German negative-conditional connectors that function as consequens markers, sonst and ansonsten are the prototypical cases. Morphological alternatives (sonsten and ansonst) are rarely mentioned in contemporary grammars and dictionaries, but they actually occur with considerable frequency. The four connectors are used in two functions: as a conjunctional adverb which can occupy various positions within the sentence, or as a specific kind of subordinating conjunction (Postponierer). The large IDS corpora allow us to reveal specific distributions of the lexemes and of their different uses. Comparing the frequencies and the distributions can indicate to what extent the phenomena are part of the standard language. The paper will report on the results and demonstrate how the findings can be deduced from the corpora. It will draw conclusions for assessing the acceptability of the variants and the extent to which they can be considered standard language. In addition, it will test statistical instruments for visualising and calculating the variance of phenomena, such as association plots and DPnorm.
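The abstract above names DPnorm as one of the statistical instruments tested. As a rough illustration, the sketch below computes DPnorm as it is commonly defined: Gries's deviation of proportions (DP), divided by 1 minus the smallest expected proportion. The function name `dp_norm` and the toy figures are assumptions for this example and do not come from the paper.

```python
def dp_norm(freqs, part_sizes):
    """Normalised deviation of proportions for one word.

    freqs[i]      : occurrences of the word in corpus part i
    part_sizes[i] : size of corpus part i in tokens
    Returns a value from 0 (perfectly even spread) to 1 (maximally uneven).
    """
    if len(freqs) != len(part_sizes) or len(freqs) < 2:
        raise ValueError("need at least two corpus parts of matching length")
    total_freq = sum(freqs)
    total_size = sum(part_sizes)
    if total_freq == 0 or total_size == 0:
        raise ValueError("word frequency and corpus size must be non-zero")

    # Expected share of the word per part, based on part size alone.
    expected = [size / total_size for size in part_sizes]
    # Observed share of the word's occurrences per part.
    observed = [freq / total_freq for freq in freqs]

    dp = 0.5 * sum(abs(o - e) for o, e in zip(observed, expected))
    # Normalise so that the maximum attainable value is 1.
    return dp / (1 - min(expected))


# Toy figures, invented for illustration.
even = dp_norm([10, 10], [100, 100])      # word spread evenly
clumped = dp_norm([20, 0], [100, 100])    # word confined to one part
```

Values near 0 indicate that a variant is distributed across corpus parts in proportion to their size, whereas values near 1 indicate that it is concentrated in a few parts.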