New opportunities for researching digital youth language: The NottDeuYTSch corpus
This article details the process of creating the Nottinghamer Korpus deutscher YouTube-Sprache ('The Nottingham German YouTube Language Corpus', or NottDeuYTSch corpus) and outlines potential research opportunities. The corpus was compiled to analyse the online language produced by young German speakers and offers significant opportunities for in-depth research across several linguistic fields, including lexis, morphology, syntax, orthography, and conversational and discursive analysis. The NottDeuYTSch corpus contains over 33 million words taken from approximately 3 million YouTube comments on videos published between 2008 and 2018 and targeted at a young, German-speaking demographic, and it represents an authentic language snapshot of young German speakers. For optimal representativeness and balance, the corpus was proportionally sampled by video category and year from a database of 112 popular German-speaking YouTube channels in the DACH region. It also contains a considerable amount of associated metadata for each comment, which enables further longitudinal and cross-sectional analyses. The NottDeuYTSch corpus is available for analysis as part of the German Reference Corpus (DeReKo).
Author: | Louis Cotgrove |
---|---|
URN: | urn:nbn:de:bsz:mh39-118796 |
ISBN: | 978-3-8233-9602-4 |
ISSN: | 2191-9577 |
Parent Title (German): | Neue Entwicklungen in der Korpuslandschaft der Germanistik. Beiträge zur IDS-Methodenmesse 2022 |
Series (Serial Number): | Korpuslinguistik und interdisziplinäre Perspektiven auf Sprache | Corpus Linguistics and Interdisciplinary Perspectives on Language | CLIP (11) |
Publisher: | Narr |
Place of publication: | Tübingen |
Editor: | Marc Kupietz, Thomas Schmidt |
Document Type: | Part of a Book |
Language: | English |
Year of first Publication: | 2023 |
Date of Publication (online): | 2023/05/31 |
Publishing Institution: | Leibniz-Institut für Deutsche Sprache (IDS) [secondary publication] |
Publication state: | Secondary publication |
Review state: | (Publisher's) editorial review |
Tag: | CMC; DMC; German; NottDeuYTSch corpus; YouTube; corpus linguistics; digital communication; youth language |
GND Keyword: | Computerunterstützte Kommunikation; Datensatz; Deutsch; Jugendsprache; Korpus <Linguistik>; Metadaten; YouTube |
First Page: | 101 |
Last Page: | 114 |
DDC classes: | 400 Language / 430 German |
Open Access?: | yes |
Leibniz-Classification: | Language, linguistics |
Linguistics-Classification: | Corpus linguistics |
Program areas: | L3: Lexik empirisch und digital |
Licence (German): | Urheberrechtlich geschützt (protected by copyright) |