Refine
Year of publication
- 2023 (4)
Document Type
- Article (2)
- Part of a Book (1)
- Working Paper (1)
Has Fulltext
- yes (4)
Is part of the Bibliography
- yes (4)
Keywords
- Korpus <Linguistik> (2)
- Linguistik (2)
- CMC (1)
- Computerunterstützte Kommunikation (1)
- DMC (1)
- Datensatz (1)
- Deutsch (1)
- Forschung (1)
- Forschungsdaten (1)
- German (1)
Publicationstate
Reviewstate
- (Verlags)-Lektorat (4) (remove)
Publisher
- Leibniz-Institut für Deutsche Sprache (IDS) (2)
- Narr (1)
- Zenodo (1)
The landscape of digital lexical resources is often characterized by dedicated local portals and proprietary interfaces as primary access points for scholars and the interested public. In addition, legal and technical restrictions are potential issues that can make it difficult to efficiently query and use these valuable resources. As part of the research data consortium Text+, solutions for the storage and provision of digital language resources are being developed and provided in the context of the unified cross-domain German research data infrastructure NFDI. The specific topic of accessing lexical resources in a diverse and heterogenous landscape with a variety of participating institutions and established technical solutions is met with the development of the federated search and query framework LexFCS. The LexFCS extends the established CLARIN Federated Content Search that already allows accessing spatially distributed text corpora using a common specification of technical interfaces, data formats, and query languages. This paper describes the current state of development of the LexFCS, gives an insight into its technical details, and provides an outlook on its future development.
This article details the process of creating the Nottinghamer Korpus deutscher YouTube-Sprache ('The Nottingham German YouTube Language Corpus' - or NottDeuYTSch corpus) and outlines potential research opportunities. The corpus was compiled to analyse the online language produced by young German-speakers and offers significant opportunity for in-depth research across several linguistic fields including lexis, morphology, syntax, orthography, and conversational and discursive analysis. The NottDeuYTSch corpus contains over 33 million words taken from approximately 3 million YouTube comments from videos published between 2008 to 2018 targeted at a young, German-speaking demographic and represent an authentic language snapshot of young German speakers. The corpus was proportionally sampled based on video category and year from a database of 112 popular German-speaking YouTube channels in the DACH region for optimal representativeness and balance and contains a considerable amount of associated metadata for each comment that enable further longitudinal cross-sectional analyses. The NottDeuYTSch corpus is available for analysis as part of the German Reference Corpus (DeReKo).