Korpuslinguistik
Refine
Year of publication
- 2023 (3) (remove)
Document Type
- Part of a Book (2)
- Article (1)
Language
- English (3) (remove)
Has Fulltext
- yes (3)
Is part of the Bibliography
- yes (3)
Keywords
- Korpus <Linguistik> (3)
- Deutsch (2)
- Annotation (1)
- CMC (1)
- COVID-19 (1)
- COVID-19 discourse (1)
- Computerunterstützte Kommunikation (1)
- DMC (1)
- Datensatz (1)
- German (1)
Publicationstate
- Zweitveröffentlichung (3) (remove)
Reviewstate
- (Verlags)-Lektorat (2)
- Peer-Review (1)
Publisher
- Equinox (1)
- Narr (1)
- Narr Francke Attempto (1)
In this article, we provide an insight into the development and application of a corpus-lexicographic tool for finding neologisms that are not yet listed in German dictionaries. As a starting point, we used the words listed in a glossary of German neologisms surrounding the COVID-19 pandemic. These words are lemma candidates for a new dictionary on COVID-19 discourse in German. They also provided the database used to develop and test the NeoRate tool. We report on the lexicographic work in our dictionary project, the design and functionalities of NeoRate, and describe the first test results with the tool, in particular with regard to previously unregistered words. Finally, we discuss further development of the tool and its possible applications.
This paper presents the IVK-Ler corpus, a longitudinal, annotated learner corpus of weekly writings produced by a group of 18 adolescents in a preparatory class. The corpus consists of 117 student texts collected between 2020 and 2021 and has a structure layered by student and text number. It includes metadata that enables researchers to analyze and track individual student progress in terms of syntactic competence and literacy. The annotation schema, manual and automatic annotation processes, and corpus representation are described in detail. The corpus currently includes target hypotheses and gold standard part-of-speech tags. Future work could include additional annotation layers for topological fields and dependency relations, as well as semantic and discourse annotations to make the corpus usable for tasks beyond syntactic evaluations.
This article details the process of creating the Nottinghamer Korpus deutscher YouTube-Sprache ('The Nottingham German YouTube Language Corpus' - or NottDeuYTSch corpus) and outlines potential research opportunities. The corpus was compiled to analyse the online language produced by young German-speakers and offers significant opportunity for in-depth research across several linguistic fields including lexis, morphology, syntax, orthography, and conversational and discursive analysis. The NottDeuYTSch corpus contains over 33 million words taken from approximately 3 million YouTube comments from videos published between 2008 to 2018 targeted at a young, German-speaking demographic and represent an authentic language snapshot of young German speakers. The corpus was proportionally sampled based on video category and year from a database of 112 popular German-speaking YouTube channels in the DACH region for optimal representativeness and balance and contains a considerable amount of associated metadata for each comment that enable further longitudinal cross-sectional analyses. The NottDeuYTSch corpus is available for analysis as part of the German Reference Corpus (DeReKo).