Refine
Document Type
- Part of a Book (3) (remove)
Language
- English (3)
Has Fulltext
- yes (3)
Keywords
- Korpus <Linguistik> (3)
- Deutsch (2)
- Gesprochene Sprache (2)
- Lemma (1)
- Levelled Study Corpus of Russian (LeStCoR) (1)
- Partikelverb (1)
- Russisch (1)
- Wörterbuch (1)
- corpus-based methods (1)
- list of headwords (1)
Publicationstate
- Veröffentlichungsversion (3) (remove)
Reviewstate
- Peer-Review (2)
- (Verlags)-Lektorat (1)
Except for some recent advances in spoken language lexicography (cf. Verdonik & Sepesy Maučec 2017, Hansen & Hansen 2012, Siepmann 2015), traditional lexicographic work is mainly oriented towards the written language. In this paper, we describe a method we used to identify relevant headword candidates for a lexicographic resource for spoken language that is currently being developed at the Institute for the German Language (IDS, Mannheim). We describe the challenges of the headword selection for a dictionary of spoken language, and having made considerations regarding our headword concept, we present the corpus-based procedures that we used in order to facilitate the headword selection. After presenting the results regarding the selection of one-word lemmas, we discuss the opportunities and limitations of our approach.
In this paper, we present first results of training a classifier for discriminating Russian texts into different levels of difficulty. For the classification we considered both surface-oriented features adopted from readability assessments and more linguistically informed, positional features to classify texts into two levels of difficulty. This text classification is the main focus of our Levelled Study Corpus of Russian (LeStCoR), in which we aim to build a corpus adapted for language learning purposes – selecting simpler texts for beginner second language learners and more complex texts for advanced learners. The most discriminative feature in our pilot study was a lexical feature that approximates accessibility of the vocabulary by the second language learner in terms of the proportion of familiar words in the texts. The best feature setting achieved an accuracy of 0.91 on a pilot corpus of 209 texts.
We present a method for detecting and reconstructing separated particle verbs in a corpus of spoken German by following an approach suggested for written language. Our study shows that the method can be applied successfully to spoken language, compares different ways of dealing with structures that are specific to spoken language corpora, analyses some remaining problems, and discusses ways of optimising precision or recall for the method. The outlook sketches some possibilities for further work in related areas.