Applying the newly extended European reference corpus EuReCo. Pilot studies of light-verb constructions in German, Romanian, Hungarian and Polish
- It is well known that the distribution of lexical and grammatical patterns is sensitive to size and register (Biber 1986 and later publications). This fact alone presents a challenge to many corpus-oriented linguistic studies focusing on a single language. In cross-linguistic corpus studies, the challenge is even greater because of the lack of high-quality multilingual corpora that are comparable with respect to size and register (Kupietz et al. 2020; Kupietz/Trawiński 2022). This was the motivation for creating the European Reference Corpus EuReCo, an initiative started in 2013 at the Leibniz Institute for the German Language (IDS) together with several European partners (Kupietz et al. 2020). EuReCo is an emerging federated corpus comprising large virtual comparable corpora across various languages, with an infrastructure supporting contrastive research. The core of the infrastructure is KorAP (Diewald et al. 2016), a scalable open-source platform that supports the analysis and visualisation of properties of texts annotated with multiple, potentially conflicting information layers, as well as several corpus query languages. Until recently, EuReCo consisted of three monolingual subparts: the German Reference Corpus DeReKo (Kupietz et al. 2018), the Reference Corpus of Contemporary Romanian Language (Barbu Mititelu/Tufiş/Irimia 2018), and the Hungarian National Corpus (Váradi 2002). The goal of the present submission is twofold. On the one hand, it reports on the new component of EuReCo: a sample of the National Corpus of Polish (Przepiórkowski et al. 2010). On the other hand, it presents the results of a new pilot study using the newly extended EuReCo. This pilot study investigates selected Polish collocations involving light verbs and their prepositional or nominal complements (Fig. 1) and extends the collocation analyses of German, Romanian and Hungarian (Fig. 2) discussed in Kupietz/Trawiński (2022).
Author: | Piotr Bański, Nils Diewald, Marc Kupietz, Beata Trawiński |
---|---|
URN: | urn:nbn:de:bsz:mh39-122898 |
URL: | https://iclc10.ids-mannheim.de/ |
DOI: | https://doi.org/10.14618/f8rt-m155 |
ISBN: | 978-3-937241-96-8 |
Parent Title (English): | 10th International Contrastive Linguistics Conference (ICLC-10), 18-21 July, 2023, Mannheim, Germany |
Publisher: | IDS-Verlag |
Place of publication: | Mannheim |
Editor: | Beata Trawiński, Marc Kupietz, Kristel Proost, Jörg Zinken |
Document Type: | Part of a Book |
Language: | English |
Year of first Publication: | 2023 |
Date of Publication (online): | 2023/11/09 |
Publication state: | Published version |
Review state: | Peer-reviewed |
Tag: | EuReCo; collocation analysis; comparable corpora; light-verb constructions |
GND Keyword: | Kontrastive Linguistik; Korpus <Linguistik> |
First Page: | 274 |
Last Page: | 276 |
DDC classes: | 400 Language / 400 Language, Linguistics |
Open Access?: | yes |
Leibniz-Classification: | Language, Linguistics |
Program areas: | Grammar |
Program areas: | Digital Linguistics |
Licence: | Creative Commons - Attribution-ShareAlike 3.0 Germany |