
EuReCo: Not Building and Yet Using Federated Comparable Corpora for Cross-Linguistic Research

This paper gives an overview of recent developments concerning the European Reference Corpus EuReCo, an open long-term initiative aimed at providing and using virtual and dynamically definable comparable corpora based on existing national, reference or other large corpora. Given the problems and shortcomings of other types of multilingual corpora, such as shining-through effects in parallel corpora or the restriction to web material in web-based comparable corpora, EuReCo constitutes a unique linguistic resource that offers new perspectives for fine-grained cross-linguistic research. The approach advocated here puts forward new solutions to notorious IPR and licensing issues, as well as to challenges of interoperability. It also addresses methodological questions concerning comparability and representativeness. While the focus of this paper is on EuReCo’s implementation-based approach to ensuring interoperability in a feasible and maintainable way, it also presents preliminary results of pilot comparative studies on light verb constructions in German, Romanian, Hungarian, Polish and Bulgarian, and reports on recent extensions and plans.



Author: Marc Kupietz, Piotr Bański, Nils Diewald, Beata Trawiński, Andreas Witt
Parent Title (English): Proceedings of the 17th Workshop on Building and Using Comparable Corpora (BUCC) @ LREC-COLING 2024
Publisher: European Language Resources Association (ELRA)
Place of publication: Paris
Editor: Pierre Zweigenbaum, Reinhard Rapp, Serge Sharoff
Document Type: Conference Proceeding
Year of first Publication: 2024
Date of Publication (online): 2024/06/04
Publishing Institution: Leibniz-Institut für Deutsche Sprache (IDS)
Tag: Comparability; Cross-Linguistic Research; Federated Corpora; Multilingual Corpora; National Corpora; Reference Corpora
GND Keyword: Korpus <Linguistik> (Corpus, Linguistics); Mehrsprachigkeit (Multilingualism)
First Page: 94
Last Page: 103
Open Access?: yes
Leibniz-Classification: Language, Linguistics
Program areas: Grammar
Program areas: Digital Linguistics
Licence: Creative Commons - CC BY-NC - Attribution - NonCommercial 4.0 International