TY  - CHAP
U1  - Konferenzveröffentlichung
A1  - Graën, Johannes
A1  - Kew, Tannon
A1  - Shaitarova, Anastassia
A1  - Volk, Martin
ED  - Bański, Piotr
ED  - Barbaresi, Adrien
ED  - Biber, Hanno
ED  - Breiteneder, Evelyn
ED  - Clematide, Simon
ED  - Kupietz, Marc
ED  - Lüngen, Harald
ED  - Iliadi, Caroline
T1  - Modelling large parallel corpora. The Zurich Parallel Corpus Collection
T2  - Proceedings of the Workshop on Challenges in the Management of Large Corpora (CMLC-7) 2019. Cardiff, 22nd July 2019
N2  - Text corpora come in many different shapes and sizes and carry heterogeneous annotations, depending on their purpose and design. The true benefit of corpora is rooted in their annotation and the method by which this data is encoded is an important factor in their interoperability. We have accumulated a large collection of multilingual and parallel corpora and encoded it in a unified format which is compatible with a broad range of NLP tools and corpus linguistic applications. In this paper, we present our corpus collection and describe a data model and the extensions to the popular CoNLL-U format that enable us to encode it.
KW  - corpus linguistics
KW  - parallel corpora
KW  - corpus management
KW  - Korpus
Y1  - 2019
UN  - https://nbn-resolving.org/urn:nbn:de:bsz:mh39-90207
U6  - https://doi.org/10.14618/ids-pub-9020
DO  - https://doi.org/10.14618/ids-pub-9020
SP  - 1
EP  - 8
PB  - Leibniz-Institut für Deutsche Sprache
CY  - Mannheim
ER  -