TY  - JOUR
U1  - Journal article, scholarly - peer-reviewed
A1  - Čermáková, Anna
A1  - Jantunen, Jarmo
A1  - Jauhiainen, Tommi
A1  - Kirk, John
A1  - Křen, Michal
A1  - Kupietz, Marc
A1  - Uí Dhonnchadha, Elaine
ED  - Säily, Tanja
ED  - Tyrkkö, Jukka
T1  - The International Comparable Corpus: Challenges in building multilingual spoken and written comparable corpora
JF  - Research in Corpus Linguistics: Special issue "Challenges of combining structured and unstructured data in corpus development"
N2  - This paper reports on the efforts of twelve national teams in building the International Comparable Corpus (ICC; https://korpus.cz/icc) that will contain highly comparable datasets of spoken, written and electronic registers. The languages currently covered are Czech, Finnish, French, German, Irish, Italian, Norwegian, Polish, Slovak, Swedish and, more recently, Chinese, as well as English, which is considered to be the pivot language. The goal of the project is to provide much-needed data for contrastive corpus-based linguistics. The ICC corpus is committed to the idea of re-using existing multilingual resources as much as possible and the design is modelled, with various adjustments, on the International Corpus of English (ICE). As such, ICC will contain approximately the same balance of 40 percent of written language and 60 percent of spoken language distributed across 27 different text types and contexts. A number of issues encountered by the project teams are discussed, ranging from copyright and data sustainability to technical advances in data distribution.
KW  - ICC corpus
KW  - contrastive linguistics
KW  - comparable corpus
KW  - ICE corpus
KW  - data sustainability
KW  - corpus
KW  - multilingualism
KW  - spoken language
KW  - written language
KW  - copyright
Y1  - 2021
UN  - https://nbn-resolving.org/urn:nbn:de:bsz:mh39-105084
SN  - 2243-4712
SS  - 2243-4712
U6  - https://doi.org/10.32714/ricl.09.01.06
DO  - https://doi.org/10.32714/ricl.09.01.06
VL  - 9
IS  - 1
SP  - 89
EP  - 103
PB  - Spanish Association for Corpus Linguistics
CY  - Murcia
ER  -