
The International Comparable Corpus: Challenges in building multilingual spoken and written comparable corpora

This paper reports on the efforts of twelve national teams to build the International Comparable Corpus (ICC; https://korpus.cz/icc), which will contain highly comparable datasets of spoken, written and electronic registers. The languages currently covered are Czech, Finnish, French, German, Irish, Italian, Norwegian, Polish, Slovak, Swedish and, more recently, Chinese, as well as English, which is considered the pivot language. The goal of the project is to provide much-needed data for contrastive corpus-based linguistics. The ICC is committed to reusing existing multilingual resources as much as possible, and its design is modelled, with various adjustments, on the International Corpus of English (ICE). Accordingly, the ICC will keep approximately the same balance as ICE: 40 percent written and 60 percent spoken language, distributed across 27 different text types and contexts. The paper discusses a number of issues encountered by the project teams, ranging from copyright and data sustainability to technical advances in data distribution.

Metadata
Authors: Anna Čermáková, Jarmo Jantunen, Tommi Jauhiainen, John Kirk, Michal Křen, Marc Kupietz, Elaine Uí Dhonnchadha
URN: urn:nbn:de:bsz:mh39-105084
DOI: https://doi.org/10.32714/ricl.09.01.06
ISSN: 2243-4712
Parent Title (English): Research in Corpus Linguistics: Special issue "Challenges of combining structured and unstructured data in corpus development"
Publisher: Spanish Association for Corpus Linguistics
Place of publication: Murcia
Editors: Tanja Säily, Jukka Tyrkkö
Document Type: Article
Language: English
Year of first publication: 2021
Date of publication (online): 2021/07/14
Publication state: Published version
Review state: Peer-reviewed
Tags: ICC corpus; ICE corpus; comparable corpus; contrastive linguistics; copyright; data sustainability
GND keywords: Spoken language; Contrastive linguistics; Corpus (linguistics); Multilingualism; Written language; Copyright
Volume: 9
Issue: 1
First page: 89
Last page: 103
DDC classes: 400 Language / 400 Language, Linguistics
Open Access: Yes
Leibniz classification: Language, Linguistics
Linguistics classification: Corpus linguistics
Program areas: S1: Corpus linguistics
Licence: Creative Commons Attribution 4.0 International