TY  - CHAP
U1  - Conference publication
A1  - Lange, Herbert
A1  - Aznar, Jocelyn
ED  - Calzolari, Nicoletta
ED  - Béchet, Frédéric
ED  - Blache, Philippe
ED  - Choukri, Khalid
ED  - Cieri, Christopher
ED  - Declerck, Thierry
ED  - Goggi, Sara
ED  - Isahara, Hitoshi
ED  - Maegaard, Bente
ED  - Mariani, Joseph
ED  - Mazo, Hélène
ED  - Odijk, Jan
ED  - Piperidis, Stelios
T1  - RefCo and its checker: improving language documentation corpora’s reusability through a semi-automatic review process
T2  - Proceedings of the 13th Conference on Language Resources and Evaluation (LREC 2022). Marseille, 20-25 June 2022
N2  - The QUEST (QUality ESTablished) project aims at ensuring the reusability of audio-visual datasets (Wamprechtshammer et al., 2022) by devising quality criteria and curating processes. RefCo (Reference Corpora) is an initiative within QUEST, in collaboration with DoReCo (Documentation Reference Corpus; Paschen et al., 2020), focusing on language documentation projects. Previously, Aznar and Seifart (2020) introduced a set of quality criteria dedicated to documenting fieldwork corpora. Based on these criteria, we establish a semi-automatic review process for existing and work-in-progress corpora, in particular for language documentation. The goal is to improve the quality of a corpus by increasing its reusability. A central part of this process is a template for machine-readable corpus documentation and automatic data verification based on this documentation. In addition to the documentation and automatic verification, the process involves a human review and potentially results in a RefCo certification of the corpus. For each of these steps, we provide guidelines and manuals. We describe the evaluation process in detail, highlight the current limits of automatic evaluation, and explain how the manual review is organized accordingly.
KW  - QUEST
KW  - reusability
KW  - quality checking
KW  - language resources
KW  - oral language
KW  - corpus
KW  - documentation
KW  - dataset
KW  - certification
KW  - guideline
KW  - language data
KW  - spoken language
KW  - annotation
KW  - computational linguistics
KW  - annotated corpora
KW  - language documentation
Y1  - 2022
U6  - https://nbn-resolving.org/urn:nbn:de:bsz:mh39-114592
UN  - https://nbn-resolving.org/urn:nbn:de:bsz:mh39-114592
UR  - https://aclanthology.org/2022.lrec-1.291
SN  - 979-10-95546-72-6
SB  - 979-10-95546-72-6
SP  - 2721
EP  - 2729
PB  - European Language Resources Association (ELRA)
CY  - Paris
ER  -