Evaluating and assuring research data quality for audiovisual annotated language data

  • This paper presents the QUEST project and describes concepts and tools that are being developed within its framework. The goal of the project is to establish quality and curation criteria for annotated audiovisual language data. Building on resources previously developed by the participating institutions, QUEST also develops tools that can be used to facilitate and verify adherence to these criteria. An important focus of the project is making these tools accessible to researchers without a substantial technical background and helping them produce high-quality data. The main tools we intend to provide are a questionnaire and automatic quality assurance for depositors of language resources, both developed as web applications. They are accompanied by a knowledge base, which will contain recommendations and descriptions of best practices established in the course of the project. Conceptually, we consider three main data maturity levels in order to decide on a suitable level of strictness for the quality assurance. This division has been introduced so that a set of ideal quality criteria does not prevent researchers from depositing or even assessing their (legacy) data. The tools described in the paper are work in progress and are expected to be released by the end of the QUEST project in 2022.

Author:Timofey Arkhangelskiy, Hanna Hedeland, Aleksandr Riaposov
Parent Title (English):Selected Papers from the CLARIN Annual Conference 2020. Virtual Event, 2020, 5-7 October
Series (Serial Number):Linköping Electronic Conference Proceedings (180)
Publisher:Linköping University Electronic Press
Place of publication:Linköping
Editor:Costanza Navarretta, Maria Eskevich
Document Type:Conference Proceeding
Year of first Publication:2021
Date of Publication (online):2021/07/15
Tag:QUEST project; audiovisual data; data curation; language corpora; quality evaluation
GND Keyword:Audiovisual material; Data management; Data quality; Research data; Corpus <Linguistics>
First Page:1
Last Page:7
A previous version of this article was published in: "Proceedings of CLARIN Annual Conference 2020. 05 – 07 October 2020, Online Edition", see http://nbn-resolving.de/urn:nbn:de:bsz:mh39-100750.
DDC classes:400 Language / 400 Language, Linguistics
Open Access?:yes
Leibniz-Classification:Language, Linguistics
Program areas:P2: Spoken Corpora
Licence (English):Creative Commons - Attribution 4.0 International