Evaluating and assuring research data quality for audiovisual annotated language data
This paper presents the QUEST project and describes the concepts and tools being developed within its framework. The goal of the project is to establish quality and curation criteria for annotated audiovisual language data. Building on existing resources previously developed by the participating institutions, QUEST also develops tools that can be used to facilitate and verify adherence to these criteria. An important focus of the project is making these tools accessible to researchers without a substantial technical background and helping them produce high-quality data. The main tools we intend to provide are a questionnaire and automatic quality assurance for depositors of language resources, both developed as web applications. They are accompanied by a knowledge base, which will contain recommendations and descriptions of best practices established in the course of the project. Conceptually, we consider three main data maturity levels in order to decide on a suitable level of strictness for the quality assurance. This division was introduced so that a set of ideal quality criteria does not prevent researchers from depositing or even assessing their (legacy) data. The tools described in the paper are work in progress and are expected to be released by the end of the QUEST project in 2022.
Author: | Timofey Arkhangelskiy, Hanna Hedeland, Aleksandr Riaposov |
---|---|
URN: | urn:nbn:de:bsz:mh39-105098 |
DOI: | https://doi.org/10.3384/ecp1801 |
ISBN: | 978-91-7929-609-4 |
ISSN: | 1650-3740 |
Parent Title (English): | Selected Papers from the CLARIN Annual Conference 2020. Virtual Event, 2020, 5-7 October |
Series (Serial Number): | Linköping Electronic Conference Proceedings (180) |
Publisher: | Linköping University Electronic Press |
Place of publication: | Linköping |
Editor: | Costanza Navarretta, Maria Eskevich |
Document Type: | Conference Proceeding |
Language: | English |
Year of first Publication: | 2021 |
Date of Publication (online): | 2021/07/15 |
Publication state: | Published version |
Review state: | Peer-reviewed |
Tag: | QUEST project; audiovisual data; data curation; language corpora; quality evaluation |
GND Keyword: | Audiovisuelles Material; Datenmanagement; Datenqualität; Forschungsdaten; Korpus <Linguistik> |
First Page: | 1 |
Last Page: | 7 |
Note: | A previous version of this article was published in: "Proceedings of CLARIN Annual Conference 2020. 05 – 07 October 2020, Online Edition", see http://nbn-resolving.de/urn:nbn:de:bsz:mh39-100750. |
DDC classes: | 400 Language / 400 Language, Linguistics |
Open Access?: | yes |
Leibniz-Classification: | Language, Linguistics |
Linguistics-Classification: | Computational linguistics |
Linguistics-Classification: | Corpus linguistics |
Program areas: | P2: Oral Corpora |