Towards automatic quality assessment of component metadata

  • Measuring the quality of metadata is only possible by assessing the quality of both the underlying schema and the metadata instance. We propose factors that can be measured automatically for metadata in the CMD framework, taking into account the variability of the schemas that can be defined in this framework. Among others, the factors include the number of elements, the (re-)use of reusable components, and the number of filled-in elements. The resulting score can serve as an indicator of the overall quality of a CMD instance, as feedback to metadata providers, or as an overview of the overall quality of metadata within a repository. The score is independent of specific schemas and generalizable. An overall assessment of harvested metadata is provided in the form of statistical summaries and the distribution, based on a corpus of harvested metadata. The score is implemented in XQuery and can be used in tools, editors and repositories.
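As a rough illustration of two of the factors named in the abstract (the number of elements and the number of filled-in elements), the sketch below counts them for a single XML record. It is not the authors' XQuery implementation and does not reproduce their score. The function names, the sample record and its element names, and the derived `fill_ratio` are all assumptions made for this example. The third factor, reuse of components, is omitted because it requires the schema.

```python
import xml.etree.ElementTree as ET


def leaf_elements(root):
    """Return all elements that have no child elements (the value-carrying fields)."""
    return [el for el in root.iter() if len(el) == 0]


def fill_factors(xml_text):
    """Compute illustrative, schema-independent factors for one metadata record.

    Hypothetical helper: the paper's actual factors and weighting are not
    given in the abstract, so only raw counts and a simple ratio are returned.
    """
    root = ET.fromstring(xml_text)
    leaves = leaf_elements(root)
    # A leaf counts as "filled in" when it carries non-whitespace text.
    filled = [el for el in leaves if (el.text or "").strip()]
    total = len(leaves)
    return {
        "elements": total,
        "filled": len(filled),
        "fill_ratio": len(filled) / total if total else 0.0,
    }


# Made-up record for demonstration; the element names are not taken from any real CMD profile.
SAMPLE = """<CMD>
  <Components>
    <Resource>
      <Title>Example corpus</Title>
      <Description></Description>
      <Creator><Name>Jane Doe</Name><Email/></Creator>
      <Language>deu</Language>
    </Resource>
  </Components>
</CMD>"""

factors = fill_factors(SAMPLE)
print(factors)
```

Because the function only walks the element tree, it does not depend on any particular schema. Turning such counts into a single score would require the weighting described in the paper itself.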

Author: Thorsten Trippel, Daan Broeder, Matej Durco, Oddrun Ohren
Parent Title (English): Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14). May 26-31, 2014, Reykjavik, Iceland
Publisher: European Language Resources Association (ELRA)
Place of publication: Paris
Editor: Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Hrafn Loftsson, Bente Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Document Type: Conference Proceeding
Year of first Publication: 2014
Date of Publication (online): 2022/01/11
Publishing Institution: Leibniz-Institut für Deutsche Sprache (IDS)
Tag: CMDI; Component Metadata Description Infrastructure; metadata quality assessment; metadata score; quantitative quality metrics
GND Keyword: Computerlinguistik; Datenmanagement; Datenqualität; Dokumentenserver; Metadaten
First Page: 3851
Last Page: 3856
DDC classes: 400 Language / 400 Language, Linguistics
Open Access?: yes
Licence (English): Creative Commons - Attribution-NonCommercial 4.0 International