
Enhancing the quality of metadata by using authority control

  • The Component MetaData Infrastructure (CMDI) is the dominant framework for describing language resources according to ISO 24622 (ISO/TC 37/SC 4, 2015). Within the CLARIN world, CMDI has become a huge success. The Virtual Language Observatory (VLO) now holds over 800,000 resources, all described with CMDI-based metadata. Because the metadata is harvested from about thirty centres, the data is considerably heterogeneous. Controlled vocabularies keep this heterogeneity in check to some extent, for example when describing the type of a resource or the country a resource originates from. However, when CMDI data refers to the names of persons or organisations, strings are used in a rather uncontrolled manner. Here, the CMDI community can learn from libraries and archives, which maintain standardised lists for all kinds of names. In this paper, we advocate the use of freely available authority files that support the unique identification of persons, organisations, and more. The systematic use of authority records enhances the quality of the metadata, thereby improving the faceted browsing experience in the VLO, and also prepares the ground for sharing CMDI-based metadata with the data in library catalogues.
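
The effect of authority control on faceted browsing can be illustrated with a minimal sketch. It is not taken from the paper: the lookup table, the organisation names, and the `authority:ORG-…` identifiers are all hypothetical placeholders and do not correspond to real authority records. The sketch only shows the general idea that variant name strings collapse into a single facet once they are resolved to one identifier.

```python
from collections import Counter

# Hypothetical authority table: variant name strings -> placeholder identifier.
# Names and identifiers are invented for illustration; they are NOT real
# authority records from any actual authority file.
AUTHORITY = {
    "example institute for linguistics": "authority:ORG-0001",
    "ex. inst. for linguistics": "authority:ORG-0001",
    "example institute (linguistics)": "authority:ORG-0001",
    "sample university": "authority:ORG-0002",
    "university of sample": "authority:ORG-0002",
}


def normalise(name: str) -> str:
    """Lower-case the string and collapse runs of whitespace."""
    return " ".join(name.lower().split())


def resolve(name: str) -> str:
    """Return the authority identifier for a name, or mark it as unresolved."""
    key = normalise(name)
    return AUTHORITY.get(key, f"unresolved:{key}")


def facet_counts(names):
    """Count resources per facet value after resolving names to identifiers."""
    return Counter(resolve(n) for n in names)


if __name__ == "__main__":
    harvested = [
        "Example Institute for Linguistics",
        "Ex. Inst.  for Linguistics",
        "University of Sample",
        "Unknown Lab",
    ]
    for facet, count in facet_counts(harvested).items():
        print(facet, count)
```

Without the lookup, the four strings above would produce four separate facet values; with it, the two spellings of the same organisation are counted together, and names that cannot be resolved remain visible for manual curation.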



Author:Thorsten Trippel, Claus Zinn
Parent Title (English):Proceedings of the 5th Workshop on Linked Data in Linguistics: Managing, Building and Using Linked Language Resources (LREC 2016 Workshop). 24 May 2016, Portorož, Slovenia
Publisher:European Language Resources Association (ELRA)
Place of publication:Paris
Editor:John P. McCrae, Christian Chiarcos, Elena Montiel Ponsoda, Thierry Declerck, Petya Osenova, Sebastian Hellmann
Document Type:Conference Proceeding
Year of first Publication:2016
Date of Publication (online):2022/01/07
Publishing Institution:Leibniz-Institut für Deutsche Sprache (IDS)
Tag:Component MetaData Infrastructure (CMDI); Virtual Language Observatory (VLO); authority records; bibliographic metadata; metadata quality
GND Keyword:Bibliografische Daten; Bibliothek; Bibliothekskatalog; Datenqualität; Metadaten; Normdatei; Normung
First Page:59
Last Page:62
DDC classes:400 Language / 400 Language, Linguistics
Open Access?:yes
Licence (English):Creative Commons - Attribution 4.0 International