400 Language, Linguistics
This paper takes a devil’s advocate position to highlight the benefits of metadata creation for linguistic resources. It provides an overview of the required metadata infrastructure and shows that this infrastructure has meanwhile been developed by various projects and can therefore be deployed by those who work with and archive linguistic resources. Possible objections to metadata creation are raised, starting with user requirements and backgrounds, the contribution to researchers' academic merit, and standardisation. These are answered with existing technologies and procedures, with reference to the Component Metadata Infrastructure (CMDI). CMDI provides an infrastructure and methods for adapting metadata to the requirements of specific classes of resources, using central registries for data categories and for metadata schemas. These registries allow metadata schemas to be defined per resource type while reusing groups of data categories that are also used by other schemas. Finally, rules of best practice for the creation of metadata are given.
This paper describes the status of the standardization efforts for a Component Metadata approach to describing Language Resources with metadata. Different linguistic and Language & Technology (L&T) communities, such as CLARIN, META-SHARE and NaLiDa, use this component approach and see its standardization as a matter for cooperation that could create a large interoperable domain of joint metadata. Starting with an overview of the component metadata approach, together with the related semantic interoperability tools and services such as the ISOcat data category registry and the relation registry, we explain the standardization plan and efforts for component metadata within ISO TC37/SC4. Finally, we present information about the uptake of component metadata and plans for its use within the three linguistic and L&T communities mentioned.
Signposts for CLARIN
(2021)
This paper presents an implementation of CMDI-based signposts and its use. Arnold, Fisseni et al. (2020) present signposts as a solution to challenges in the long-term preservation of corpora. Though signposts are applicable to digital resources in general, we focus on corpora, especially those that are continuously extended or subject to modification (e.g., due to legal injunctions), that may overlap with respect to their constituents, and that may be subject to migration to new data formats. We describe the contribution signposts can make to the CLARIN infrastructure, notably to virtual collections, and document the design of the CMDI profile.