Linköping Electronic Conference Proceedings
190
Metadata provides important information relevant both to finding and to understanding corpus data. Meaningful linguistic data requires both reasonable annotations and documentation of these annotations. This documentation is part of the metadata of a dataset. While corpus documentation has often been provided in the form of accompanying publications, machine-readable metadata, containing both the bibliographic information and the documentation of the corpus data, has many advantages. Metadata standards allow for the development of common tools and interfaces. In this paper I add a new perspective from an archive’s point of view: I look at the metadata provided for four learner corpora and discuss the suitability of established standards for machine-readable metadata. I am aware that there is ongoing work towards metadata standards for learner corpora. However, I would like to keep the discussion going and add another point of view: increasing findability and reusability of learner corpora in an archiving context.
189
This paper describes the TEI-based ISO standard 24624:2016 ‘Transcription of spoken language’ and other formats used within CLARIN for spoken language resources. It assesses the current state of support for the standard and the interoperability between these formats and with relevant tools and services. The main idea behind the paper is that a digital infrastructure providing language resources and services to researchers should also allow the combined use of resources and/or services from different contexts. This requires syntactic and semantic interoperability. We propose a solution based on the ISO/TEI format and describe the necessary steps for this format to work as an exchange format with basic semantic interoperability for spoken language resources across the CLARIN infrastructure and beyond.