Keeping Properties with the Data: CL-MetaHeaders - An Open Specification
Corpus researchers, like researchers in many other scientific disciplines, are under continual pressure to demonstrate accountability and reproducibility in their work. This is unsurprisingly difficult when researchers face a wide array of methods and tools with which to do their work: even tracking the operations performed can be problematic, especially as toolchains are often configured by their developers and left largely as a black box to the user. Here we present a scheme for encoding this ‘metadata’ inside the corpus files themselves in a structured data format, along with a proof-of-concept tool that records the operations performed on a file.
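The abstract describes embedding processing metadata in the corpus file itself. As a minimal sketch of that general idea only, the Python below assumes a hypothetical layout: a JSON object placed between two `%%META` marker lines at the top of a plain-text file, holding a `history` list to which each processing step appends a record. The marker, the field names (`history`, `tool`, `params`, `time`) and the functions `split_header`, `join_header` and `record_operation` are illustrative inventions; they are not the CL-MetaHeaders syntax or the authors' proof-of-concept tool.

```python
import json
from datetime import datetime, timezone
from typing import Optional, Tuple

# Hypothetical marker line that opens and closes the embedded header.
# Illustrative assumption only; this is NOT the CL-MetaHeaders syntax.
HEADER_MARK = "%%META"


def split_header(text: str) -> Tuple[dict, str]:
    """Return (header, body); text without a header yields an empty dict."""
    lines = text.split("\n")
    if lines[0] != HEADER_MARK:
        return {}, text
    try:
        end = lines.index(HEADER_MARK, 1)  # position of the closing marker
    except ValueError:
        raise ValueError("unterminated metadata header") from None
    header = json.loads("\n".join(lines[1:end]))
    body = "\n".join(lines[end + 1:])
    return header, body


def join_header(header: dict, body: str) -> str:
    """Write the header as indented JSON between two marker lines, then the body."""
    return "\n".join([HEADER_MARK, json.dumps(header, indent=2), HEADER_MARK, body])


def record_operation(text: str, tool: str, params: dict,
                     when: Optional[str] = None) -> str:
    """Append one processing step to the header's 'history' list."""
    header, body = split_header(text)
    step = {
        "tool": tool,  # hypothetical field names, chosen for illustration
        "params": params,
        "time": when or datetime.now(timezone.utc).isoformat(),
    }
    header.setdefault("history", []).append(step)
    return join_header(header, body)


if __name__ == "__main__":
    doc = "the cat sat on the mat\n"
    doc = record_operation(doc, "tokeniser", {"lang": "en"})
    doc = record_operation(doc, "tagger", {"model": "default"})
    header, body = split_header(doc)
    print([step["tool"] for step in header["history"]])  # ['tokeniser', 'tagger']
    print(repr(body))  # 'the cat sat on the mat\n'
```

Under this sketch's assumptions, keeping the history inside the file rather than in a separate log means the record is copied along with the data; the trade-off is that any tool reading the file must know to skip the header block.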
Author: | John Vidler, Stephen Wattam |
---|---|
URN: | urn:nbn:de:bsz:mh39-62635 |
Parent Title (English): | Proceedings of the Workshop on Challenges in the Management of Large Corpora and Big Data and Natural Language Processing (CMLC-5+BigNLP) 2017 including the papers from the Web-as-Corpus (WAC-XI) guest section. Birmingham, 24 July 2017 |
Publisher: | Institut für Deutsche Sprache |
Place of publication: | Mannheim |
Editor: | Piotr Bański, Marc Kupietz, Harald Lüngen, Paul Rayson, Hanno Biber, Evelyn Breiteneder, Simon Clematide, John Mariani, Mark Stevenson, Theresa Sick |
Document Type: | Conference Proceeding |
Language: | English |
Year of first Publication: | 2017 |
Date of Publication (online): | 2017/07/05 |
Publication state: | Published version |
Review state: | Peer-reviewed |
Tag: | Corpus linguistics; Corpus management; Corpus technology; Large corpora |
GND Keyword: | Corpus <linguistics>; Metadata; Method; Text technology |
Number of pages: | 7 |
First Page: | 35 |
Last Page: | 41 |
DDC classes: | 400 Language |
Open access: | yes |
Linguistics classification: | Corpus linguistics |
Conferences, Workshops: | CMLC-5 + BigNLP / 5th Workshop on Challenges in the Management of Large Corpora and Big Data and Natural Language Processing |
Licence: | Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Germany (CC BY-NC-ND 3.0 DE) |