TY  - CHAP
U1  - Konferenzveröffentlichung
A1  - Arnold, Denis
A1  - Fisseni, Bernhard
A1  - Kamocki, Paweł
A1  - Schonefeld, Oliver
A1  - Kupietz, Marc
A1  - Schmidt, Thomas
ED  - Bański, Piotr
ED  - Barbaresi, Adrien
ED  - Clematide, Simon
ED  - Kupietz, Marc
ED  - Lüngen, Harald
ED  - Pisetta, Ines
T1  - Addressing Cha(lle)nges in Long-Term Archiving of Large Corpora
T2  - Proceedings of the LREC 2020 Workshop, Language Resources and Evaluation Conference, 11–16 May 2020, 8th Workshop on Challenges in the Management of Large Corpora (CMLC-8)
N2  - This paper addresses long-term archival for large corpora. Three aspects specific to language resources are focused, namely (1) the removal of resources for legal reasons, (2) versioning of (unchanged) objects in constantly growing resources, especially where objects can be part of multiple releases but also part of different collections, and (3) the conversion of data to new formats for digital preservation. It is motivated why language resources may have to be changed, and why formats may need to be converted. As a solution, the use of an intermediate proxy object called a signpost is suggested. The approach will be exemplified with respect to the corpora of the Leibniz Institute for the German Language in Mannheim, namely the German Reference Corpus (DeReKo) and the Archive for Spoken German (AGD).
KW  - Korpus
KW  - long-term archival
KW  - legal issues
KW  - metadata
KW  - format migration
KW  - Langzeitarchivierung
KW  - Nutzungsrecht
KW  - Dateiformat
Y1  - 2020
U6  - https://nbn-resolving.org/urn:nbn:de:bsz:mh39-98129
UN  - https://nbn-resolving.org/urn:nbn:de:bsz:mh39-98129
UR  - http://corpora.ids-mannheim.de/cmlc-2020.html
SN  - 979-10-95546-61-0
SB  - 979-10-95546-61-0
SP  - 1
EP  - 9
PB  - European Language Resources Association
CY  - Paris
ER  - 