Corpus reusability and copyright - challenges and opportunities

  • Making research data publicly available for evaluation or reuse is a fundamental part of good scientific practice. However, regulations such as copyright law can prevent this practice and thereby hamper scientific progress. In Germany, text-based research disciplines have long been largely unable to publish corpora built from material outside the public domain, which effectively excludes contemporary works. There are approaches that obfuscate text material so that it is no longer covered by the original copyright, but many use cases still require the raw textual context for evaluation or follow-up research. Recent changes in copyright law now permit text and data mining on copyrighted works. However, questions about reusing and sharing such corpora at a later time have not yet been answered satisfactorily. We propose a workflow that allows interested third parties to access customized excerpts of protected corpora in accordance with current German copyright law and the soon-to-be-implemented guidelines of the Digital Single Market directive. Our prototype is a very lightweight web interface that builds on commonly used repository software and web standards.

Metadata
Author: Markus Gärtner, Felicitas Kleinkopf, Melanie Andresen, Sibylle Hermann
URN: urn:nbn:de:bsz:mh39-104700
DOI: https://doi.org/10.14618/ids-pub-10470
Parent Title (English): Proceedings of the Workshop on Challenges in the Management of Large Corpora (CMLC-9) 2021. Limerick, 12 July 2021 (Online-Event)
Publisher: Leibniz-Institut für Deutsche Sprache
Place of publication: Mannheim
Editor: Harald Lüngen, Marc Kupietz, Piotr Bański, Adrien Barbaresi, Simon Clematide, Ines Pisetta
Document Type: Conference Proceeding
Language: English
Year of first Publication: 2021
Date of Publication (online): 2021/06/23
Publication state: Published version
Review state: Peer-reviewed
Tag: corpus linguistics; corpus reusability
GND Keyword: Data Mining; European Commission. Digital Single Market; Research data; Corpus <Linguistics>; Text Mining; Copyright
First Page: 10
Last Page: 19
DDC classes: 400 Language / 400 Language, Linguistics
Open Access?: yes
Linguistics-Classification: Corpus linguistics
Conferences, Workshops: Proceedings of the Workshop on Challenges in the Management of Large Corpora (CMLC-9) 2021. Limerick, 12 July 2021 (Online-Event)
Licence (German): Creative Commons - CC BY - Attribution 4.0 International