Managing Access to Language Resources in a Corpus Analysis Platform
Corpus query tools are crucial to CLARIN’s mission of facilitating the sharing and use of language data for research. Managing user access rights is a major challenge for online corpus platforms that host large corpora with complex licenses and heterogeneous restrictions on access methods and purposes. This paper presents an approach that maximizes user access to corpus data while protecting rights holders’ legitimate interests. Query rewriting techniques and authorization procedures allow license terms to be modeled in detail, enabling broader applications. This offers an alternative to methods that model only a greatest common denominator of licenses and thereby limit the possibilities for using the data. Our approach constitutes a flexible and extensible corpus license and user rights management component that is applicable to other language research environments.
| Author: | Eliza Margaretha Illig, Nils Diewald, Paweł Kamocki, Marc Kupietz |
|---|---|
| URN: | urn:nbn:de:bsz:mh39-134105 |
| URL: | https://lirias.kuleuven.be/4254504&lang=en |
| ISBN: | 978-91-8075-740-9 |
| ISSN: | 1650-3740 |
| Parent Title (English): | Proceedings of: Selected papers from the CLARIN Annual Conference 2024. Barcelona, Spain, 15–17 October 2024 (= Linköping Electronic Conference Proceedings 216). |
| Series (Serial Number): | Linköping Electronic Conference Proceedings (216) |
| Publisher: | Linköping University Electronic Press |
| Place of publication: | Linköping |
| Editor: | Thalassia Kontino, Vincent Vandeghinste |
| Document Type: | Part of a Book |
| Language: | English |
| Year of first Publication: | 2025 |
| Date of Publication (online): | 2025/08/27 |
| Publishing Institution: | Leibniz-Institut für Deutsche Sprache (IDS) |
| Publication state: | Published version |
| Review state: | Peer-reviewed |
| Tag: | Authorization procedures; CLARIN; Query rewriting techniques; Corpus Analysis; Corpus query tools; Language data |
| First Page: | 101 |
| Last Page: | 112 |
| DDC classes: | 400 Language / 400 Language, Linguistics |
| Open Access?: | yes |
| Linguistics-Classification: | Corpus linguistics |
| Program areas: | Digital Linguistics |
| Licence (English): | Creative Commons - Attribution 4.0 International |


