Refine
Document Type
- Conference Proceeding (6)
- Article (3)
- Part of a Book (3)
- Working Paper (2)
Has Fulltext
- yes (14)
Keywords
- Digital Humanities (7)
- Urheberrecht (7)
- Korpus <Linguistik> (6)
- CLARIN (3)
- Datenschutz (3)
- Creative Commons (2)
- Data Mining (2)
- Europäische Union (2)
- Forschungsdaten (2)
- Nutzungsrecht (2)
Publicationstate
Reviewstate
- Peer-Review (6)
- (Verlags)-Lektorat (3)
Publisher
- CLARIN Legal and Ethical Issues Committee (CLIC) (2)
- Clarin (2)
- Universität zu Köln (2)
- Association Française pour la diffusion du RIDA (1)
- Association for Computational Linguistics (1)
- European Language Resources Association (ELRA) (1)
- Institut für Deutsche Sprache (1)
- Linköping University Electronic Press (1)
- Nisaba (1)
- Routledge, Taylor & Francis Group (1)
The possibilities of re-use and archiving of spoken and written corpora are affected by personality rights (depending on legal tradition also called: the right of publicity), copyright law and data protection / privacy laws. These recommendations include information about legal aspects which should be considered while creating corpora to ensure the greatest archivability and re-usability possible in compliance with current laws.
The information compiled here shall serve researchers who plan to create corpora or who are involved in evaluation of such measures as a guideline. This information is not exhaustive or to be considered as legal advice. Researchers should consult institutional legal departments and management before making legally relevant decisions. That said, further legal expertise should be sought if possible as early as project planning phases.
Twenty-two historical encyclopedias encoded in TEI: a new resource for the Digital Humanities
(2020)
This paper accompanies the corpus publication of EncycNet, a novel XML/TEI annotated corpus of 22 historical German encyclopedias from the early 18th to early 20th century. We describe the creation and annotation of the corpus, including the rationale for its development, suggested methodology for TEI annotation, possible use cases and future work. While many well-developed annotation standards for lexical resources exist, none can adequately model the encyclopedias at hand, and we therefore suggest how the TEI Lex-0 standard may be modified with additional guidelines for the annotation of historical encyclopedias. As the digitization and annotation of historical encyclopedias are settling on TEI as the de facto standard, our methodology may inform similar projects.
Providing online repositories for language resources is one of the main activities of CLARIN centres. The legal framework regarding liability of Service Providers for content uploaded by their users has recently been modified by the new Directive on Copyright in the Digital Single Market. A new category of Service Providers, Online Content-Sharing Service Providers (OCSSPs), was added. It is subject to a complex and strict framework, including the requirement to obtain licenses from rightholders for the hosted content. This paper provides the background and effect of these changes to law and aims to initiate a debate on how CLARIN repositories should navigate this new legal landscape.
New exceptions for Text and Data Mining and their possible impact on the CLARIN infrastructure
(2018)
The proposed paper discusses new exceptions for Text and Data Mining that have recently been adopted in some EU Member States, and probably will soon be adopted also at the EU level. These exceptions are of great significance for language scientists, as they exempt those who compile corpora from the obligation to obtain authorisation from rightholders. However, corpora compiled on the basis of such exceptions cannot be freely shared, which in a long run may have serious consequences for Open Science and the functioning of research infrastructure such as CLARIN ERIC.
This abstract discusses the possibility to adopt a CLARIN Data Protection Code of Conduct pursuant art. 40 of the General Data Protection Regulation. Such a code of conduct would have important benefits for the entire language research community. The final section of this abstract proposes a roadmap to the CLARIN Data Protection Code of Conduct, listing various stages of its drafting and approval procedures.
The present article describes the first stage of the KorAP project, launched recently at the Institut für Deutsche Sprache (IDS) in Mannheim, Germany. The aim of this project is to develop an innovative corpus analysis platform to tackle the increasing demands of modern linguistic research. The platform will facilitate new linguistic findings by making it possible to manage and analyse primary data and annotations in the petabyte range, while at the same time allowing an undistorted view of the primary linguistic data, and thus fully satisfying the demands of a scientific tool. An additional important aim of the project is to make corpus data as openly accessible as possible in light of unavoidable legal restrictions, for instance through support for distributed virtual corpora, user-defined annotations and adaptable user interfaces, as well as interfaces and sandboxes for user-supplied analysis applications. We discuss our motivation for undertaking this endeavour and the challenges that face it. Next, we outline our software implementation plan and describe development to-date.