This paper addresses long-term archival for large corpora. It focuses on three aspects specific to language resources: (1) the removal of resources for legal reasons, (2) the versioning of (unchanged) objects in constantly growing resources, especially where objects can be part of multiple releases but also part of different collections, and (3) the conversion of data to new formats for digital preservation. The paper explains why language resources may have to be changed and why formats may need to be converted. As a solution, it suggests the use of an intermediate proxy object called a signpost. The approach is exemplified with the corpora of the Leibniz Institute for the German Language in Mannheim, namely the German Reference Corpus (DeReKo) and the Archive for Spoken German (AGD).
The English language has taken advantage of the Digital Revolution to establish itself as the global language; however, only 28.6% of Internet users speak English as their native language. Machine Translation (MT) is a powerful technology that can bridge this gap. In development since the mid-20th century, MT has become available to every Internet user in the last decade, due to free online MT services. This paper aims to discuss the implications that these tools may have for the privacy of their users and how they are addressed by EU data protection law. It examines the data-flows in respect of the initial processing (both from the perspective of the user and the MT service provider) and potential further processing that may be undertaken by the MT service provider.
CLARIN contractual framework for sharing language data: the perspective of personal data protection
(2020)
The article analyses the responsibility for ensuring compliance with the General Data Protection Regulation (GDPR) in research settings. As a general rule, organisations are considered the data controller (the party responsible for GDPR compliance). Research, however, constitutes a unique setting influenced by academic freedom, which raises the question of whether academics could be considered controllers as well. Although some court cases and policy documents address this issue, it is not yet settled. The analysis serves as a preliminary analytical background for redesigning the CLARIN contractual framework for sharing data.
Digital humanities research under United States and European copyright laws. Evolving frameworks
(2021)
This chapter summarizes the current state of copyright laws in the United States and European Union that most affect Digital Humanities research, namely the fair use doctrine in the US and research exceptions in Europe, including the Directive on Copyright in the Digital Single Market, which was finally adopted in 2019. This summary begins with a description of recent copyright advances most relevant to DH research, and finishes with an analysis of a significant remaining legal hurdle which DH researchers face: how do fair use and research exceptions deal with the critical issue of circumventing technological protection measures (TPM, a.k.a. DRM)? Our discussion of the lawful means of obtaining TPM-protected material may contribute to both current DH research and planning decisions and inform future stakeholders and lawmakers of the need to allow TPM circumvention for academic research.
An e-University is a university that uses new information and communication technologies (ICT) to fulfil its traditional missions: the production, preservation and transmission of knowledge. Its activities therefore consist of collecting and analysing research data, disseminating scientific writings and providing digital educational resources. Yet these intangible goods are often subject to literary and artistic property rights, in particular copyright and the sui generis right of database producers. This obliges e-Universities either to obtain the necessary authorisations from the rightholders or to rely on statutory exceptions. Research and teaching are covered by statutory exceptions (cf. art. L. 122-5, 3°, e) of the Code de la propriété intellectuelle (CPI) and art. 52a and 53 of the Urheberrechtsgesetz (UrhG)). However, these prove manifestly insufficient to accommodate the activities of e-Universities. National legislators have therefore very recently introduced new exceptions aimed more specifically at the use of ICT in research and teaching (art. L. 122-5, 10° and art. L. 342-3, 5° of the CPI and the future art. 60a-60h of the UrhG). A reform along these lines has also been proposed by the European Commission (art. 3 and 4 of the proposal for a Directive on copyright in the Digital Single Market). In this context, a debate on introducing an open norm (of the fair use type) into European law is desirable. Despite the legal uncertainty surrounding the subject, e-Universities have not ceased to fulfil their missions. Indeed, the academic community has for some time been making efforts at self-regulation (private ordering).
The concept of Open Science, inspired by the traditional values of scientific ethics, has thus emerged to promote the free sharing of research data (Open Research Data), scientific writings (Open Access) and educational resources (Open Educational Resources). Knowledge is accordingly perceived as a commons, whose preservation and sustainable development are ensured by standards accepted by the academic community. These standards are translated into legal language through public licences, such as Creative Commons. In recent years universities, but also research funding bodies and even national legislators, have actively committed themselves to promoting the knowledge commons. This is expressed through Open Access "mandates" and the introduction of a new secondary publication right, first in German law (art. 38(4) of the UrhG) and recently also in French law (art. L. 533-4, I of the Code de la recherche).
Ethical issues in Language Resources and Language Technology are often invoked, but rarely discussed. This is at least partly because little work has been done to systematize ethical issues and principles applicable in the fields of Language Resources and Language Technology. This paper provides an overview of ethical issues that arise at different stages of Language Resources and Language Technology development, from the conception phase through the construction phase to the use phase. Based on this overview, the authors propose a tentative taxonomy of ethical issues in Language Resources and Language Technology, built around five principles: Privacy, Property, Equality, Transparency and Freedom. The authors hope that this tentative taxonomy will facilitate ethical assessment of projects in the field of Language Resources and Language Technology, and structure the discussion on ethical issues in this domain, which may eventually lead to the adoption of a universally accepted Code of Ethics of the Language Resources and Language Technology community.
Privacy in its many aspects is protected by various legal texts (e.g. the Basic Law, Civil Code, Criminal Code, or even the Law on Copyright in artistic and photographic works (KunstUrhG), which protects image rights). Data protection law, which governs the processing of information about individuals (personal data), also serves to protect their privacy. However, some information referring to the public sphere of an individual’s life (e.g. the fact that X is the mayor of Smallville) may still be considered personal data (see below), and as such fall within the scope of data protection rules. In this sense, data protection laws also concern information that is not private.
Therefore, privacy and data protection, although closely related, are distinct notions: one can violate someone else’s privacy without processing his or her personal data (e.g. simply by knocking at one’s door at night, uninvited), and vice versa: one can violate data protection rules without violating privacy.
The following handouts focus exclusively on data protection rules, and specifically on the General Data Protection Regulation (GDPR). However, please keep in mind that compliance with the GDPR is not the only aspect of protecting the privacy of individuals in research projects. Other rules, such as academic ethics and community standards (such as CARE), also need to be observed.
The General Data Protection Regulation (hereinafter: GDPR), EU Regulation 2016/679 of 27 April 2016, will become applicable on 25 May 2018 and repeal the Personal Data Directive of 24 October 1995.
Unlike a directive, which requires transposition into national laws (while leaving the choice of “forms and methods” to the Member States), a regulation is binding and directly applicable in all Member States. This means that when the GDPR becomes applicable, all the EU countries will have the same rules regarding the protection of personal data — at least in principle, since some details (including in the area of research — see below) are expressly left to the discretion of the Member States.
The GDPR is a particularly ambitious piece of legislation (consisting of 99 articles and 173 recitals) whose intended territorial scope extends beyond the borders of the European Union. Its main concepts and principles are essentially similar to those of the Personal Data Directive, but enriched with interpretation developed through the case law of the CJEU and the opinions of the Article 29 Data Protection Working Party (hereinafter: WP29).
This White Paper will discuss the main principles of data protection and their impact on language resources, as well as special rules regarding research under the GDPR and the standardisation mechanisms recognized by the Regulation.