Hosting Providers play an essential role in the development of Internet services such as e-Research Infrastructures. In order to promote the development of such services, legislators on both sides of the Atlantic Ocean introduced “safe harbour” provisions to protect Service Providers (a category which includes Hosting Providers) from legal claims (e.g. of copyright infringement). Relevant provisions can be found in § 512 of the United States Copyright Act and in art. 14 of Directive 2000/31/EC (and its national implementations). The cornerstone of this framework is the passive role of the Hosting Provider, which has no knowledge of the content that it hosts. With the arrival of Web 2.0, however, the role of Hosting Providers on the Internet changed; this change has been reflected in court decisions that have reached varying conclusions in the last few years. The purpose of this article is to present the existing framework (including recent case law from the US, Germany and France).
The English language has taken advantage of the Digital Revolution to establish itself as the global language; however, only 28.6% of Internet users speak English as their native language. Machine Translation (MT) is a powerful technology that can bridge this gap. In development since the mid-20th century, MT has become available to every Internet user in the last decade, due to free online MT services. This paper aims to discuss the implications that these tools may have for the privacy of their users and how they are addressed by EU data protection law. It examines the data-flows in respect of the initial processing (both from the perspective of the user and the MT service provider) and potential further processing that may be undertaken by the MT service provider.
In order to develop its full potential, global communication needs linguistic support systems such as Machine Translation (MT). In the past decade, free online MT tools have become available to the general public, and the quality of their output is increasing. However, the use of such tools may entail various legal implications, especially as far as processing of personal data is concerned. This is even more evident if we take into account that their business model is largely based on providing translation in exchange for data, which can subsequently be used to improve the translation model, but also for commercial purposes. The purpose of this paper is to examine how free online MT tools fit in the European data protection framework, harmonised by the EU Data Protection Directive. The perspectives of both the user and the MT service provider are taken into account.
Researchers in Natural Language Processing rely on the availability of data and software, ideally under open licenses, but little is done to actively encourage it. In fact, the current copyright framework grants authors exclusive rights to copy their works, make them available to the public and make derivative works (such as annotated language corpora). Moreover, in the EU, databases are protected against unauthorized extraction and re-utilization of their contents. Therefore, proper public licensing plays a crucial role in providing access to research data. A public license is a license that grants certain rights not to one particular user, but to the general public (everybody). Our article presents a tool that we developed to assist the user in the licensing process. As software and data should be licensed under different licenses, the tool is composed of two separate parts: Data and Software. The underlying logic as well as elements of the graphic interface are presented below.
An e-University is a university that uses new information and communication technologies (ICT) to fulfil its traditional missions: the production, preservation and transmission of knowledge. Its activities therefore consist of collecting and analysing research data, disseminating scholarly writings and providing digital educational resources. These intangible goods, however, are often subject to literary and artistic property rights, in particular copyright and the sui generis right of database producers. This obliges e-Universities either to obtain the necessary authorisations from the holders of these exclusive rights or to rely on statutory exceptions. Research and teaching are covered by statutory exceptions (cf. art. L. 122-5, 3°, e) of the Code de la propriété intellectuelle (CPI) and art. 52a and 53 of the Urheberrechtsgesetz (UrhG)). These, however, prove manifestly insufficient to accommodate the activities of e-Universities. National legislators have therefore very recently introduced new exceptions aimed more specifically at the use of ICT in research and teaching (art. L. 122-5, 10° and art. L. 342-3, 5° of the CPI and the future art. 60a-60h of the UrhG). A reform along these lines has also been proposed by the European Commission (art. 3 and 4 of the proposal for a Directive on copyright in the Digital Single Market). In this context, a debate on the introduction of an open norm (of the fair use type) in European law is desirable. Despite the legal uncertainty surrounding the matter, e-Universities have not ceased to fulfil their missions. Indeed, the academic community has for some time been engaged in self-regulation efforts (private ordering).
The concept of Open Science, inspired by the traditional values of scientific ethics, has thus emerged to promote the free sharing of research data (Open Research Data), scholarly writings (Open Access) and educational resources (Open Educational Resources). Knowledge is thus perceived as a commons, whose preservation and sustainable development are guaranteed by standards accepted by the academic community. These standards are translated into legal language through public licenses, such as Creative Commons. In recent years universities, but also research funding bodies and even national legislators, have actively committed themselves to promoting the knowledge commons. This is expressed through Open Access "mandates" and the introduction of a new secondary publication right, first in German law (art. 38(4) of the UrhG) and recently also in French law (art. L. 533-4, I of the Code de la recherche).
CoMParS is a resource under construction in the context of the long-term project German Grammar in European Comparison (GDE) at the IDS Mannheim. The principal goal of GDE is to create a novel contrastive grammar of German against the background of other European languages. Alongside German, which is the central focus, the core languages for comparison are English, French, Hungarian and Polish, representing different typological classes. Unlike traditional contrastive grammars available for German, which usually cover language pairs and are based on formal grammatical categories, the new GDE grammar is developed in the spirit of functionalist typology. This implies that, instead of formal criteria, cognitively motivated functional domains in the sense of Givón (1984) are used as tertia comparationis. The purpose of CoMParS is to document the empirical basis of the theoretical assumptions of GDE-V and to illustrate the otherwise rather abstract content of grammar books with as many naturally occurring and adequately presented multilingual examples as possible, including information on their use in specific contexts and registers. These examples come from existing parallel corpora, and our presentation will focus on the legal aspects and consequences of this choice of language data.
The General Data Protection Regulation (hereinafter: GDPR), EU Regulation 2016/679 of 27 April 2016, will become applicable on 25 May 2018 and repeal the Personal Data Directive of 24 October 1995.
Unlike a directive, which requires transposition into national laws (while leaving the choice of “forms and methods” to the Member States), a regulation is binding and directly applicable in all Member States. This means that when the GDPR becomes applicable, all the EU countries will have the same rules regarding the protection of personal data — at least in principle, since some details (including in the area of research — see below) are expressly left to the discretion of the Member States.
The GDPR is a particularly ambitious piece of legislation (consisting of 99 articles and 173 recitals) whose intended territorial scope extends beyond the borders of the European Union. Its main concepts and principles are essentially similar to those of the Personal Data Directive, but enriched with interpretation developed through the case law of the CJEU and the opinions of the Article 29 Data Protection Working Party (hereinafter: WP29).
This White Paper will discuss the main principles of data protection and their impact on language resources, as well as special rules regarding research under the GDPR and the standardisation mechanisms recognized by the Regulation.
This paper discusses current trends in DeReKo, the German Reference Corpus: legal issues around the recent German copyright reform, which has positive implications for corpus building and corpus linguistics in general, and recent corpus extensions in the genres of popular magazines, journals, historical texts, and web-based football reports. In addition, DeReKo is finally accessible via the new corpus research platform KorAP, which offers registered users several new features in comparison with its predecessor COSMAS II.
This paper addresses long-term archiving of large corpora. It focuses on three aspects specific to language resources, namely (1) the removal of resources for legal reasons, (2) versioning of (unchanged) objects in constantly growing resources, especially where objects can be part of multiple releases but also part of different collections, and (3) the conversion of data to new formats for digital preservation. The paper explains why language resources may have to be changed and why formats may need to be converted. As a solution, the use of an intermediate proxy object called a signpost is suggested. The approach is exemplified with respect to the corpora of the Leibniz Institute for the German Language in Mannheim, namely the German Reference Corpus (DeReKo) and the Archive for Spoken German (AGD).
CLARIN contractual framework for sharing language data: the perspective of personal data protection
(2020)
The article analyses the responsibility for ensuring compliance with the General Data Protection Regulation (GDPR) in research settings. As a general rule, organisations are considered the data controller (the party responsible for GDPR compliance). Research constitutes a unique setting influenced by academic freedom. This raises the question of whether academics could be considered the controller as well. There are some court cases and policy documents on this issue, but it is not settled yet. The analysis serves as a preliminary analytical background for redesigning the CLARIN contractual framework for sharing data.
Privacy by Design (also referred to as Data Protection by Design) is an approach in which solutions and mechanisms addressing privacy and data protection are embedded throughout the entire project lifecycle, from the early design stage, rather than just added as an additional layer to the final product. Formulated in the 1990s by the Privacy Commissioner of Ontario, the principle of Privacy by Design has been discussed by institutions and policymakers on both sides of the Atlantic, and was already mentioned in the 1995 EU Data Protection Directive (95/46/EC). More recently, Privacy by Design was introduced as one of the requirements of the General Data Protection Regulation (GDPR), obliging data controllers to define and adopt, already at the conception phase, appropriate measures and safeguards to implement data protection principles and protect the rights of the data subject. Failing to meet this obligation may result in a hefty fine, as was the case in the Uniontrad decision by the French Data Protection Authority (CNIL). The ambition of the proposed paper is to analyse the practical meaning of Privacy by Design in the context of Language Resources, and to propose measures and safeguards that can be implemented by the community to ensure respect of this principle.
Providing online repositories for language resources is one of the main activities of CLARIN centres. The legal framework regarding liability of Service Providers for content uploaded by their users has recently been modified by the new Directive on Copyright in the Digital Single Market. A new category of Service Providers, Online Content-Sharing Service Providers (OCSSPs), was added. It is subject to a complex and strict framework, including the requirement to obtain licenses from rightholders for the hosted content. This paper provides the background and effect of these changes to law and aims to initiate a debate on how CLARIN repositories should navigate this new legal landscape.
N-grams are of utmost importance for modern linguistics and language theory. The legal status of n-grams, however, raises many practical questions. Traditionally, text snippets are considered copyrightable if they meet the originality criterion, but no clear indicators as to the minimum length of original snippets exist; moreover, the solutions adopted in some EU Member States (the paper cites German and French law as examples) are considerably different. Furthermore, recent developments in EU law (the CJEU's Pelham decision and the new right of newspaper publishers) also provide interesting arguments in this debate. The proposed paper presents the existing approaches to the legal protection of n-grams and tries to formulate some clear guidelines as to the length of n-grams that can be freely used and shared.