OPUS 4 | Search

26 search hits

Addressing Cha(lle)nges in Long-Term Archiving of Large Corpora (2020)

Arnold, Denis ; Fisseni, Bernhard ; Kamocki, Paweł ; Schonefeld, Oliver ; Kupietz, Marc ; Schmidt, Thomas

This paper addresses the long-term archiving of large corpora. It focuses on three aspects specific to language resources: (1) the removal of resources for legal reasons, (2) versioning of (unchanged) objects in constantly growing resources, especially where objects can be part of multiple releases but also part of different collections, and (3) the conversion of data to new formats for digital preservation. It explains why language resources may have to be changed and why formats may need to be converted. As a solution, the use of an intermediate proxy object called a signpost is suggested. The approach is exemplified with the corpora of the Leibniz Institute for the German Language in Mannheim, namely the German Reference Corpus (DeReKo) and the Archive for Spoken German (AGD).

Legal canvas for a patchwork of multilingual quotations: the case of CoMParS (2017)

Bański, Piotr ; Kamocki, Paweł ; Trawiński, Beata

CoMParS is a resource under construction in the context of the long-term project German Grammar in European Comparison (GDE) at the IDS Mannheim. The principal goal of GDE is to create a novel contrastive grammar of German against the background of other European languages. Alongside German, which is the central focus, the core languages for comparison are English, French, Hungarian and Polish, representing different typological classes. Unlike traditional contrastive grammars available for German, which usually cover language pairs and are based on formal grammatical categories, the new GDE grammar is developed in the spirit of functionalist typology. This implies that, instead of formal criteria, cognitively motivated functional domains in the sense of Givón (1984) are used as tertia comparationis. The purpose of CoMParS is to document the empirical basis of the theoretical assumptions of GDE-V and to illustrate the otherwise rather abstract content of grammar books with as many naturally occurring and adequately presented multilingual examples as possible, including information on their use in specific contexts and registers. These examples come from existing parallel corpora, and our presentation will focus on the legal aspects and consequences of this choice of language data.

Rechtliche Bedingungen für die Bereitstellung eines Chat-Korpus in CLARIN-D. Ergebnisse eines Rechtsgutachtens (2017)

Beißwenger, Michael ; Lüngen, Harald ; Schallaböck, Jan ; Weitzmann, John H. ; Herold, Axel ; Kamocki, Paweł ; Storrer, Angelika ; Wildgans, Julia

“Hello ELSA, how are you?” - Legal and ethical challenges in RDM, current and future tasks of ELSA activities against the background of AI and Anonymisation (2023)

Boehm, Franziska ; Sax, Ulrich ; Vettermann, Oliver ; Kamocki, Paweł ; Stoilova, Vasilka

The proposed contribution will shed light on current and future legal and ethical challenges in research data infrastructures. The authors will present the work of NFDI’s section on Ethical, Legal and Social Aspects (hereinafter: ELSA), whose aim is to facilitate cross-disciplinary cooperation between the NFDI consortia in the relevant areas of management and re-use of research data.

The CLARIN infrastructure as an interoperable language technology platform for SSH and beyond (2023)

Branco, António ; Eskevich, Maria ; Frontini, Francesca ; Hajič, Jan ; Hinrichs, Erhard ; de Jong, Franciska ; Kamocki, Paweł ; König, Alexander ; Lindén, Krister ; Navarretta, Constanza ; Piasecki, Maciej ; Piperidis, Stelios ; Pitkänen, Olli ; Simov, Kiril ; Skadiņa, Inguna ; Trippel, Thorsten ; Witt, Andreas ; Zinn, Claus

CLARIN is a European Research Infrastructure Consortium developing and providing a federated and interoperable platform to support scientists in the Social Sciences and Humanities in carrying out language-related research. This contribution provides an overview of the entire infrastructure with a particular focus on tool interoperability, ease of access to research data, tools and services, the importance of sharing knowledge within and across (national) communities, and community building. By taking FAIR principles into account from the very beginning, CLARIN has become a successful example of a research infrastructure that is actively used by its members. The benefits CLARIN members reap from their infrastructure secure a future for their common good that is both sustainable and attractive to partners beyond the original target groups.

When Size Matters. Legal Perspective(s) on N-grams (2020)

Kamocki, Paweł

N-grams are of utmost importance for modern linguistics and language theory. The legal status of n-grams, however, raises many practical questions. Traditionally, text snippets are considered copyrightable if they meet the originality criterion, but no clear indicators as to the minimum length of original snippets exist; moreover, the solutions adopted in some EU Member States (the paper cites German and French law as examples) are considerably different. Furthermore, recent developments in EU law (the CJEU's Pelham decision and the new right of newspaper publishers) also provide interesting arguments in this debate. The proposed paper presents the existing approaches to the legal protection of n-grams and tries to formulate some clear guidelines as to the length of n-grams that can be freely used and shared.

When size matters. Legal perspective(s) on N-grams (2021)

Kamocki, Paweł

N-grams are of utmost importance for modern linguistics and language technology. The legal status of n-grams, however, raises many practical questions. Traditionally, text snippets are considered copyrightable if they meet the originality criterion, but no clear indicators as to the minimum length of original snippets exist; moreover, the solutions adopted in some EU Member States (the paper cites German and French law as examples) are considerably different. Furthermore, recent developments in EU law (the CJEU's Pelham decision and the new right of press publishers) also provide interesting arguments in this debate. The paper presents the existing approaches to the legal protection of n-grams and tries to formulate some clear guidelines as to the length of n-grams that can be freely used and shared.

Major developments in the legal framework concerning language resources. Introductory talk for the workshop on legal and ethical issues in human language technologies, LREC 2022, Marseille, 24 June 2022 (2022)

Kamocki, Paweł

Handouts on the processing of personal data for the purposes of language research and archiving of language resources under the General Data Protection Regulation. Version 1.0, September 2021 (2021)

Kamocki, Paweł

Privacy in its many aspects is protected by various legal texts (e.g. the Basic Law, Civil Code, Criminal Code, or even the Law on Copyright in artistic and photographic works (KunstUrhG), which protects image rights). Data protection law, which governs the processing of information about individuals (personal data), also serves to protect their privacy. However, some information referring to the public sphere of an individual’s life (e.g. the fact that X is the mayor of Smallville) may still be considered personal data (see below), and as such fall within the scope of data protection rules. In this sense, data protection laws also concern information that is not private. Therefore, privacy and data protection, although closely related, are distinct notions: one can violate someone else’s privacy without processing his or her personal data (e.g. simply by knocking at someone’s door at night, uninvited), and vice versa: one can violate data protection rules without violating privacy. The following handouts focus exclusively on data protection rules, and specifically on the General Data Protection Regulation (GDPR). However, please keep in mind that compliance with the GDPR is not the only aspect of protecting the privacy of individuals in research projects. Other rules, such as academic ethics and community standards (such as CARE), also need to be observed.

Legal issues related to the use of twitter data in language research (2021)

Kamocki, Paweł ; Hannesschläger, Vanessa ; Hoorn, Esther ; Kelli, Aleksei ; Kupietz, Marc ; Lindén, Krister ; Puksas, Andrius

Twitter data is used in a wide variety of research disciplines in the Social Sciences and Humanities. Although most Twitter data is publicly available, its re-use and sharing raise many legal questions related to intellectual property and personal data protection. Moreover, the use of Twitter and its content is subject to the Terms of Service, which also regulate re-use and sharing. This extended abstract provides a brief analysis of these issues and introduces the new Academic Research product track, which enables authorized researchers to access the Twitter API on a preferential basis.
