Refine
Document Type
- Part of a Book (2)
- Conference Proceeding (2)
Has Fulltext
- yes (4)
Is part of the Bibliography
- yes (4) (remove)
Keywords
- Anonymisierung (4) (remove)
Publicationstate
Reviewstate
- Peer-Review (2)
- Peer-review (2)
Publisher
"Das im Januar 2022 gestartete Projekt "Sprachanfragen" (https://www.ids-mannheim.de/gra/projekte2/sprachanfragen/) verfolgt erstmalig das Ziel, Sprachanfragedaten zu erfassen, aufzubereiten und ein wissenschaftsöffentliches Monitorkorpus aus ihnen zu erstellen. Dazukommend wird eine Rechercheschnittstelle entwickelt, mit der die Sprachanfragen systematisch wissenschaftlich analysierbar gemacht werden. Das Poster gibt einen Überblick über das Projekt, zeigt erste Ergebnisse und bietet einen Ausblick auf Überlegungen zur Konzeption eines Chatbots zur automatisierten Beantwortung von Sprachanfragen." Ein Beitrag zur 9. Tagung des Verbands "Digital Humanities im deutschsprachigen Raum" - DHd 2023 Open Humanities Open Culture.
The debate on the use of personal data in language resources usually focuses — and rightfully so — on anonymisation. However, this very same debate usually ends quickly with the conclusion that proper anonymisation would necessarily cause loss of linguistically valuable information. This paper discusses an alternative approach — pseudonymisation. While pseudonymisation does not solve all the problems (inasmuch as pseudonymised data are still to be regarded as personal data and therefore their processing should still comply with the GDPR principles), it does provide a significant relief, especially — but not only — for those who process personal data for research purposes. This paper describes pseudonymisation as a measure to safeguard rights and interests of data subjects under the GDPR (with a special focus on the right to be informed). It also provides a concrete example of pseudonymisation carried out within a research project at the Institute of Information Technology and Communications of the Otto von Guericke University Magdeburg.
The paper reports on the results of a scientific colloquium dedicated to the creation of standards and best practices which are needed to facilitate the integration of language resources for CMC stemming from different origins and the linguistic analysis of CMC phenomena in different languages and genres. The key issue to be solved is that of interoperability – with respect to the structural representation of CMC genres, linguistic annotations metadata, and anonymization/pseudonymization schemas. The objective of the paper is to convince more projects to partake in a discussion about standards for CMC corpora and for the creation of a CMC corpus infrastructure across languages and genres. In view of the broad range of corpus projects which are currently underway all over Europe, there is a great window of opportunity for the creation of standards in a bottom-up approach.
As a consequence of a recent curation project, the Dortmund Chat Corpus is available in CLARIN-D research infrastructures for download and querying. In a legal expertise it had been recommended that standard measures of anonymisation be applied to the corpus before its republication. This paper reports about the anonymisation campaign that was conducted for the corpus. Anonymisation has been realised as categorisation, and the taxonomy of anonymisation categories applied is introduced and the method of applying it to the TEI files is demonstrated. The results of the anonymisation campaign as well as issues of quality assessment are discussed. Finally, pseudonymisation as an alternative to categorisation as a method of the anonymisation of CMC data is discussed, as well as possibilities of an automatisation of the process.