S2: Forschungskoordination und –infrastrukturen
Despite being an official language of several countries in Central and Western Europe, German is not formally recognised as the official language of the Federal Republic of Germany. However, in certain situations the use of the German language, including the spelling rules, is subject to state regulation (by acts of the Federal Parliament or by administrative decisions). This article presents the content of this regulation, its scope, and the historical context in which it was adopted.
Poster by the Text+ partner Leibniz-Institut für Deutsche Sprache Mannheim, presented at the workshop "Wohin damit? Storing and reusing my language data" on 22 June 2023 in Mannheim. The poster was written in the context of the work of the association Nationale Forschungsdateninfrastruktur (NFDI) e.V. NFDI is financed by the Federal Republic of Germany and the 16 federal states, and the Text+ consortium is funded by the Deutsche Forschungsgemeinschaft (DFG), project number 460033370. The authors are grateful for the funding and support. Thanks also go to all institutions and actors who are committed to the association and its goals.
This contribution summarizes the lessons learned from the organization of a joint conference on text analytics research by the Business, Economic, and Related Data (BERD@NFDI) and Text+ consortia within the National Research Data Infrastructure (NFDI) in Germany. The collaboration aimed to identify common ground and foster interdisciplinary dialogue between scholars in the humanities and in the business domain. The lessons learned include the importance of presenting research questions using textual data to establish common ground, similarities in methodology for processing textual data between the consortia, similarities in research data management, and the need for regular interconsortial discussions on textual analysis methods and data. The collaboration proved valuable for interdisciplinary dialogue within the NFDI, and further collaboration between the consortia is planned.
"Reproducibility crisis" and "empirical turn" are only two of the keywords that come up when giving reasons for research data management. Research data are omnipresent, and as data processing procedures become increasingly automated, they become even more important. However, just because new methods require and produce data, this does not mean that data are easily accessible or reusable, or that they even make a difference in a researcher's CV, even though a large portion of research goes into data creation, acquisition, preparation, and analysis. In this talk I will show where we find data in the research process and where we may find appropriate support for data management, and I will advocate for a procedure for including data in research publications and resumes.
This presentation relies on work within the BMBF-funded project CLARIN-D. It also builds on work within the German National Research Data Infrastructure (NFDI) consortium Text+, DFG project number 460033370.
Collaborative work in NFDI
(2023)
The non-profit association National Research Data Infrastructure (NFDI) promotes science and research through a National Research Data Infrastructure. Its aim is to develop and establish overarching research data management (RDM) for Germany and to increase the efficiency of the entire German science system. After a two-and-a-half-year build-up phase, the process of adding new consortia, each representing a different data domain, ended in March 2023. NFDI now has 26 disciplinary consortia (and one additional basic service collaboration). Now the full extent of cross-consortial interaction is beginning to show.
The Data Governance Act was proposed in late 2020 as part of the European Strategy for Data, and adopted on 30 May 2022 (as Regulation 2022/868). It will enter into application on 24 September 2023. The Data Governance Act is a major development in the legal framework affecting CLARIN and the whole language community. With its new rules on the re-use of data held by public sector bodies and on the provision of data sharing services, and especially its encouragement of data altruism, the Data Governance Act creates new opportunities and new challenges for CLARIN ERIC. This paper analyses the provisions of the Data Governance Act and aims to initiate the debate on how they will impact CLARIN and the whole language community.
The landscape of digital lexical resources is often characterized by dedicated local portals and proprietary interfaces as primary access points for scholars and the interested public. In addition, legal and technical restrictions can make it difficult to query and use these valuable resources efficiently. As part of the research data consortium Text+, solutions for the storage and provision of digital language resources are being developed and provided in the context of the unified cross-domain German research data infrastructure NFDI. The specific challenge of accessing lexical resources in a diverse and heterogeneous landscape with a variety of participating institutions and established technical solutions is met with the development of the federated search and query framework LexFCS. The LexFCS extends the established CLARIN Federated Content Search, which already allows access to spatially distributed text corpora using a common specification of technical interfaces, data formats, and query languages. This paper describes the current state of development of the LexFCS, gives an insight into its technical details, and provides an outlook on its future development.
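The general idea of a federated search, as opposed to a single central index, can be illustrated with a minimal sketch. It is not the LexFCS or CLARIN FCS implementation: the resource names, the `Hit` record, and the functions `make_endpoint` and `federated_search` are hypothetical, and in-memory dictionaries stand in for remote endpoints so that the example runs without network access. It only shows the fan-out-and-merge pattern, in which one query is sent to every participating resource and the answers are combined into one result list.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Hit:
    """One lexical entry returned by one resource (hypothetical record)."""
    endpoint: str
    lemma: str
    definition: str


# Hypothetical in-memory stand-ins for independently hosted lexical resources.
RESOURCES: Dict[str, Dict[str, str]] = {
    "dictionary_a": {"Haus": "building for living in", "Maus": "small rodent"},
    "dictionary_b": {"Haus": "dynasty or family line"},
    "dictionary_c": {"Baum": "woody perennial plant"},
}

Endpoint = Callable[[str], List[Hit]]


def make_endpoint(name: str) -> Endpoint:
    """Wrap one resource behind the common search interface."""
    entries = RESOURCES[name]

    def search(lemma: str) -> List[Hit]:
        if lemma in entries:
            return [Hit(name, lemma, entries[lemma])]
        return []

    return search


def federated_search(lemma: str, endpoints: List[Endpoint]) -> List[Hit]:
    """Send the same query to every endpoint in parallel and merge the hits."""
    with ThreadPoolExecutor(max_workers=max(1, len(endpoints))) as pool:
        per_endpoint = list(pool.map(lambda search: search(lemma), endpoints))
    merged = [hit for hits in per_endpoint for hit in hits]
    # Sort by resource name so the merged list has a stable order.
    return sorted(merged, key=lambda hit: hit.endpoint)


ENDPOINTS = [make_endpoint(name) for name in RESOURCES]

if __name__ == "__main__":
    for hit in federated_search("Haus", ENDPOINTS):
        print(f"{hit.endpoint}: {hit.lemma} = {hit.definition}")
```

In a real deployment each endpoint would be a remote service answering through the shared interface specification, so the aggregator would also have to handle timeouts and partial failures, which this sketch leaves out.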
The proposed contribution will shed light on current and future challenges on legal and ethical questions in research data infrastructures. The authors of the proposal will present the work of NFDI’s section on Ethical, Legal and Social Aspects (hereinafter: ELSA), whose aim is to facilitate cross-disciplinary cooperation between the NFDI consortia in the relevant areas of management and re-use of research data.
Open Science and language data: Expectations vs. reality. The role of research data infrastructures
(2023)
Language data are essential for any scientific endeavor. However, unlike numerical data, language data are often protected by copyright, as they easily meet the threshold of originality. The role of research infrastructures (such as CLARIN, DARIAH, and Text+) is to bridge the gap between uses allowed by statutory exceptions and the requirements of Open Science. This is achieved on the one hand by sharing language data produced by research organisations with the widest possible circle of persons, and on the other by mutualizing efforts towards copyright clearance and appropriate licensing of datasets.
This White Paper sets out commonly agreed definitions of activities of consortia within NFDI. It aims to provide a common basis for reporting and reference regarding selected questions of cross-consortial relevance in DFG’s template for the Interim Reports. The questions were prioritised by an NFDI Task Force on Evaluation and Reporting (formerly Task Force Monitoring) as a result of discussing possible answers to the DFG template. In this process, the need to agree on a generalizable meaning of terms commonly used in the context of NFDI, and of reporting in particular, was identified from cross-consortial perspectives. The questions with the greatest need for clarification are discussed in this White Paper. As NFDI evolves, the Task Force will likely propose further joint approaches for reporting in information infrastructures.
While each is of broad relevance, the questions addressed relate to substantially different aspects of the consortia’s work. They are thus also structured slightly differently.
This paper presents the IVK-Ler corpus, a longitudinal, annotated learner corpus of weekly writings produced by a group of 18 adolescents in a preparatory class. The corpus consists of 117 student texts collected between 2020 and 2021 and has a structure layered by student and text number. It includes metadata that enables researchers to analyze and track individual student progress in terms of syntactic competence and literacy. The annotation schema, manual and automatic annotation processes, and corpus representation are described in detail. The corpus currently includes target hypotheses and gold standard part-of-speech tags. Future work could include additional annotation layers for topological fields and dependency relations, as well as semantic and discourse annotations to make the corpus usable for tasks beyond syntactic evaluations.
As part of the NFDI, Text+ connects a wide variety of geographically distributed data and services for humanities research and makes them available to the scientific community in a FAIR manner. In this contribution we describe the implementation, by way of example, in the Text+ data domain Collections, using corpora that are used in various disciplines. The infrastructure is designed for extensibility, so that further resources can also be made available via Text+. An outlook on further expected developments is also included. A contribution to the 9th conference of the association "Digital Humanities im deutschsprachigen Raum", DHd 2023 Open Humanities Open Culture.
The Bund-Länder-Vereinbarung (BLV, the agreement between the federal and state governments) on the establishment and funding of a National Research Data Infrastructure (NFDI) (hereinafter BLV-NFDI) states in §1 that the funding pursues "the establishment and further development of overarching research data management" and thereby "an increase in the efficiency of the entire science system". To this end, the BLV-NFDI specifies seven objectives that refine these main goals. This White Paper formulates the participating consortia's shared understanding of the seven objectives specified in the BLV-NFDI. On the basis of this understanding, the Task Force Evaluation and Reporting has made proposals on how the achievement of the objectives can be recorded, described, and measured.
The CLARIN Concept Registry (CCR) is the common semantic ground for most CMDI-based profiles to describe language-related resources in the CLARIN universe. While the CCR supports semantic interoperability within this universe, it does not extend beyond it. The flexibility of CMDI, however, allows users to use other term or concept registries when defining their metadata components. In this paper, we describe our use of schema.org, a light ontology used by many parties across disciplines.
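To make the idea of describing a language resource with schema.org terms more concrete, here is a minimal sketch that maps a simple metadata record to a schema.org `Dataset` description in JSON-LD. It does not reproduce the mapping used in the paper: the input record, its field names, and the function `to_schema_org` are hypothetical, and only a handful of widely used schema.org properties (`name`, `description`, `inLanguage`, `license`, `creator`) are shown.

```python
import json
from typing import Any, Dict


def to_schema_org(record: Dict[str, Any]) -> Dict[str, Any]:
    """Map a simple metadata record (hypothetical field names) to schema.org JSON-LD."""
    return {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "name": record["title"],
        "description": record["description"],
        "inLanguage": record["language"],
        "license": record["license"],
        "creator": [
            {"@type": "Organization", "name": name}
            for name in record["creators"]
        ],
    }


# Hypothetical example record, invented for illustration only.
EXAMPLE_RECORD = {
    "title": "Example Corpus of Written German",
    "description": "A small illustrative text corpus.",
    "language": "de",
    "license": "https://creativecommons.org/licenses/by/4.0/",
    "creators": ["Example Institute"],
}

if __name__ == "__main__":
    print(json.dumps(to_schema_org(EXAMPLE_RECORD), indent=2, ensure_ascii=False))
```

Because schema.org terms are understood outside a single research community, a description of this kind can be read by consumers that know nothing about CMDI, which is the interoperability gain the abstract points to.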
In the NFDI consortium Text+, which focuses on the research data of language- and text-based disciplines, authority data play a central role in the interoperable description and semantic linking of distributed data sources. In particular, the Gemeinsame Normdatei (GND, Integrated Authority File) is an important hub at the centre of an emerging cross-domain knowledge graph. Within Text+, this function is to be further developed and expanded by establishing a GND agency for language- and text-based research data. The aim is to create low-threshold, quality-assured opportunities for researchers to participate and, at the same time, to increase the degree to which the GND is interlinked, including through terminology mappings. Specific requirements and usage practices are illustrated using the data domains of Text+.
This poster summarizes the results of the CLARIAH-DE Work Package 5 - Community Engagement: Outreach/Dissemination and Liaison.
Work Package 5 engages with the community through dissemination activities, outreach and liaison. The work package set itself the following sub-goals:
- Combining the existing dissemination and outreach activities of CLARIN-D and DARIAH-DE in a meaningful way and elaborating on them. In some cases this meant continuity, in other cases a new appearance for resources.
- Providing a web portal as a gateway to the CLARIAH-DE project.
- Creating a common identity and corporate identity and maintaining the established level of trust users already put into CLARIN-D and DARIAH-DE.
- Providing a social media presence as well as a physical presence at workshops, conferences and other meetings in the Digital Humanities.
The switch from face-to-face teaching to digital teaching and learning formats caused by the Covid-19 pandemic posed a challenge for teachers and students alike. Within a very short time, the use of platforms and digital tools had to be learned and tested. This contribution presents examples of CLARIAH-DE services and tools and explains how the digital research infrastructure can support teachers and students in digital teaching as well.