Refine
Year of publication
Document Type
- Conference Proceeding (31)
- Article (8)
- Part of a Book (5)
- Other (5)
- Working Paper (4)
Has Fulltext
- yes (53)
Keywords
- Forschungsdaten (29)
- Datenmanagement (23)
- Infrastruktur (20)
- Metadaten (20)
- Forschung (13)
- Computerlinguistik (12)
- Korpus <Linguistik> (9)
- CLARIN (8)
- Digital Humanities (6)
- Standardisierung (6)
Publicationstate
- Veröffentlichungsversion (43)
- Zweitveröffentlichung (7)
- Postprint (6)
Reviewstate
- Peer-Review (38)
- (Verlags)-Lektorat (8)
Publisher
This White Paper sets out commonly agreed definitions on activities of consortia within NFDI. It aims to provide a common basis for reporting and reference regarding selected questions of cross-consortial relevance in DFG’s template for the Interim Reports. The questions were prioritised by an NFDI Task Force on Evaluation and Reporting (formerly Task Force Monitoring) as a result of discussing possible answers to the DFG template. In this process the need to agree on a generalizable meaning of terms commonly used in the context of NFDI, and reporting in particular, were identified from cross-consortial perspectives. Questions that showed the highest requirement on clarification are discussed in this White Paper. As NFDI evolves, the Task Force will likely propose further joint approaches for reporting in information infrastructures.
While each of broad relevance, the questions addressed relate to substantially different aspects of consortia’s work. They are thus also structured slightly different.
In der Bund-Länder-Vereinbarung (BLV) zu Aufbau und Förderung einer Nationalen Forschungsdateninfrastruktur (NFDI) (im Folgenden BLV-NFDI) wird in §1 festgehalten, dass mit der Förderung "eine Etablierung und Fortentwicklung eines übergreifenden Forschungsdatenmanagements" und damit eine "Steigerung der Effizienz des gesamten Wissenschaftssystems verfolgt" wird. In der BLV-NFDI werden dazu sieben Ziele vorgegeben, die eine Verfeinerung dieser Hauptziele darstellen. Dieses White Paper formuliert das gemeinsame Verständnis der beteiligten Konsortien für die sieben in der BLV-NFDI vorgegebenen Ziele. Auf der Grundlage dieses Verständnisses hat die Task Force Evaluation und Reporting Vorschläge gemacht, wie das Erreichen der Ziele erfasst, beschrieben und gemessen werden kann.
Collaborative work in NFDI
(2023)
The non-profit association National Research Data Infrastructure (NFDI) promotes science and research through a National Research Data Infrastructure. Its aim is to develop and establish an overarching research data management (RDM) for Germany and to increase the efficiency of the entire German science system. After a two-and-a-half year build up phase, the process of adding new consortia, each representing a different data domain, has ended in March 2023. NFDI now has 26 disciplinary consortia (and one additional basic service collaboration). Now the full extent of cross-consortial interaction is beginning to show.
The CMDI Explorer
(2020)
We present the CMDI Explorer, a tool that empowers users to easily explore the contents of complex CMDI records and to process selected parts of them with little effort. The tool allows users, for instance, to analyse virtual collections represented by CMDI records, and to send collection items to other CLARIN services such as the Switchboard for subsequent processing. The CMDI Explorer hence adds functionality that many users felt was lacking from the CLARIN tool space.
CMDI Explorer
(2021)
We present CMDI Explorer, a tool that empowers users to easily explore the contents of complex CMDI records and to process selected parts of them with little effort. The tool allows users, for instance, to analyse virtual collections represented by CMDI records, and to send collection items to other CLARIN services such as the Switchboard for subsequent processing. CMDI Explorer hence adds functionality that many users felt was lacking from the CLARIN tool space.
Signposts for CLARIN
(2020)
An implementation of CMDI-based signposts and its use is presented in this paper. Arnold et al. 2020 present Signposts as a solution to challenges in long-term preservation of corpora, especially corpora that are continuously extended and subject to modification, e.g., due to legal injunctions, but also may overlap with respect to constituents, and may be subject to migrations to new data formats. We describe the contribution Signposts can make to the CLARIN infrastructure and document the design for the CMDI profile.
Signposts for CLARIN
(2021)
An implementation of CMDI-based signposts and its use is presented in this paper. Arnold, Fisseni et al. (2020) present signposts as a solution to challenges in long-term preservation of corpora. Though applicable to digital resources in general, we focus on corpora, especially those that are continuously extended or subject to modification, e.g., due to legal injunctions, but also may overlap with respect to constituents, and may be subject to migrations to new data formats. We describe the contribution signposts can make to the CLARIN infrastructure, notably virtual collections, and document the design for the CMDI profile.
Wenn man verschiedenartige Forschungsdaten über Metadaten inhaltlich beschreiben möchte, sind bibliografische Angaben allein nicht ausreichend. Vielmehr benötigt man zusätzliche Beschreibungsmittel, die der Natur und Komplexität gegebener Forschungsressourcen Rechnung tragen. Verschiedene Arten von Forschungsdaten bedürfen verschiedener Metadatenprofile, die über gemeinsame Komponenten definiert werden. Solche Forschungsdaten können gesammelt (z.B. über OAI-PMH-Harvesting) und mittels Facetten-basierter Suche über eine einheitliche Schnittstelle exploriert werden. Der beschriebene Anwendungskontext kann über sprachwissenschaftliche Daten hinaus verallgemeinert werden.
Linguistics is facing the challenge of many other sciences as it continues to grow into increasingly complex subfields, each with its own separate or overarching branches. While linguists are certainly aware of the overall structure of the research field, they cannot follow all developments other than those of their subfields. It is thus important to help specialists but also newcomers alike to bushwhack through evolved or unknown territory of linguistic data. A considerable amount of research data in linguistics is described with metadata. While studies described and published in archived journals and conference proceedings receive a quite homogeneous set of metadata tags — e.g., author, title, publisher —, this does not hold for the empirical data and analyses that underlie such studies. Moreover, lexicons, grammars, experimental data, and other types of resources come in different forms; and to make things worse, their description in terms of metadata is also not uniform, if existing at all. These problems are well-known and there are now a number of international initiatives — e.g., CLARIN, FlareNet, MetaNet, DARIAH — to build infrastructures for managing linguistic resources. The NaLiDa project, funded by the German Research Foundation, aims at facilitating the management and access to linguistic resources originating from German research institutions. In cooperation with the German SFB 833 research center, we are developing a combination of faceted and full-text search to give integrated access through heterogeneous metadata sets. Our approach is supported by a central registry for metadata field descriptors, and a component repository for structured groups of data categories as larger building blocks.
The CLARIN infrastructure as an interoperable language technology platform for SSH and beyond
(2023)
CLARIN is a European Research Infrastructure Consortium developing and providing a federated and interoperable platform to support scientists in the field of the Social Sciences and Humanities in carrying-out language-related research. This contribution provides an overview of the entire infrastructure with a particular focus on tool interoperability, ease of access to research data, tools and services, the importance of sharing knowledge within and across (national) communities, and community building. By taking into account FAIR principles from the very beginning, CLARIN succeeded in becoming a successful example of a research infrastructure that is actively used by its members. The benefits CLARIN members reap from their infrastructure secure a future for their common good that is both sustainable and attractive to partners beyond the original target groups.
XML has been designed for creating structured documents, but the information that is encoded in these structures are, by definition, out of scope for XML. Additional sources, normally not easily interpretable by computers, such as documentation are needed to determine the intention of specific tags in a tag-set. The Component Metadata Infrastructure (CMDI) takes a rather pragmatic approach to foster interoperability between XML instances in the domain of metadata descriptions for language resources. This paper gives an overview of this approach.
This paper describes the status of the standardization efforts of a Component Metadata approach for describing Language Resources with metadata. Different linguistic and Language & Technology communities as CLARIN, META-SHARE and NaLiDa use this component approach and see its standardization of as a matter for cooperation that has the possibility to create a large interoperable domain of joint metadata. Starting with an overview of the component metadata approach together with the related semantic interoperability tools and services as the ISOcat data category registry and the relation registry we explain the standardization plan and efforts for component metadata within ISO TC37/SC4. Finally, we present information about uptake and plans of the use of component metadata within the three mentioned linguistic and L&T communities.
The paper’s purpose is to give an overview of the work on the Component Metadata Infrastructure (CMDI) that was implemented in the CLARIN research infrastructure. It explains, the underlying schema, the accompanying tools and services. It also describes the status and impact of the CMDI developments done within the CLARIN project and past and future collaborations with other projects.
In this contribution we present some work of the R&D European project “LIRICS” and of the ISO/TC 37/SC 4 committee related to the topic of interoperability and re-use of language resources. We introduce some basic mechanisms of the standardization work in ISO and describe in more details the general approach on how to cope with the annotation of language data within ISO.
This paper presents the system architecture as well as the underlying workflow of the Extensible Repository System of Digital Objects (ERDO) which has been developed for the sustainable archiving of language resources within the Tübingen CLARIN-D project. In contrast to other approaches focusing on archiving experts, the described workflow can be used by researchers without required knowledge in the field of long-term storage for transferring data from their local file systems into a persistent repository.
This paper describes the ongoing work to integrate WebLicht into the CLARIN infrastructure. It introduces the CLARIN infrastructure for scholars in the humanities and social sciences as well as WebLicht - an orchestration and execution environment that is built upon Service Oriented Architecture principles. The integration of WebLicht into the CLARIN infrastructure involves adapting it to the standards and practices used within CLARIN, including distributed repositories, CMDI metadata, and persistent identifiers.
Creating and maintaining metadata for various kinds of resources requires appropriate tools to assist the user. The paper presents the metadata editor ProFormA for the creation and editing of CMDI (Component Metadata Infrastructure) metadata in web forms. This editor supports a number of CMDI profiles currently being provided for different types of resources. Since the editor is based on XForms and server-side processing, users can create and modify CMDI files in their standard browser without the need for further processing. Large parts of ProFormA are implemented as web services in order to reuse them in other contexts and programs.
This chapter will present lessons learned from CLARIN-D, the German CLARIN national consortium. Members of the CLARIN-D communities and of the CLARIN-D consortium have been engaged in innovative, data-driven, and community-based research, using language resources and tools in the humanities and neigh-bouring disciplines. We will present different use cases and users’ stories that demonstrate the innovative research potential of large digital corpora and lexical resources for the study of language change and variation, for language documentation, for literary studies, and for the social sciences. We will emphasize the added value of making language resources and tools available in the CLARIN distributed research infrastructure and will discuss legal and ethical issues that need to be addressed in the use of such an infrastructure. Innovative technical solutions for accessing digital materials still under copyright and for data mining such materials will be presented. We will outline the need for close interaction with communities of interest in the areas of curriculum development, data management, and training the next generation of digital humanities scholars. The importance of community-supported standards for encoding language resources and the practice of community-based quality control for digital research data will be presented as a crucial step toward the provisioning of high quality research data. The chapter will conclude with a discussion of impor-tant directions for innovative research and for supporting infrastructure development over the next decade and beyond.