S2: Forschungskoordination und –infrastrukturen
Refine
Document Type
- Part of a Book (26)
- Conference Proceeding (24)
- Article (7)
- Report (5)
- Book (1)
- Part of Periodical (1)
- Working Paper (1)
Has Fulltext
- yes (65)
Keywords
- Korpus <Linguistik> (28)
- Forschungsdaten (23)
- Digital Humanities (13)
- Sprachdaten (12)
- Recht (10)
- Computerlinguistik (8)
- Deutsch (8)
- Infrastruktur (8)
- CLARIN (7)
- Datenschutz (7)
Publicationstate
- Veröffentlichungsversion (46)
- Zweitveröffentlichung (16)
- Postprint (3)
- Ahead of Print (1)
Reviewstate
- Peer-Review (40)
- (Verlags)-Lektorat (18)
Repeating the movements associated with activities such as drawing or sports typically leads to improvements in kinematic behavior: these movements become faster, smoother, and exhibit less variation. Likewise, practice has also been shown to lead to faster and smoother movement trajectories in speech articulation. However, little is known about its effect on articulatory variability. To address this, we investigate the extent to which repetition and predictability influence the articulation of the frequent German word “sie” [zi] (they). We find that articulatory variability is proportional to speaking rate and the duration of [zi], and that overall variability decreases as [zi] is repeated during the experiment. Lower variability is also observed as the conditional probability of [zi] increases, and the greatest reduction in variability occurs during the execution of the vocalic target of [i]. These results indicate that practice can produce observable differences in the articulation of even the most common gestures used in speech.
The QUEST (QUality ESTablished) project aims at ensuring the reusability of audio-visual datasets (Wamprechtshammer et al., 2022) by devising quality criteria and curating processes. RefCo (Reference Corpora) is an initiative within QUEST in collaboration with DoReCo (Documentation Reference Corpus, Paschen et al. (2020)) focusing on language documentation projects. Previously, Aznar and Seifart (2020) introduced a set of quality criteria dedicated to documenting fieldwork corpora. Based on these criteria, we establish a semi-automatic review process for existing and work-in-progress corpora, in particular for language documentation. The goal is to improve the quality of a corpus by increasing its reusability. A central part of this process is a template for machine-readable corpus documentation and automatic data verification based on this documentation. In addition to the documentation and automatic verification, the process involves a human review and potentially results in a RefCo certification of the corpus. For each of these steps, we provide guidelines and manuals. We describe the evaluation process in detail, highlight the current limits for automatic evaluation and how the manual review is organized accordingly.
Metadata provides important information relevant both to finding and understanding corpus data. Meaningful linguistic data requires both reasonable annotations and documentation of these annotations. This documentation is part of the metadata of a dataset. While corpus documentation has often been provided in the form of accompanying publications, machinereadable metadata, both containing the bibliographic information and documenting the corpus data, has many advantages. Metadata standards allow for the development of common tools and interfaces. In this paper I want to add a new perspective from an archive’s point of view and look at the metadata provided for four learner corpora and discuss the suitability of established standards for machine-readable metadata. I am are aware that there is ongoing work towards metadata standards for learner corpora. However, I would like to keep the discussion going and add another point of view: increasing findability and reusability of learner corpora in an archiving context.
The Leibniz-Institute for the German Language (IDS) was established in Mannheim in 1964. Since then, it has been at the forefront of innovation in German linguistics as a hub for digital language data. This chapter presents various lessons learnt from over five decades of work by the IDS, ranging from the importance of sustainability, through its strong technical base and FAIR principles, to the IDS’ role in national and international cooperation projects and its expertise on legal and ethical issues related to language resources and language technology.
Um eine bessere Erreichbarkeit und Zugänglichkeit zu bestehenden sowie neuen Angeboten von Lehr- und Schulungsmaterialien im Bereich der Digital Humanities zu ermöglichen, sollten diese in einem zentralen Verzeichnis zur Verfügung gestellt werden. Im Rahmen des CLARIAH-DE Projekts wurde – zunächst für die Umsetzung eines Projektmeilensteins – eine Lösung gesucht, die eine übergreifende Suche in frei zugänglichen und nachnutzbaren Lehr- und Schulungsmaterialien zu Forschungsmethoden, Verfahren sowie Werkzeugen im Bereich der Digital Humanities in unterschiedlichen Plattformen und Repositorien bietet.
The normative layer of CLARIN is, alongside the organizational and technical layers, an essential part of the infrastructure. It consists of the regulatory framework (statutory law, case law, authoritative guidelines, etc.), the contractual framework (licenses, terms of service, etc.), and ethical norms. Navigating the normative layer requires expertise, experience, and qualified effort. In order to advise the Board of Directors, a standing committee dedicated to legal and ethical issues, the CLIC, was created. Since its establishment in 2012, the CLIC has made considerable efforts to provide not only the BoD but also the general public with information and guidance. It has published many articles (both in proceedings of CLARIN conferences and in its own White Paper Series) and developed several LegalTech tools. It also runs a Legal Information Platform, where accessible information on various issues affecting language resources can be found.
This chapter will present lessons learned from CLARIN-D, the German CLARIN national consortium. Members of the CLARIN-D communities and of the CLARIN-D consortium have been engaged in innovative, data-driven, and community-based research, using language resources and tools in the humanities and neigh-bouring disciplines. We will present different use cases and users’ stories that demonstrate the innovative research potential of large digital corpora and lexical resources for the study of language change and variation, for language documentation, for literary studies, and for the social sciences. We will emphasize the added value of making language resources and tools available in the CLARIN distributed research infrastructure and will discuss legal and ethical issues that need to be addressed in the use of such an infrastructure. Innovative technical solutions for accessing digital materials still under copyright and for data mining such materials will be presented. We will outline the need for close interaction with communities of interest in the areas of curriculum development, data management, and training the next generation of digital humanities scholars. The importance of community-supported standards for encoding language resources and the practice of community-based quality control for digital research data will be presented as a crucial step toward the provisioning of high quality research data. The chapter will conclude with a discussion of impor-tant directions for innovative research and for supporting infrastructure development over the next decade and beyond.
Standards in CLARIN
(2022)
This chapter looks at a fragment of the ongoing work of the CLARIN Standards Committee (CSC) on producing a shared set of recommendations on standards, formats, and related best practices supported by the CLARIN infrastructure and its participating centres. What might at first glance seem to be a straightforward goal has over the years proven to be rather complex, reflecting the robustness and heterogeneity of the emerging distributed digital research infrastructure and the various disciplines and research traditions of the language-based humanities that it serves and represents, and therefore part of the chapter reviews the various initiatives and proposals that strove to produce helpful standards-related guidance. The focus turns next to a subtask initiated in late 2019, its scope narrowed to one of the core activities and responsibilities of CLARIN backbone centres, namely the provision of data deposition services. Centres are obligated to publish their recom-mendations concerning the repertoire of data formats that are best suited for their research profiles. We look at how this requirement has been met by the particular centres and suggest that having centres maintain their information in the Standards Information System (SIS) is the way to improve on the current state of affairs.
Preface
(2022)
Enabling appropriate access to linguistic research data, both for many researchers and for innovative research applications, is a challenging task. In this chapter, we describe how we address this challenge in the context of the German Reference Corpus DeReKo and the corpus analysis platform KorAP. The core of our approach, which is based on and tightly integrated into the CLARIN infrastructure, is to offer access at different levels. The graduated access levels make it possible to find a low-loss compromise between the possibilities opened up and the costs incurred by users and providers for each individual use case, so that, viewed over many applications, the ratio between effort and results achieved can be effectively optimized. We also report on experiences with the current state of this approach.