Refine
Year of publication
- 2023 (3) (remove)
Document Type
- Article (1)
- Part of a Book (1)
- Conference Proceeding (1)
Language
- English (3)
Has Fulltext
- yes (3)
Is part of the Bibliography
- yes (3)
Keywords
- Infrastruktur (2)
- Annotation (1)
- Data Mining (1)
- Daten (1)
- Forschung (1)
- Forschungsdaten (1)
- Geisteswissenschaften (1)
- IVK-Ler corpus of German (1)
- Interoperability (1)
- Jugendlicher (1)
Publicationstate
Reviewstate
- Peer-Review (2)
- (Verlags)-Lektorat (1)
Publisher
This paper presents the IVK-Ler corpus, a longitudinal, annotated learner corpus of weekly writings produced by a group of 18 adolescents in a preparatory class. The corpus consists of 117 student texts collected between 2020 and 2021 and has a structure layered by student and text number. It includes metadata that enables researchers to analyze and track individual student progress in terms of syntactic competence and literacy. The annotation schema, manual and automatic annotation processes, and corpus representation are described in detail. The corpus currently includes target hypotheses and gold standard part-of-speech tags. Future work could include additional annotation layers for topological fields and dependency relations, as well as semantic and discourse annotations to make the corpus usable for tasks beyond syntactic evaluations.
The CLARIN infrastructure as an interoperable language technology platform for SSH and beyond
(2023)
CLARIN is a European Research Infrastructure Consortium developing and providing a federated and interoperable platform to support scientists in the field of the Social Sciences and Humanities in carrying-out language-related research. This contribution provides an overview of the entire infrastructure with a particular focus on tool interoperability, ease of access to research data, tools and services, the importance of sharing knowledge within and across (national) communities, and community building. By taking into account FAIR principles from the very beginning, CLARIN succeeded in becoming a successful example of a research infrastructure that is actively used by its members. The benefits CLARIN members reap from their infrastructure secure a future for their common good that is both sustainable and attractive to partners beyond the original target groups.
Open Science and language data: Expectations vs. reality. The role of research data infrastructures
(2023)
Language data are essential for any scientific endeavor. However, unlike numerical data, language data are often protected by copyright, as they easily meet the threshold of originality. The role of research infrastructures (such CLARIN, DARIAH, and Text+) is to bridge the gap between uses allowed by statutory exceptions and the requirements of Open Science. This is achieved on the one hand by sharing language data produced by research organisations with the widest possible circle of persons, and on the other by mutualizing efforts towards copyright clearance and appropriate licensing of datasets.