Refine
Year of publication
Document Type
- Conference Proceeding (60)
- Part of a Book (27)
- Article (11)
- Other (1)
Language
- English (99) (remove)
Keywords
- Korpus <Linguistik> (40)
- Annotation (19)
- Digital Humanities (14)
- Computerlinguistik (13)
- Sprachdaten (11)
- XML (11)
- Auszeichnungssprache (10)
- Forschungsdaten (9)
- Langzeitarchivierung (8)
- Standardisierung (6)
Publicationstate
- Veröffentlichungsversion (68)
- Postprint (11)
- Zweitveröffentlichung (9)
Reviewstate
- (Verlags)-Lektorat (52)
- Peer-Review (26)
- Verlags-Lektorat (1)
Publisher
- European Language Resources Association (ELRA) (16)
- Extreme Markup Languages Conference (6)
- Springer (6)
- European language resources association (ELRA) (4)
- Oxford University Press (4)
- de Gruyter (4)
- European Language Resources Association (3)
- University of Illinois (3)
- University of Oulu (3)
- Universität Tübingen (3)
Despite being an official language of several countries in Central and Western Europe, German is not formally recognised as the official language of the Federal Republic of Germany. However, in certain situations the use of the German language, including the spelling rules, is subject to state regulation (by acts of Federal Parliament orby administrative decisions). This article presents the content of this regulation, its scope, and the historical context in which it was adopted.
Open Science and language data: Expectations vs. reality. The role of research data infrastructures
(2023)
Language data are essential for any scientific endeavor. However, unlike numerical data, language data are often protected by copyright, as they easily meet the threshold of originality. The role of research infrastructures (such CLARIN, DARIAH, and Text+) is to bridge the gap between uses allowed by statutory exceptions and the requirements of Open Science. This is achieved on the one hand by sharing language data produced by research organisations with the widest possible circle of persons, and on the other by mutualizing efforts towards copyright clearance and appropriate licensing of datasets.
The CLARIN infrastructure as an interoperable language technology platform for SSH and beyond
(2023)
CLARIN is a European Research Infrastructure Consortium developing and providing a federated and interoperable platform to support scientists in the field of the Social Sciences and Humanities in carrying-out language-related research. This contribution provides an overview of the entire infrastructure with a particular focus on tool interoperability, ease of access to research data, tools and services, the importance of sharing knowledge within and across (national) communities, and community building. By taking into account FAIR principles from the very beginning, CLARIN succeeded in becoming a successful example of a research infrastructure that is actively used by its members. The benefits CLARIN members reap from their infrastructure secure a future for their common good that is both sustainable and attractive to partners beyond the original target groups.
In 2010, ISO published a standard for syntactic annotation, ISO 24615:2010 (SynAF). Back then, the document specified a comprehensive reference model for the representation of syntactic annotations, but no accompanying XML serialisation. ISO’s subcommittee on language resource management (ISO TC 37/SC 4) is working on making the SynAF serialisation ISOTiger an additional part of the standard. This contribution addresses the current state of development of ISOTiger, along with a number of open issues on which we are seeking community feedback in order to ensure that ISOTiger becomes a useful extension to the SynAF reference model.
L’article intitulé «Traitement de l’information: Spinfo, HKI et humanités numériques - l’expérience de Cologne» présente l’histoire du développement des humanités numériques au sein de l’Université de Cologne. L'institutionnalisation des humanités numériques a commencé encore à l’époque où dans le monde germanophone le périmètre de la discipline était en train d’être défini par les travaux de quelques pionniers. Parmi eux, il convient de souligner le rôle d’Elisabeth Burr, active notamment à Tubingue, Duisbourg, Brême et Leipzig.L’article retrace le développement des humanités numériques à Cologne à partir de leurs débuts dans les années soixante du 20ème siècle, en passant par leur consolidation dans les années quatre-vingt-dix, jusqu’aux deux dernières décennies, quand Cologne est devenu un centre important de cette discipline. Le processus illustre comment une nouvelle discipline scientifique peut s’institutionnaliser au sein d’une université allemande. L’article décrit la perspective de deux domaines fondateurs: le traitement linguistique de l’information (en allemand: Sprachliche Informationsverarbeitung, Spinfo) et le traitement historico-culturel de l’information (en allemand: Historisch Kulturwissenschaftliche Informationsverarbeitung, HKI) et leur synthèse, qui a abouti en 2017 à la création de l’Institut des Humanités Numériques (Digital Humanities), qui aujourd’hui est - du point de vue interne - une composante de la Faculté de Philosophie de l’Université de Cologne et - du point de vue externe - une partie intégrante de la communauté internationale des humanités numériques.
The Leibniz-Institute for the German Language (IDS) was established in Mannheim in 1964. Since then, it has been at the forefront of innovation in German linguistics as a hub for digital language data. This chapter presents various lessons learnt from over five decades of work by the IDS, ranging from the importance of sustainability, through its strong technical base and FAIR principles, to the IDS’ role in national and international cooperation projects and its expertise on legal and ethical issues related to language resources and language technology.
Preface
(2022)
CLARIN stands for “Common Language Resources and Technology Infrastructure”. In 2012 CLARIN ERIC was established as a legal entity with the mission to create and maintain a digital infrastructure to support the sharing, use, and sustainability of language data (in written, spoken, or multimodal form) available through repositories from all over Europe, in support of research in the humanities and social sciences and beyond. Since 2016 CLARIN has had the status of Landmark research infrastructure and currently it provides easy and sustainable access to digital language data and also offers advanced tools to discover, explore, exploit, annotate, analyse, or combine such datasets, wherever they are located. This is enabled through a networked federation of centres: language data repositories, service centres, and knowledge centres with single sign-on access for all members of the academic community in all participating countries. In addition, CLARIN offers open access facilities for other interested communities of use, both inside and outside of academia. Tools and data from different centres are interoperable, so that data collections can be combined and tools from different sources can be chained to perform operations at different levels of complexity. The strategic agenda adopted by CLARIN and the activities undertaken are rooted in a strong commitment to the Open Science paradigm and the FAIR data principles. This also enables CLARIN to express its added value for the European Research Area and to act as a key driver of innovation and contributor to the increasing number of industry programmes running on data-driven processes and the digitalization of society at large.
Ethical issues in Language Resources and Language Technology are often invoked, but rarely discussed. This is at least partly because little work has been done to systematize ethical issues and principles applicable in the fields of Language Resources and Language Technology. This paper provides an overview of ethical issues that arise at different stages of Language Resources and Language Technology development, from the conception phase through the construction phase to the use phase. Based on this overview, the authors propose a tentative taxonomy of ethical issues in Language Resources and Language Technology, built around five principles: Privacy, Property, Equality, Transparency and Freedom. The authors hope that this tentative taxonomy will facilitate ethical assessment of projects in the field of Language Resources and Language Technology, and structure the discussion on ethical issues in this domain, which may eventually lead to the adoption of a universally accepted Code of Ethics of the Language Resources and Language Technology community.