Refine
Document Type
- Conference Proceeding (3) (remove)
Language
- English (3)
Has Fulltext
- yes (3)
Keywords
- Korpus <Linguistik> (3)
- corpus linguistics (2)
- Datenqualität (1)
- Institut für Deutsche Sprache <Mannheim> (1)
- Korpusanalyseplattform (KorAP) (1)
- Software (1)
- Textlinguistik (1)
- comparable corpora (1)
- corpus management (1)
- corpus processing (1)
Publicationstate
- Veröffentlichungsversion (3) (remove)
Reviewstate
- Peer-Review (2)
This paper reports on the latest developments of the European Reference Corpus EuReCo and the German Reference Corpus in relation to three of the most important CMLC topics: interoperability, collaboration on corpus infrastructure building, and legal issues. Concerning interoperability, we present new ways to access DeReKo via KorAP on the API and on the plugin level. In addition we report about advancements in the EuReCo- and ICC-initiatives with the provision of comparable corpora, and about recent problems with license acquisitions and our solution approaches using an indemnification clause and model licenses that include scientific exploitation.
In this paper, we present our experiences and decisions in dealing with challenges in developing, maintaining and operating online research software tools in the field of linguistics. In particular, we highlight reproducibility, dependability, and security as important aspects of quality management – taking into account the special circumstances in which research software
is usually created.
KorAP is a corpus search and analysis platform, developed at the Institute for the German Language (IDS). It supports very large corpora with multiple annotation layers, multiple query languages, and complex licensing scenarios. KorAP’s design aims to be scalable, flexible, and sustainable to serve the German Reference Corpus DEREKO for at least the next decade. To meet these requirements, we have adopted a highly modular microservice-based architecture. This paper outlines our approach: An architecture consisting of small components that are easy to extend, replace, and maintain. The components include a search backend, a user and corpus license management system, and a web-based user frontend. We also describe a general corpus query protocol used by all microservices for internal communications. KorAP is open source, licensed under BSD-2, and available on GitHub.