Refine
Document Type
- Conference Proceeding (4) (remove)
Language
- English (4)
Has Fulltext
- yes (4)
Is part of the Bibliography
- yes (4)
Keywords
- Korpus <Linguistik> (4)
- corpus linguistics (2)
- Algorithmus (1)
- Datenqualität (1)
- Datenstruktur (1)
- Deutsch (1)
- Endlicher Zustandsraum (1)
- European Reference Corpus (EuReCo) (1)
- Forschungsdaten (1)
- Fremdsprachenlernen (1)
Publicationstate
- Veröffentlichungsversion (4) (remove)
Reviewstate
- Peer-Review (4) (remove)
This paper reports on the latest developments of the European Reference Corpus EuReCo and the German Reference Corpus in relation to three of the most important CMLC topics: interoperability, collaboration on corpus infrastructure building, and legal issues. Concerning interoperability, we present new ways to access DeReKo via KorAP on the API and on the plugin level. In addition we report about advancements in the EuReCo- and ICC-initiatives with the provision of comparable corpora, and about recent problems with license acquisitions and our solution approaches using an indemnification clause and model licenses that include scientific exploitation.
This paper reports on recent developments within the European Reference Corpus EuReCo, an open initiative that aims at providing and using virtual and dynamically definable comparable corpora based on existing national, reference or other large corpora. Given the well-known shortcomings of other types of multilingual corpora such as parallel/translation corpora (shining-through effects, over-normalization, simplification, etc.) or web-based comparable corpora (covering only web material), EuReCo provides a unique linguistic resource offering new perspectives for fine-grained contrastive research on authentic cross-linguistic data, applications in translation studies and foreign language teaching and learning.
This paper presents an algorithm and an implementation for efficient tokenization of texts of space-delimited languages based on a deterministic finite state automaton. Two representations of the underlying data structure are presented and a model implementation for German is compared with state-of-the-art approaches. The presented solution is faster than other tools while maintaining comparable quality.
In this paper, we present our experiences and decisions in dealing with challenges in developing, maintaining and operating online research software tools in the field of linguistics. In particular, we highlight reproducibility, dependability, and security as important aspects of quality management – taking into account the special circumstances in which research software
is usually created.