Refine
Year of publication
- 2018 (8)
Document Type
- Conference Proceeding (8)
Has Fulltext
- yes (8)
Keywords
- Digital Humanities (5)
- Corpus (linguistics) (3)
- Data Mining (2)
- Data protection (2)
- Copyright (2)
- Author (1)
- CLARIN (1)
- Computational linguistics (1)
- Data processing (1)
- German (1)
Publication state
- Secondary publication (8)
Review state
- Peer-Review (6)
- (Publisher's) editorial review (2)
This presentation introduces a new collaborative project, the International Comparable Corpus (ICC) (https://korpus.cz/icc), which is to be compiled from European national, standard(ised) languages using the protocols of the International Corpus of English (ICE) for text categories and the number of texts per category.
This abstract discusses the possibility of adopting a CLARIN Data Protection Code of Conduct pursuant to Art. 40 of the General Data Protection Regulation. Such a code of conduct would bring important benefits to the entire language research community. The final section of this abstract proposes a roadmap towards the CLARIN Data Protection Code of Conduct, listing the various stages of its drafting and approval procedures.
New exceptions for Text and Data Mining and their possible impact on the CLARIN infrastructure
(2018)
The proposed paper discusses new exceptions for Text and Data Mining that have recently been adopted in some EU Member States and will probably soon also be adopted at the EU level. These exceptions are of great significance for language scientists, as they exempt those who compile corpora from the obligation to obtain authorisation from rightholders. However, corpora compiled on the basis of such exceptions cannot be freely shared, which in the long run may have serious consequences for Open Science and for the functioning of research infrastructures such as CLARIN ERIC.
Deutsche Geschichte-Digital: Ergebnisse der TEI-Konvertierung und Integration in Pilotprojekten [German History Digital: Results of the TEI Conversion and Integration in Pilot Projects]
(2018)
The paper at hand discusses productivity in German compound formation – as a case of morphological variation – from a lexeme-based synchronic perspective. In particular, we focus on groups of compounds with semantically closely related head words, e.g., compounds denoting colors.
Our approach is characterized by a qualitative as well as a quantitative perspective on productivity. Taking the properties of the head lexeme as a starting point and applying corpus-based statistical methods, we try to gain new insights into compound formation, especially into potential factors which govern their productivity. In a first step, we determine the productivity of compounds on the basis of current productivity measures and data from a large corpus of German. In a second step, we try to systematically explain observable differences in productivity.
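The abstract does not say which productivity measures it applies. Purely as a hedged illustration, the sketch below computes one widely used family of corpus-based measures associated with Baayen: the type count V (realized productivity) and potential productivity P = n1 / N, where n1 is the number of hapax legomena among the category's types and N is the category's token count. The function name `productivity_measures` and the toy colour-compound tokens are hypothetical; they are not code or data from the paper.

```python
from collections import Counter


def productivity_measures(tokens):
    """Baayen-style productivity measures for one compound category.

    Assumption (illustrative only): `tokens` lists every corpus occurrence
    of a compound belonging to the category, e.g. all compounds sharing
    one head lexeme, one string per occurrence.
    """
    freqs = Counter(tokens)
    n_tokens = sum(freqs.values())                       # N: token count of the category
    n_types = len(freqs)                                 # V: type count (realized productivity)
    n_hapax = sum(1 for f in freqs.values() if f == 1)   # n1: types occurring exactly once
    potential = n_hapax / n_tokens if n_tokens else 0.0  # P = n1 / N (potential productivity)
    return {"N": n_tokens, "V": n_types, "n1": n_hapax, "P": potential}


# Hypothetical toy data for compounds with the head "blau" (not corpus results).
toy = ["hellblau", "hellblau", "dunkelblau", "himmelblau", "stahlblau"]
print(productivity_measures(toy))
```

With the toy list, three of the four types occur once, so n1 = 3, N = 5 and P = 0.6. Comparing such values across heads from the same semantic group is one way the "observable differences in productivity" mentioned above could be quantified.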
The approach presented here is one of the first attempts to apply the concept of productivity, which has predominantly been used in the domain of derivation, to compounding. Since compounding is a dominant factor in the expansion of the German lexicon, we assume that our investigation also sheds important light on the dynamics of the lexicon.