New exceptions for Text and Data Mining and their possible impact on the CLARIN infrastructure
(2018)
The proposed paper discusses new exceptions for Text and Data Mining that have recently been adopted in some EU Member States and will probably soon be adopted at the EU level as well. These exceptions are of great significance for language scientists, as they exempt those who compile corpora from the obligation to obtain authorisation from rightholders. However, corpora compiled on the basis of such exceptions cannot be freely shared, which in the long run may have serious consequences for Open Science and the functioning of research infrastructures such as CLARIN ERIC.
This abstract discusses the possibility of adopting a CLARIN Data Protection Code of Conduct pursuant to Art. 40 of the General Data Protection Regulation. Such a code of conduct would have important benefits for the entire language research community. The final section of this abstract proposes a roadmap to the CLARIN Data Protection Code of Conduct, listing the various stages of its drafting and approval procedures.
This presentation introduces a new collaborative project: the International Comparable Corpus (ICC) (https://korpus.cz/icc), to be compiled from European national, standard(ised) languages, following the protocols of the International Corpus of English (ICE) for text categories and the quantity of texts in each.
We address the detection of abusive words. The task is to identify such words among a set of negative polar expressions. We propose novel features employing information from both corpora and lexical resources. These features are calibrated on a small manually annotated base lexicon, which we use to produce a large lexicon. We show that the word-level information we learn cannot be equally derived from a large dataset of annotated microposts. We demonstrate the effectiveness of our (domain-independent) lexicon in the cross-domain detection of abusive microposts.
The paper at hand discusses productivity in German compound formation – as a case of morphological variation – from a lexeme-based synchronic perspective. In particular, we focus on groups of compounds with semantically closely related head words, e.g., compounds denoting colors.
Our approach is characterized by a qualitative as well as a quantitative perspective on productivity. Taking the properties of the head lexeme as a starting point and applying corpus-based statistical methods, we try to gain new insights into compound formation, especially into potential factors which govern their productivity. In a first step, we determine the productivity of compounds on the basis of current productivity measures and data from a large corpus of German. In a second step, we try to systematically explain observable differences in productivity.
The approach presented here is one of the first attempts to apply the concept of productivity, which has been predominantly used in the domain of derivation, to compounding. Since compounding is a dominant factor in the expansion of the German lexicon, we assume that our investigation also sheds important light on the dynamics of the lexicon.
Deutsche Geschichte-Digital: Results of the TEI Conversion and Integration in Pilot Projects
(2018)
MULLE is a tool for language learning that focuses on teaching Latin as a foreign language. It is designed for easy integration into the traditional classroom setting and syllabus, which distinguishes it from other language learning tools that provide a standalone learning experience. It uses grammar-based lessons and embraces methods of gamification to improve learner motivation. The main type of exercise provided by our application is translation practice, but it is also possible to shift the focus to vocabulary or morphology training.
Controlled Natural Languages (CNLs) have many applications, including document authoring, automatic reasoning on texts and reliable machine translation, but their application is not limited to these areas. We explore a new application area: the use of CNLs in computer-assisted language learning. In this paper we present a web application for language learning using CNLs as well as a detailed description of the properties of the family of CNLs it uses.