Editorial
(2016)
CMC Corpora in DeReKo
(2017)
We introduce three types of corpora of computer-mediated communication that have recently been compiled at the Institute for the German Language or curated from an external project and included in DeReKo, the German Reference Corpus: Wikipedia (discussion) corpora, the Usenet news corpus, and the Dortmund Chat Corpus. The data and corpora have been converted to I5, the TEI customization used to represent texts in DeReKo, and can be searched via the web-based IDS corpus research interfaces. The Wikipedia and chat corpora can also be downloaded from the IDS repository and download server, respectively.
This presentation introduces a new collaborative project: the International Comparable Corpus (ICC) (https://korpus.cz/icc), to be compiled from European national, standard(ised) languages, using the protocols for text categories and their quantities of texts in the International Corpus of English (ICE).
This paper discusses current trends in DeReKo, the German Reference Corpus: legal issues around the recent German copyright reform, which has positive implications for corpus building and corpus linguistics in general, and recent corpus extensions in the genres of popular magazines, journals, historical texts, and web-based football reports. In addition, DeReKo is finally accessible via the new corpus research platform KorAP, which offers registered users several new features in comparison with its predecessor COSMAS II.
This paper reports on the latest developments of the European Reference Corpus EuReCo and the German Reference Corpus in relation to three of the most important CMLC topics: interoperability, collaboration on corpus infrastructure building, and legal issues. Concerning interoperability, we present new ways to access DeReKo via KorAP at the API level and at the plugin level. In addition, we report on advances in the EuReCo and ICC initiatives in providing comparable corpora, and on recent problems with license acquisition and our approaches to solving them, which use an indemnification clause and model licenses that include scientific exploitation.