
Proceedings of the LREC 2020 Workshop, Language Resources and Evaluation Conference, 11–16 May 2020, 8th Workshop on Challenges in the Management of Large Corpora (CMLC-8)

  • In order to satisfy the information needs of a wide range of researchers across a number of disciplines, large textual datasets require careful design, collection, cleaning, encoding, annotation, storage, retrieval, and curation. This daunting set of tasks has coalesced into a number of key themes and questions of interest to the contributing research communities: (a) which sampling techniques can we apply? (b) which quality issues should we be aware of? (c) which infrastructures and frameworks are being developed for the efficient storage, annotation, analysis, and retrieval of large datasets? (d) what affordances do visualisation techniques offer for the exploratory analysis of corpora? (e) which legal paths can be followed when dealing with the IPR and data protection issues that govern both the data sources and the query results? (f) how can we guarantee that corpus data remain available and usable in a sustainable way?



Publisher: European Language Resources Association (ELRA)
Place of publication: Paris
Editors: Piotr Bański, Adrien Barbaresi, Simon Clematide, Marc Kupietz, Harald Lüngen, Ines Pisetta
Document Type: Book
Year of first Publication: 2020
Date of Publication (online): 2020/05/12
GND Keywords: Computational linguistics; Data management; Research data; Corpus (linguistics)
Page Number: 63
DDC classes: 400 Language / 400 Language, Linguistics
Open Access?: yes
Leibniz-Classification: Language, Linguistics
Program areas: S1: Corpus Linguistics
Licence (English): Creative Commons - Attribution-NonCommercial 4.0 International