Proceedings of the Workshop on Challenges in the Management of Large Corpora (CMLC-7) 2019. Cardiff, 22 July 2019

Contents:
1. Johannes Graën, Tannon Kew, Anastassia Shaitarova and Martin Volk, "Modelling Large Parallel Corpora", pp. 1-8
2. Pedro Javier Ortiz Suárez, Benoît Sagot and Laurent Romary, "Asynchronous Pipelines for Processing Huge Corpora on Medium to Low Resource Infrastructures", pp. 9-16
3. Vladimír Benko, "Deduplication in Large Web Corpora", pp. 17-22
4. Mark Davies, "The best of both worlds: Multi-billion word “dynamic” corpora", pp. 23-28
5. Adrien Barbaresi, "On the need for domain-focused web corpora", pp. 29-32
6. Marc Kupietz, Eliza Margaretha, Nils Diewald, Harald Lüngen and Peter Fankhauser, "What's New in EuReCo? Interoperability, Comparable Corpora, Licensing", pp. 33-39

Metadata
URN:	urn:nbn:de:bsz:mh39-89986
DOI:	https://doi.org/10.14618/ids-pub-8998
Publisher:	Leibniz-Institut für Deutsche Sprache
Place of publication:	Mannheim
Editor:	Piotr Bański, Adrien Barbaresi, Hanno Biber, Evelyn Breiteneder, Simon Clematide, Marc Kupietz, Harald Lüngen, Caroline Iliadi
Document Type:	Book
Language:	English
Year of first Publication:	2019
Date of Publication (online):	2019/07/02
Publication state:	Published version
Review state:	Peer-reviewed
Tag:	comparable corpora; corpus infrastructures; corpus linguistics; corpus management; corpus processing; deduplication; parallel corpora; web corpora
GND Keyword:	Datenmanagement; Information Retrieval; Korpus <Linguistik>; Natürliche Sprache
Page Number:	39
DDC classes:	400 Language / 400 Language, Linguistics
Open Access?:	yes
Leibniz-Classification:	Language, Linguistics
Linguistics-Classification:	Corpus linguistics
Conferences, Workshops:	Proceedings of the Workshop on Challenges in the Management of Large Corpora (CMLC-7) 2019. Cardiff, 22nd July 2019
Program areas:	Digital Linguistics
Licence:	Creative Commons - CC BY - Attribution 4.0 International