TY  - BOOK
U1  - Buch
ED  - Bański, Piotr
ED  - Barbaresi, Adrien
ED  - Biber, Hanno
ED  - Breiteneder, Evelyn
ED  - Clematide, Simon
ED  - Kupietz, Marc
ED  - Lüngen, Harald
ED  - Iliadi, Caroline
T1  - Proceedings of the Workshop on Challenges in the Management of Large Corpora (CMLC-7) 2019. Cardiff, 22 July 2019
N2  - Contents: 1. Johannes Graën, Tannon Kew, Anastassia Shaitarova and Martin Volk, "Modelling Large Parallel Corpora", pp. 1-8 2. Pedro Javier Ortiz Suárez, Benoît Sagot and Laurent Romary, "Asynchronous Pipelines for Processing Huge Corpora on Medium to Low Resource Infrastructures", pp. 9-16 3. Vladimír Benko, "Deduplication in Large Web Corpora", pp. 17-22 4. Mark Davies, "The best of both worlds: Multi-billion word “dynamic” corpora", pp. 23-28 5. Adrien Barbaresi, "On the need for domain-focused web corpora", pp. 29-32 6. Marc Kupietz, Eliza Margaretha, Nils Diewald, Harald Lüngen and Peter Fankhauser, "What's New in EuReCo? Interoperability, Comparable Corpora, Licensing", pp. 33-39
KW  - corpus linguistics
KW  - parallel corpora
KW  - corpus management
KW  - corpus infrastructures
KW  - corpus processing
KW  - deduplication
KW  - web corpora
KW  - comparable corpora
KW  - Korpus
KW  - Datenmanagement
KW  - Information Retrieval
KW  - Natürliche Sprache
Y1  - 2019
UN  - https://nbn-resolving.org/urn:nbn:de:bsz:mh39-89986
U6  - https://doi.org/10.14618/ids-pub-8998
DO  - https://doi.org/10.14618/ids-pub-8998
SP  - 39
S1  - 39
PB  - Leibniz-Institut für Deutsche Sprache
CY  - Mannheim
ER  -