TY  - BOOK
U1  - Book
ED  - Bański, Piotr
ED  - Barbaresi, Adrien
ED  - Biber, Hanno
ED  - Breiteneder, Evelyn
ED  - Clematide, Simon
ED  - Kupietz, Marc
ED  - Lüngen, Harald
ED  - Iliadi, Caroline
T1  - Proceedings of the Workshop on Challenges in the Management of Large Corpora (CMLC-7) 2019. Cardiff, 22 July 2019
N2  - Contents:
1. Johannes Graën, Tannon Kew, Anastassia Shaitarova and Martin Volk, "Modelling Large Parallel Corpora", pp. 1-8
2. Pedro Javier Ortiz Suárez, Benoît Sagot and Laurent Romary, "Asynchronous Pipelines for Processing Huge Corpora on Medium to Low Resource Infrastructures", pp. 9-16
3. Vladimír Benko, "Deduplication in Large Web Corpora", pp. 17-22
4. Mark Davies, "The best of both worlds: Multi-billion word “dynamic” corpora", pp. 23-28
5. Adrien Barbaresi, "On the need for domain-focused web corpora", pp. 29-32
6. Marc Kupietz, Eliza Margaretha, Nils Diewald, Harald Lüngen and Peter Fankhauser, "What's New in EuReCo? Interoperability, Comparable Corpora, Licensing", pp. 33-39
KW  - corpus linguistics
KW  - parallel corpora
KW  - corpus management
KW  - corpus infrastructures
KW  - corpus processing
KW  - deduplication
KW  - web corpora
KW  - comparable corpora
KW  - Korpus <Linguistik>
KW  - Datenmanagement
KW  - Information Retrieval
KW  - Natürliche Sprache
Y1  - 2019
UN  - https://nbn-resolving.org/urn:nbn:de:bsz:mh39-89986
U6  - https://doi.org/10.14618/ids-pub-8998
DO  - https://doi.org/10.14618/ids-pub-8998
SP  - 39
S1  - 39
PB  - Leibniz-Institut für Deutsche Sprache
CY  - Mannheim
ER  -