Proceedings of the Workshop on Challenges in the Management of Large Corpora (CMLC-7) 2019. Cardiff, 22 July 2019
- Contents:
  1. Johannes Graën, Tannon Kew, Anastassia Shaitarova and Martin Volk, "Modelling Large Parallel Corpora", pp. 1-8
  2. Pedro Javier Ortiz Suárez, Benoît Sagot and Laurent Romary, "Asynchronous Pipelines for Processing Huge Corpora on Medium to Low Resource Infrastructures", pp. 9-16
  3. Vladimír Benko, "Deduplication in Large Web Corpora", pp. 17-22
  4. Mark Davies, "The best of both worlds: Multi-billion word “dynamic” corpora", pp. 23-28
  5. Adrien Barbaresi, "On the need for domain-focused web corpora", pp. 29-32
  6. Marc Kupietz, Eliza Margaretha, Nils Diewald, Harald Lüngen and Peter Fankhauser, "What's New in EuReCo? Interoperability, Comparable Corpora, Licensing", pp. 33-39

| URN: | urn:nbn:de:bsz:mh39-89986 |
|---|---|
| DOI: | https://doi.org/10.14618/ids-pub-8998 |
| Publisher: | Leibniz-Institut für Deutsche Sprache |
| Place of publication: | Mannheim |
| Editor: | Piotr Bański, Adrien Barbaresi, Hanno Biber, Evelyn Breiteneder, Simon Clematide, Marc Kupietz, Harald Lüngen, Caroline Iliadi |
| Document Type: | Book |
| Language: | English |
| Year of first Publication: | 2019 |
| Date of Publication (online): | 2019/07/02 |
| Publication state: | Published version |
| Review state: | Peer-reviewed |
| Tag: | comparable corpora; corpus infrastructures; corpus linguistics; corpus management; corpus processing; deduplication; parallel corpora; web corpora |
| GND Keyword: | Datenmanagement; Information Retrieval; Korpus <Linguistik>; Natürliche Sprache |
| Page Number: | 39 |
| DDC classes: | 400 Language / 400 Language, Linguistics |
| Open Access?: | yes |
| Leibniz-Classification: | Language, Linguistics |
| Linguistics-Classification: | Corpus linguistics |
| Conferences, Workshops: | Proceedings of the Workshop on Challenges in the Management of Large Corpora (CMLC-7) 2019. Cardiff, 22nd July 2019 |
| Program areas: | Digital Linguistics |
| Licence: | Creative Commons - CC BY - Attribution 4.0 International |


