Corpus Linguistics
Publisher: Leibniz-Institut für Deutsche Sprache
Contents:
1. Julien Abadji, Pedro Javier Ortiz Suárez, Laurent Romary and Benoît Sagot: "Ungoliant: An Optimized Pipeline for the Generation of a Very Large-Scale Multilingual Web Corpus", pp. 1-9.
2. Markus Gärtner, Felicitas Kleinkopf, Melanie Andresen and Sibylle Hermann: "Corpus Reusability and Copyright - Challenges and Opportunities", pp. 10-19.
3. Nils Diewald, Eliza Margaretha and Marc Kupietz: "Lessons learned in Quality Management for Online Research Software Tools in Linguistics", pp. 20-26.
Contents:
1. Johannes Graën, Tannon Kew, Anastassia Shaitarova and Martin Volk: "Modelling Large Parallel Corpora", pp. 1-8.
2. Pedro Javier Ortiz Suárez, Benoît Sagot and Laurent Romary: "Asynchronous Pipelines for Processing Huge Corpora on Medium to Low Resource Infrastructures", pp. 9-16.
3. Vladimír Benko: "Deduplication in Large Web Corpora", pp. 17-22.
4. Mark Davies: "The best of both worlds: Multi-billion word 'dynamic' corpora", pp. 23-28.
5. Adrien Barbaresi: "On the need for domain-focused web corpora", pp. 29-32.
6. Marc Kupietz, Eliza Margaretha, Nils Diewald, Harald Lüngen and Peter Fankhauser: "What's New in EuReCo? Interoperability, Comparable Corpora, Licensing", pp. 33-39.