Refine
Document Type
- Book (2)
Language
- English (2)
Has Fulltext
- yes (2)
Is part of the Bibliography
- yes (2) (remove)
Keywords
- large corpora (2) (remove)
Publicationstate
Reviewstate
- Peer-Review (2)
Contents:
1. Vasile Pais, Maria Mitrofan, Verginica Barbu Mititelu, Elena Irimia, Roxana Micu and Carol Luca Gasan: Challenges in Creating a Representative Corpus of Romanian Micro-Blogging Text. Pp. 1-7
2. Modest von Korff: Exhaustive Indexing of PubMed Records with Medical Subject Headings. Pp. 8-15
3. Luca Brigada Villa: UDeasy: a Tool for Querying Treebanks in CoNLL-U Format. Pp. 16-19
4. Nils Diewald: Matrix and Double-Array Representations for Efficient Finite State Tokenization. Pp. 20-26
5. Peter Fankhauser and Marc Kupietz: Count-Based and Predictive Language Models for Exploring DeReKo. Pp. 27-31
6. Hanno Biber: “The word expired when that world awoke.” New Challenges for Research with Large Text Corpora and Corpus-Based Discourse Studies in Totalitarian Times. Pp. 32-35
Contents:
1. Julien Abadji, Pedro Javier Ortiz Suárez, Laurent Romary and Benoît Sagot: "Ungoliant: An Optimized Pipeline for the Generation of a Very Large-Scale Multilingual Web Corpus", S.1-9.
2. Markus Gärtner, Felicitas Kleinkopf, Melanie Andresen and Sibylle Hermann: "Corpus Reusability and Copyright - Challenges and Opportunities", S.10-19.
3. Nils Diewald, Eliza Margaretha and Marc Kupietz: "Lessons learned in Quality Management for Online Research Software Tools in Linguistics", S.20-26.