Proceedings of the Workshop on Challenges in the Management of Large Corpora and Big Data and Natural Language Processing (CMLC-5+BigNLP) 2017 including the papers from the Web-as-Corpus (WAC-XI) guest section. Birmingham, 24 July 2017
- Contents: 1. Andreas Dittrich: Intra-connecting a small exemplary literary corpus with semantic web technologies for exploratory literary studies, S. 1 2. John Kirk, Anna Čermáková: From ICE to ICC: The new International Comparable Corpus, S. 7 3. Dawn Knight, Tess Fitzpatrick, Steve Morris, Jeremy Evas, Paul Rayson, Irena Spasic, Mark Stonelake, Enlli Môn Thomas, Steven Neale, Jennifer Needs, Scott Piao, Mair Rees, Gareth Watkins, Laurence Anthony, Thomas Michael Cobb, Margaret Deuchar, Kevin Donnelly, Michael McCarthy, Kevin Scannell: Creating CorCenCC (Corpws Cenedlaethol Cymraeg Cyfoes – The National Corpus of Contemporary Welsh), S. 13 4. Marc Kupietz, Andreas Witt, Piotr Bański, Dan Tufiş, Dan Cristea, Tamás Váradi: EuReCo - Joining Forces for a European Reference Corpus as a sustainable base for cross-linguistic research, S. 15 5. Harald Lüngen, Marc Kupietz: CMC Corpora in DeReKo, S. 20 6. David McClure, Mark Algee-Hewitt, Douris Steele, Erik Fredner, Hannah Walser: Organizing corpora at the Stanford Literary Lab, S. 25 7. Radoslav Rábara, Pavel Rychlý ,Ondřej Herman: Accelerating corpus search using multiple cores, S. 30 8. John Vidler, Stephen Wattam: Keeping Properties with the Data: CL-MetaHeaders – An Open Specification, S. 35 9. Vladimir Benko: Are Web Corpora Inferior? The Case of Czech and Slovak, S. 43 10. Edyta Jurkiewicz-Rohrbacher, Zrinka Kolaković, Björn Hansen: Web Corpora – the best possible solution for tracking phenomena in underresourced languages: clitics in Bosnian, Croatian and Serbian, S. 49 11. Vít Suchomel: Removing Spam from Web Corpora Through Supervised Learning Using FastText, S. 56
URN: | urn:nbn:de:bsz:mh39-62434 |
---|---|
URL: | http://corpora.ids-mannheim.de/cmlc-2017.html |
Publisher: | Institut für Deutsche Sprache |
Place of publication: | Mannheim |
Editor: | Piotr Bański, Marc Kupietz, Harald Lüngen, Paul Rayson, Hanno Biber, Evelyn Breiteneder, Simon Clematide, John Mariani, Mark Stevenson, Theresa Sick |
Document Type: | Book |
Language: | English |
Year of first Publication: | 2017 |
Date of Publication (online): | 2017/07/04 |
Publicationstate: | Veröffentlichungsversion |
Reviewstate: | Peer-Review |
Tag: | Corpus linguistics; Corpus management; Corpus technology |
GND Keyword: | Automatische Textanalyse; Datenmanagement; Korpus <Linguistik>; Texttechnologie |
Page Number: | 60 |
DDC classes: | 400 Sprache |
Open Access?: | ja |
Linguistics-Classification: | Korpuslinguistik |
Conferences, Workshops: | CMLC-5 + BigNLP / 5th Workshop on Challenges in the Management of Large Corpora and Big Data and Natural Language Processing |
Program areas: | Grammatik |
Program areas: | Digitale Sprachwissenschaft |
Licence (German): | Creative Commons - Namensnennung-Nicht kommerziell-Keine Bearbeitung 3.0 Deutschland |