TY  - CHAP
U1  - Conference publication
A1  - Barbaresi, Adrien
ED  - Bański, Piotr
ED  - Barbaresi, Adrien
ED  - Biber, Hanno
ED  - Breiteneder, Evelyn
ED  - Clematide, Simon
ED  - Kupietz, Marc
ED  - Lüngen, Harald
ED  - Iliadi, Caroline
T1  - The Vast and the Focused: On the need for domain-focused web corpora
T2  - Proceedings of the Workshop on Challenges in the Management of Large Corpora (CMLC-7) 2019. Cardiff, 22nd July 2019
N2  - As the Web ought to be considered as a series of sources rather than as a source in itself, a problem facing corpus construction resides in meta-information and categorization. In addition, we need focused data to shed light on particular subfields of the digital public sphere. Blogs are relevant to that end, especially if the resulting web texts can be extracted along with metadata and made available in coherent and clearly describable collections.
KW  - corpus linguistics
KW  - corpus processing
KW  - web corpora
KW  - corpus
Y1  - 2019
UN  - https://nbn-resolving.org/urn:nbn:de:bsz:mh39-90257
U6  - https://doi.org/10.14618/ids-pub-9025
DO  - https://doi.org/10.14618/ids-pub-9025
SP  - 29
EP  - 32
PB  - Leibniz-Institut für Deutsche Sprache
CY  - Mannheim
ER  - 