TY  - CHAP
U1  - Conference publication
A1  - Barbaresi, Adrien
ED  - Bański, Piotr
ED  - Barbaresi, Adrien
ED  - Biber, Hanno
ED  - Breiteneder, Evelyn
ED  - Clematide, Simon
ED  - Kupietz, Marc
ED  - Lüngen, Harald
ED  - Iliadi, Caroline
T1  - The Vast and the Focused: On the need for domain-focused web corpora
T2  - Proceedings of the Workshop on Challenges in the Management of Large Corpora (CMLC-7) 2019. Cardiff, 22nd July 2019
N2  - As the Web ought to be considered as a series of sources rather than as a source in itself, a problem facing corpus construction resides in meta-information and categorization. In addition, we need focused data to shed light on particular subfields of the digital public sphere. Blogs are relevant to that end, especially if the resulting web texts can be extracted along with metadata and made available in coherent and clearly describable collections.
KW  - corpus linguistics
KW  - corpus processing
KW  - web corpora
KW  - Korpus <Linguistik>
Y1  - 2019
UN  - https://nbn-resolving.org/urn:nbn:de:bsz:mh39-90257
U6  - https://doi.org/10.14618/ids-pub-9025
DO  - https://doi.org/10.14618/ids-pub-9025
SP  - 29
EP  - 32
PB  - Leibniz-Institut für Deutsche Sprache
CY  - Mannheim
ER  -