The Vast and the Focused: On the need for domain-focused web corpora
- Because the Web is better regarded as a series of sources than as a single source, a central problem in corpus construction lies in meta-information and categorization. In addition, focused data are needed to shed light on particular subfields of the digital public sphere. Blogs are relevant to that end, especially if the resulting web texts can be extracted along with metadata and made available in coherent and clearly describable collections.
Author: | Adrien Barbaresi |
---|---|
URN: | urn:nbn:de:bsz:mh39-90257 |
DOI: | https://doi.org/10.14618/ids-pub-9025 |
Parent Title (English): | Proceedings of the Workshop on Challenges in the Management of Large Corpora (CMLC-7) 2019. Cardiff, 22nd July 2019 |
Publisher: | Leibniz-Institut für Deutsche Sprache |
Place of publication: | Mannheim |
Editor: | Piotr Bański, Adrien Barbaresi, Hanno Biber, Evelyn Breiteneder, Simon Clematide, Marc Kupietz, Harald Lüngen, Caroline Iliadi |
Document Type: | Conference Proceeding |
Language: | English |
Year of first Publication: | 2019 |
Date of Publication (online): | 2019/07/04 |
Publication state: | Published version |
Review state: | Peer review |
Tag: | corpus linguistics; corpus processing; web corpora |
GND Keyword: | Korpus <Linguistik> |
First Page: | 29 |
Last Page: | 32 |
DDC classes: | 400 Language / 400 Language, Linguistics |
Open Access?: | yes |
Leibniz-Classification: | Language, Linguistics |
Linguistics-Classification: | Corpus linguistics |
Conferences, Workshops: | Proceedings of the Workshop on Challenges in the Management of Large Corpora (CMLC-7) 2019. Cardiff, 22nd July 2019 |