Internet Corpora: A Challenge for Linguistic Processing
- Natural language processing tools are mostly developed for and optimized on newspaper texts, and often show a substantial performance drop when applied to other types of texts such as Twitter feeds, chat data or Internet forum posts. We explore a range of easy-to-implement methods of adapting existing part-of-speech taggers to improve their performance on Internet texts. Our results show that these methods can improve tagger performance substantially.
Author: | Andrea Horbach, Stefan Thater, Diana Steffen, Peter M. Fischer, Andreas Witt, Manfred Pinkal |
---|---|
URN: | urn:nbn:de:bsz:mh39-43565 |
DOI: | https://doi.org/10.1007/s13222-014-0172-z |
ISSN: | 1618-2162 |
Parent Title (German): | Datenbank-Spektrum |
Document Type: | Article |
Language: | English |
Year of first Publication: | 2015 |
Date of Publication (online): | 2015/11/11 |
Reviewstate: | Peer-Review |
Tag: | Computer-mediated communication; Natural language processing; Part-of-speech tagging |
GND Keyword: | Automatische Sprachanalyse; Internet; Korpus <Linguistik>; Natürliche Sprache |
Volume: | 15 |
Issue: | 1 |
First Page: | 41 |
Last Page: | 47 |
Note: | This article is not freely accessible for copyright reasons. |
DDC classes: | 400 Language / 410 Linguistics |
Open Access?: | no |