Internet Corpora: A Challenge for Linguistic Processing

  • Natural language processing tools are mostly developed for and optimized on newspaper texts, and often show a substantial performance drop when applied to other types of texts such as Twitter feeds, chat data or Internet forum posts. We explore a range of easy-to-implement methods of adapting existing part-of-speech taggers to improve their performance on Internet texts. Our results show that these methods can improve tagger performance substantially.

Download full text files

  • Horbach_Thater_Steffen_Fischer_Witt_Pinkal_Internet_Corpora_A_Challenge_2015.pdf
    eng

    (IDS-intern)

Metadata
Author:Andrea Horbach, Stefan Thater, Diana Steffen, Peter M. Fischer, Andreas Witt, Manfred Pinkal
URN:urn:nbn:de:bsz:mh39-43565
DOI:https://doi.org/10.1007/s13222-014-0172-z
ISSN:1618-2162
Parent Title (German):Datenbank-Spektrum
Document Type:Article
Language:English
Year of first Publication:2015
Date of Publication (online):2015/11/11
Reviewstate:Peer-Review
Tag:Computer-mediated communication; Natural language processing; Part-of-speech tagging
GND Keyword:Automatische Sprachanalyse; Internet; Korpus <Linguistik>; Natürliche Sprache
Volume:15
Issue:1
First Page:41
Last Page:47
Note:
This article is not freely accessible for copyright reasons.
Dewey Decimal Classification:400 Language / 410 Linguistics
Open Access?:No
Licence (German):German copyright law (UrhG) applies