The goal of this chapter is to explore whether the research community that commonly uses spoken corpora (and the industrial community as well) can be provided with a stable portfolio of well-documented, standardized formats. Such formats would allow annotated spoken resources to be reused at a high rate and, as a consequence, would improve interoperability across the tools used to produce or exploit these resources.
Natural language processing tools are mostly developed for and optimized on newspaper texts, and they often show a substantial performance drop when applied to other types of text, such as Twitter feeds, chat data or Internet forum posts. We explore a range of easy-to-implement methods for adapting existing part-of-speech taggers to improve their performance on Internet texts. Our results show that these methods can improve tagger performance substantially.
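The abstract does not say which adaptation methods were tested. As one illustration of what an "easy-to-implement" adaptation can look like, here is a minimal Python sketch, assuming a rule-based override layer wrapped around an existing tagger: tokens that look like URLs, emoticons, @-addresses or hashtags get a dedicated tag, and every other token keeps the base tagger's output. The tag labels (`URL`, `EMO`, `ADR`, `HST`), the regular expressions, the toy lexicon and the example sentence are all hypothetical and are not taken from the chapter. The accuracy figures apply only to this toy input and are not reported results.

```python
import re
from typing import Callable, List, Tuple

Tagged = List[Tuple[str, str]]

# Hypothetical patterns for Internet-specific tokens; the tag labels are
# illustrative and do not claim to match any particular tagset.
WEB_PATTERNS = [
    (re.compile(r"^(https?://|www\.)\S+$"), "URL"),
    (re.compile(r"^[:;=8][-o^]?[)(DPp/|\\]+$"), "EMO"),
    (re.compile(r"^@\w+$"), "ADR"),
    (re.compile(r"^#\w+$"), "HST"),
]


def adapt(base_tagger: Callable[[List[str]], Tagged]) -> Callable[[List[str]], Tagged]:
    """Wrap an existing tagger: keep its tags, but override tokens matching a web pattern."""
    def tag(tokens: List[str]) -> Tagged:
        result: Tagged = []
        for tok, base_tag in base_tagger(tokens):
            for pattern, web_tag in WEB_PATTERNS:
                if pattern.match(tok):
                    base_tag = web_tag
                    break
            result.append((tok, base_tag))
        return result
    return tag


def accuracy(predicted: Tagged, gold: Tagged) -> float:
    """Token-level accuracy: share of tokens whose predicted tag equals the gold tag."""
    assert len(predicted) == len(gold)
    correct = sum(p[1] == g[1] for p, g in zip(predicted, gold))
    return correct / len(gold)


# Hypothetical stand-in for a newspaper-trained tagger: a tiny lexicon that
# falls back to "NN" for every unknown token.
LEXICON = {"see": "VB"}


def toy_base_tagger(tokens: List[str]) -> Tagged:
    return [(t, LEXICON.get(t, "NN")) for t in tokens]


tokens = ["@anna", "see", "www.example.org", ":-)"]
gold = [("@anna", "ADR"), ("see", "VB"), ("www.example.org", "URL"), (":-)", "EMO")]

print(accuracy(toy_base_tagger(tokens), gold))         # 0.25
print(accuracy(adapt(toy_base_tagger)(tokens), gold))  # 1.0
```

The wrapper leaves the base tagger untouched, so the same override layer can be placed in front of any tagger that returns `(token, tag)` pairs. This is one reason such rule-based adaptations are cheap to try before retraining a model.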