OPUS 4

Data Formats for Phonological Corpora (2013)

Romary, Laurent ; Witt, Andreas

The goal of the present chapter is to explore the possibility of providing the research (and also the industrial) community that commonly uses spoken corpora with a stable portfolio of well-documented, standardized formats. Such formats would allow a high reuse rate of annotated spoken resources and, as a consequence, better interoperability across the tools used to produce or exploit these resources.

Linguistically Annotated Corpora: Quality Assurance, Reusability and Sustainability (2008)

Zinsmeister, Heike ; Witt, Andreas ; Kübler, Sandra ; Hinrichs, Erhard

Internet Corpora: A Challenge for Linguistic Processing (2015)

Horbach, Andrea ; Thater, Stefan ; Steffen, Diana ; Fischer, Peter M. ; Witt, Andreas ; Pinkal, Manfred

Natural language processing tools are mostly developed for and optimized on newspaper texts, and they often show a substantial performance drop when applied to other types of text, such as Twitter feeds, chat data or Internet forum posts. We explore a range of easy-to-implement methods for adapting existing part-of-speech taggers to improve their performance on Internet texts. Our results show that these methods can improve tagger performance substantially.