Refine
Document Type
- Conference Proceeding (2)
- Part of a Book (1)
Language
- English (3)
Has Fulltext
- yes (3)
Keywords
- Fehleranalyse (3) (remove)
Publicationstate
- Veröffentlichungsversion (3) (remove)
Reviewstate
- Peer-Review (2)
This paper introduces LRTwiki, an improved variant of the Likelihood Ratio Test (LRT). The central idea of LRTwiki is to employ a comprehensive domain specific knowledge source as additional “on-topic” data sets, and to modify the calculation of the LRT algorithm to take advantage of this new information. The knowledge source is created on the basis of Wikipedia articles. We evaluate on the two related tasks product feature extraction and keyphrase extraction, and find LRTwiki to yield a significant improvement over the original LRT in both tasks.
This paper presents C-WEP, the Collection of Writing Errors by Professionals Writers of German. It currently consists of 245 sentences with grammatical errors. All sentences are taken from published texts. All authors are professional writers with high skill levels with respect to German, the genres, and the topics. The purpose of this collection is to provide seeds for more sophisticated writing support tools as only a very small proportion of those errors can be detected by state-of-the-art checkers. C-WEP is annotated on various levels and freely available.
We introduce a method for error detection in automatically annotated text, aimed at supporting the creation of high-quality language resources at affordable cost. Our method combines an unsupervised generative model with human supervision from active learning. We test our approach on in-domain and out-of-domain data in two languages, in AL simulations and in a real world setting. For all settings, the results show that our method is able to detect annotation errors with high precision and high recall.