LRTwiki: enriching the likelihood ratio test with encyclopedic information for the extraction of relevant terms
This paper introduces LRTwiki, an improved variant of the Likelihood Ratio Test (LRT). The central idea of LRTwiki is to employ a comprehensive domain-specific knowledge source as additional “on-topic” data sets and to modify the calculation of the LRT algorithm to take advantage of this new information. The knowledge source is created from Wikipedia articles. We evaluate LRTwiki on two related tasks, product feature extraction and keyphrase extraction, and find that it yields a significant improvement over the original LRT in both.
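For readers unfamiliar with the baseline, the sketch below shows the standard likelihood ratio statistic (Dunning-style G²) that LRT-based term extractors typically compute from a 2×2 contingency table of a term's counts in an on-topic corpus versus an off-topic corpus. It does not reproduce LRTwiki's modified calculation, which the abstract does not specify. The function names, the convention of signing the score by the direction of the frequency difference, and the example counts are illustrative assumptions, not taken from the paper.

```python
import math


def _xlogx_ratio(observed: float, expected: float) -> float:
    """Return observed * ln(observed / expected), treating 0 * ln(0) as 0."""
    return observed * math.log(observed / expected) if observed > 0 else 0.0


def log_likelihood_ratio(k_on: int, n_on: int, k_off: int, n_off: int) -> float:
    """Signed Dunning-style G^2 statistic for one candidate term (illustrative sketch).

    k_on / n_on:   occurrences of the term / total tokens in the on-topic corpus.
    k_off / n_off: the same counts for the off-topic (background) corpus.
    Positive values mean the term is relatively more frequent on-topic.
    """
    if n_on <= 0 or n_off <= 0 or not (0 <= k_on <= n_on and 0 <= k_off <= n_off):
        raise ValueError("counts must satisfy 0 <= k <= n and n > 0")
    # Rows: the term vs. all other tokens; columns: on-topic vs. off-topic corpus.
    table = [[k_on, k_off], [n_on - k_on, n_off - k_off]]
    total = n_on + n_off
    row_sums = [sum(row) for row in table]
    col_sums = [n_on, n_off]
    g2 = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count under the null hypothesis that the term's rate
            # is the same in both corpora.
            expected = row_sums[i] * col_sums[j] / total
            g2 += _xlogx_ratio(table[i][j], expected)
    g2 *= 2.0
    # Hypothetical signing convention: over-represented on-topic terms rank first.
    return g2 if k_on / n_on >= k_off / n_off else -g2


if __name__ == "__main__":
    # Hypothetical counts: (term on-topic, on-topic tokens, term off-topic, off-topic tokens).
    counts = {
        "battery": (50, 10_000, 20, 100_000),
        "the": (600, 10_000, 6_000, 100_000),
    }
    ranked = sorted(counts, key=lambda t: log_likelihood_ratio(*counts[t]), reverse=True)
    print(ranked)  # ['battery', 'the']: "the" occurs at the same rate in both corpora
```

A term that is equally frequent in both corpora scores zero, while a term concentrated in the on-topic data scores highly, which is why such a statistic can rank candidate product features or keyphrases. How LRTwiki folds the Wikipedia-derived on-topic data into this calculation is described in the paper itself, not in this sketch.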
Author: | Niklas Jakob, Mark-Christoph Müller, Iryna Gurevych |
---|---|
URN: | urn:nbn:de:bsz:mh39-110906 |
URL: | http://tubiblio.ulb.tu-darmstadt.de/104742/ |
Parent Title (English): | Proceedings of the WikiAI 09 - IJCAI Workshop: User Contributed Knowledge and Artificial Intelligence: An Evolving Synergy |
Document Type: | Conference Proceeding |
Language: | English |
Year of first Publication: | 2009 |
Date of Publication (online): | 2022/06/17 |
Publishing Institution: | Leibniz-Institut für Deutsche Sprache (IDS) |
Publication state: | Published version |
Review state: | Peer reviewed |
Tag: | LRTwiki; Wikipedia articles; keyphrase extraction; likelihood ratio test; product feature extraction; semantic information management |
GND Keyword: | Algorithm; Data set; Encyclopedia; Error analysis; Information extraction; Likelihood ratio test; Wikipedia |
First Page: | 3 |
Last Page: | 8 |
DDC classes: | 000 Generalities, computer science, information science / 020 Library and information science |
DDC classes: | 400 Language / 400 Language, linguistics |
Open Access?: | Yes |
Linguistics classification: | Computational linguistics |