
MaJo - A Toolkit for Supervised Word Sense Disambiguation and Active Learning

  • We present MaJo, a toolkit for supervised Word Sense Disambiguation (WSD) with an interface for Active Learning. The toolkit combines a flexible, easily extensible plugin architecture with a graphical user interface that guides the user through the learning process. MaJo integrates off-the-shelf NLP tools such as POS taggers and treebank-trained statistical parsers, as well as linguistic resources such as WordNet and GermaNet. It enables the user to systematically explore the benefit gained from different feature types for WSD. In addition, MaJo provides an Active Learning environment in which the system presents carefully selected instances to a human oracle. The toolkit supports manual annotation of the selected instances and re-trains the system on the extended data set. MaJo also provides the means to evaluate the performance of the system against a gold standard. We illustrate the usefulness of our system by learning the frames (word senses) for three verbs from the SALSA corpus, a version of the TiGer treebank with an additional layer of frame-semantic annotation. We show how MaJo can be used to tune the feature set for specific target words and thereby improve performance for these targets. We also show that syntactic features, when carefully tuned to the target word, can lead to a substantial increase in performance.
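The Active Learning cycle described in the abstract (select instances, ask a human oracle, re-train on the extended data) can be illustrated with a minimal sketch. The abstract does not say which classifier or selection strategy MaJo uses, so the sketch assumes a small Naive Bayes model over context-word features and margin-based uncertainty sampling as stand-ins. All function names, sense labels, and the toy data are hypothetical and are not taken from MaJo or the SALSA corpus.

```python
import math
from collections import Counter, defaultdict


def train(labeled):
    """Fit a Naive Bayes model from (features, sense) pairs (assumed classifier)."""
    sense_counts = Counter()
    feat_counts = defaultdict(Counter)
    vocab = set()
    for feats, sense in labeled:
        sense_counts[sense] += 1
        feat_counts[sense].update(feats)
        vocab.update(feats)
    return sense_counts, feat_counts, vocab


def posteriors(model, feats):
    """Return normalised sense probabilities, using add-one smoothing."""
    sense_counts, feat_counts, vocab = model
    total = sum(sense_counts.values())
    log_scores = {}
    for sense, n in sense_counts.items():
        # The extra +1 reserves probability mass for unseen features.
        denom = sum(feat_counts[sense].values()) + len(vocab) + 1
        score = math.log(n / total)
        for f in feats:
            score += math.log((feat_counts[sense][f] + 1) / denom)
        log_scores[sense] = score
    top = max(log_scores.values())
    exp = {s: math.exp(v - top) for s, v in log_scores.items()}
    z = sum(exp.values())
    return {s: v / z for s, v in exp.items()}


def predict(model, feats):
    """Return the most probable sense for one instance."""
    probs = posteriors(model, feats)
    return max(probs, key=probs.get)


def margin(model, feats):
    """Gap between the two most probable senses; a small gap means uncertain."""
    probs = sorted(posteriors(model, feats).values(), reverse=True)
    return probs[0] - probs[1] if len(probs) > 1 else probs[0]


def active_learning(seed, pool, oracle, rounds):
    """Repeatedly query the oracle for the least certain pool instance and re-train."""
    labeled = list(seed)
    pool = list(pool)
    for _ in range(min(rounds, len(pool))):
        model = train(labeled)
        query = min(pool, key=lambda feats: margin(model, feats))
        pool.remove(query)
        labeled.append((query, oracle(query)))
    return train(labeled), labeled


# Invented toy data: two senses of a hypothetical target verb.
seed = [(["company", "manage"], "operate"), (["marathon", "fast"], "move")]
pool = [["company", "fast"], ["marathon", "race"], ["manage", "shop"]]
gold = {
    ("company", "fast"): "operate",
    ("marathon", "race"): "move",
    ("manage", "shop"): "operate",
}
# The lookup simulates the human annotator.
model, labeled = active_learning(seed, pool, lambda f: gold[tuple(f)], rounds=1)
```

In this toy run the first query is the instance whose context words are split evenly between the two senses, which is the kind of instance uncertainty sampling is meant to surface. In a real setting the oracle would be a human annotator rather than a lookup table, and the resulting model would be evaluated against a held-out gold standard.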
Metadata
Author:Ines Rehbein, Josef Ruppenhofer, Jonas Sunde
URN:urn:nbn:de:bsz:mh39-53093
ISBN:978-88-8311-712-1
Parent Title (English):Proceedings of the 8th Int. Workshop on Treebanks and Linguistic Theories
Publisher:EDUCatt
Place of publication:Milano
Document Type:Conference Proceeding
Language:English
Year of first Publication:2009
Date of Publication (online):2016/09/29
Publication state:Published version
Review state:Peer review
Tag:Disambiguation; Natural Language Processing; SALSA corpus
First Page:161
Last Page:172
DDC classes:400 Language / 410 Linguistics
Open Access?:yes
Linguistics-Classification:Computational linguistics
Licence (German):Urheberrechtlich geschützt (protected by copyright)