Inducing a Lexicon of Abusive Words – a Feature-Based Approach

We address the detection of abusive words. The task is to identify such words among a set of negative polar expressions. We propose novel features employing information from both corpora and lexical resources. These features are calibrated on a small manually annotated base lexicon which we use to produce a large lexicon. We show that the word-level information we learn cannot be equally derived from a large dataset of annotated microposts. We demonstrate the effectiveness of our (domain-independent) lexicon in the cross-domain detection of abusive microposts.

Metadata
Author:	Michael Wiegand, Josef Ruppenhofer, Anna Schmidt, Clayton Greenberg
URN:	urn:nbn:de:bsz:mh39-84719
URL:	https://aclanthology.info/papers/N18-1095/n18-1095
DOI:	https://doi.org/10.18653/v1/N18-1095
ISBN:	978-1-948087-27-8
Parent Title (English):	Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, June 1-June 6, 2018, New Orleans, Louisiana, Volume 1 (Long Papers)
Publisher:	Association for Computational Linguistics
Place of publication:	Stroudsburg, PA
Document Type:	Conference Proceeding
Language:	English
Year of first Publication:	2018
Date of Publication (online):	2019/02/06
Creating Corporation:	Association for Computational Linguistics
Publication state:	Published version
Review state:	Peer-reviewed
GND Keyword:	Insult; Computational linguistics; Natural language; Text mining
First Page:	1046
Last Page:	1056
DDC classes:	400 Language / 400 Language, Linguistics
Open Access?:	yes
Leibniz-Classification:	Language, Linguistics
Linguistics-Classification:	Computational linguistics
Program areas:	Pragmatics
Program areas:	Digital Linguistics
Licence (English):	Creative Commons - Attribution 4.0 International
