Inducing a Lexicon of Abusive Words – a Feature-Based Approach
We address the detection of abusive words. The task is to identify such words among a set of negative polar expressions. We propose novel features employing information from both corpora and lexical resources. These features are calibrated on a small manually annotated base lexicon which we use to produce a large lexicon. We show that the word-level information we learn cannot be equally derived from a large dataset of annotated microposts. We demonstrate the effectiveness of our (domain-independent) lexicon in the cross-domain detection of abusive microposts.
| Author: | Michael Wiegand, Josef Ruppenhofer, Anna Schmidt, Clayton Greenberg |
|---|---|
| URN: | urn:nbn:de:bsz:mh39-84719 |
| URL: | https://aclanthology.info/papers/N18-1095/n18-1095 |
| DOI: | https://doi.org/10.18653/v1/N18-1095 |
| ISBN: | 978-1-948087-27-8 |
| Parent Title (English): | Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, June 1-June 6, 2018, New Orleans, Louisiana, Volume 1 (Long Papers) |
| Publisher: | Association for Computational Linguistics |
| Place of publication: | Stroudsburg, PA |
| Document Type: | Conference Proceeding |
| Language: | English |
| Year of first Publication: | 2018 |
| Date of Publication (online): | 2019/02/06 |
| Creating Corporation: | Association for Computational Linguistics |
| Publication state: | Published version |
| Review state: | Peer-reviewed |
| GND Keyword: | Insult; Computational linguistics; Natural language; Text mining |
| First Page: | 1046 |
| Last Page: | 1056 |
| DDC classes: | 400 Language / 400 Language, Linguistics |
| Open Access?: | yes |
| Leibniz-Classification: | Language, Linguistics |
| Linguistics-Classification: | Computational linguistics |
| Program areas: | Pragmatics |
| Program areas: | Digital Linguistics |
| Licence (English): | Creative Commons - Attribution 4.0 International |


