We discuss the impact of data bias on abusive language detection. We show that the classification scores reported in previous work on popular datasets are much lower under realistic settings in which this bias is reduced. Such biases are most notably observed on datasets created by focused sampling rather than random sampling. Datasets with a higher proportion of implicit abuse are more affected than those with a lower proportion.
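To make the evaluation setting concrete, here is a minimal sketch, not the authors' code, of how such a bias-reduced comparison can be run: a bag-of-words classifier is trained on a keyword-sampled ("focused") corpus and then scored both in-domain and on a randomly sampled corpus. All sentences and datasets are invented stand-ins.

```python
# Minimal sketch (not the authors' code) of a bias-reduced evaluation:
# train on a focused (keyword-sampled) corpus, then test both in-domain
# and on a randomly sampled corpus. All sentences are invented stand-ins.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.pipeline import make_pipeline

# Focused sample: collected by querying for explicit slurs, so the label
# correlates strongly with those query words (the data bias in question).
focused_train = [("you are a complete idiot", 1),
                 ("what an idiot move", 1),
                 ("nice weather today", 0),
                 ("see you at the game", 0)]
focused_test = [("another idiot in the comments", 1),
                ("thanks for the recipe", 0)]
# Random sample: the abuse here is implicit and lacks the query words.
random_test = [("people like you should go back where they came from", 1),
               ("great talk, very informative", 0)]

clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit([t for t, _ in focused_train], [y for _, y in focused_train])

# On real corpora, the in-domain score overstates performance because the
# model keys on the explicit lexical cues; on randomly sampled data those
# cues are absent and the score drops. The toy data only shows the protocol.
for name, data in [("focused test", focused_test), ("random test", random_test)]:
    texts, gold = zip(*data)
    print(name, f1_score(gold, clf.predict(list(texts)), zero_division=0))
```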
We examine the new task of detecting derogatory compounds (e.g. curry muncher). Derogatory compounds are much more difficult to detect than derogatory unigrams (e.g. idiot), since they are far more sparsely represented in the lexical resources previously found effective for this task (e.g. Wiktionary). We propose an unsupervised classification approach that incorporates linguistic properties of compounds but mostly depends on a simple distributional representation. We compare our approach against established methods previously proposed for extracting derogatory unigrams.
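As an illustration of what such a distributional approach can look like, here is a minimal, hypothetical sketch, not the authors' actual method: each candidate compound is scored by the similarity of its embedding to a seed set of known derogatory unigrams. The hand-set 3-dimensional vectors below stand in for real pretrained embeddings, and all scores are illustrative only.

```python
import numpy as np

# Hypothetical stand-ins for pretrained word embeddings; in practice these
# would come from a model trained on a large corpus.
EMB = {
    "idiot":         np.array([0.9, 0.1, 0.0]),
    "moron":         np.array([0.8, 0.2, 0.1]),
    "curry_muncher": np.array([0.7, 0.3, 0.1]),
    "tree_hugger":   np.array([0.5, 0.4, 0.3]),
    "bus_driver":    np.array([0.1, 0.2, 0.9]),
}

# Seed set of known derogatory unigrams, e.g. taken from a lexical resource.
SEEDS = ["idiot", "moron"]

def cos(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def derogatory_score(compound):
    """Mean cosine similarity between a compound and the derogatory seeds;
    higher scores suggest the compound is used derogatorily."""
    v = EMB[compound]
    return sum(cos(v, EMB[s]) for s in SEEDS) / len(SEEDS)

# Candidates are ranked unsupervised, with no labeled compound data needed.
for c in ["curry_muncher", "tree_hugger", "bus_driver"]:
    print(c, round(derogatory_score(c), 3))
```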