We present the pilot edition of the GermEval Shared Task on the Identification of Offensive Language. The shared task deals with the classification of German tweets from Twitter and comprises two tasks: a coarse-grained binary classification task and a fine-grained multi-class classification task. It attracted 20 participants, who submitted 51 runs for the coarse-grained task and 25 runs for the fine-grained task. Since this is a pilot task, we describe the process of extracting the raw data for the data collection as well as the annotation schema, and we evaluate the results of the systems submitted to the shared task. The shared task homepage can be found at https://projects.cai.fbi.h-da.de/iggsa/
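As an illustration of the two-level task design described above, the fine-grained labels can be collapsed into the coarse-grained binary decision, and submissions scored with a macro-averaged F1. The label names and the hand-rolled scorer below are a minimal sketch, not the shared task's official evaluation script:

```python
# Sketch: collapsing fine-grained labels to the coarse binary task
# and scoring with macro-averaged F1. Label names are illustrative.
def to_coarse(fine_label):
    # Any fine-grained offense category counts as OFFENSE.
    return "OTHER" if fine_label == "OTHER" else "OFFENSE"

def macro_f1(gold, pred):
    """Unweighted mean of per-class F1 scores."""
    classes = sorted(set(gold) | set(pred))
    f1s = []
    for c in classes:
        tp = sum(1 for g, p in zip(gold, pred) if g == c and p == c)
        fp = sum(1 for g, p in zip(gold, pred) if g != c and p == c)
        fn = sum(1 for g, p in zip(gold, pred) if g == c and p != c)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(f1s) / len(f1s)

gold_fine = ["INSULT", "OTHER", "ABUSE", "OTHER", "PROFANITY"]
pred_fine = ["INSULT", "OTHER", "OTHER", "OTHER", "PROFANITY"]

gold_coarse = [to_coarse(l) for l in gold_fine]
pred_coarse = [to_coarse(l) for l in pred_fine]
print(macro_f1(gold_coarse, pred_coarse))
```

Macro-averaging weights each class equally, which matters here because offensive tweets are typically the minority class.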
Offensive language in social media is a widely discussed problem, and researchers in language technology have started to work on solutions to support the classification of offensive posts. We present the pilot edition of the GermEval Shared Task on the Identification of Offensive Language, which deals with the classification of German tweets from Twitter. GermEval 2018 is the fourth workshop in a series of shared tasks on German processing.
We address the detection of abusive words. The task is to identify such words among a set of negative polar expressions. We propose novel features that draw on information from both corpora and lexical resources. These features are calibrated on a small manually annotated base lexicon, which we use to produce a large lexicon. We show that the word-level information we learn cannot be equally derived from a large dataset of annotated microposts, and we demonstrate the effectiveness of our (domain-independent) lexicon in the cross-domain detection of abusive microposts.
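The kind of lexicon-based detector the abstract evaluates can be sketched minimally: given an induced abusive-word lexicon, a micropost is flagged if it contains any lexicon entry. The lexicon contents and the naive tokenizer below are illustrative assumptions, not the authors' actual resource or pipeline:

```python
import re

# Hypothetical abusive-word lexicon; in the paper, such a lexicon is
# induced from a small manually annotated base lexicon using corpus
# and lexical-resource features.
ABUSIVE_LEXICON = {"idiot", "moron", "scum"}

def tokenize(text):
    """Naive lowercase word tokenizer (illustrative only)."""
    return re.findall(r"[a-zäöüß]+", text.lower())

def is_abusive(post, lexicon=ABUSIVE_LEXICON):
    """Flag a micropost if any of its tokens is in the lexicon."""
    return any(tok in lexicon for tok in tokenize(post))

print(is_abusive("You absolute idiot!"))  # True
print(is_abusive("What a lovely day"))    # False
```

Because the decision depends only on word-level lexicon entries rather than on features learned from one platform's posts, such a detector transfers across domains, which is the cross-domain effectiveness the abstract reports.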