Refine
Year of publication
Document Type
- Conference Proceeding (44)
- Part of a Book (14)
- Article (6)
Language
- English (64)
Has Fulltext
- yes (64)
Keywords
- Automatische Sprachanalyse (17)
- Deutsch (16)
- Annotation (11)
- Korpus <Linguistik> (11)
- Semantische Analyse (11)
- Computerlinguistik (10)
- Beleidigung (9)
- Frame-Semantik (8)
- Natürliche Sprache (8)
- sentiment analysis (8)
- Social Media (7)
- Automatische Sprachverarbeitung (6)
- Gesprochene Sprache (6)
- Automatische Spracherkennung (5)
- Polarität (5)
- Beschimpfung (4)
- Maschinelles Lernen (4)
- Negation (4)
- Propositionale Einstellung (4)
- Sentimentanalyse (4)
- abusive language (4)
- Datensatz (3)
- Englisch (3)
- German (3)
- Name (3)
- Parlamentsdebatte (3)
- Text Mining (3)
- Argumentstruktur (2)
- Automatische Textanalyse (2)
- Deutschland. Deutscher Bundestag (2)
- FrameNet (2)
- Information Extraction (2)
- Klassifikation (2)
- Lexicon (2)
- Lexikalische Semantik (2)
- Lexikon (2)
- Natural Language Processing (2)
- Negativer Polaritätsausdruck (2)
- Personalpronomen (2)
- Politische Sprache (2)
- Schimpfwort (2)
- Segmentierung (2)
- Semasiologie (2)
- Sentiment Analysis (2)
- Strukturbaum (2)
- Twitter (2)
- Twitter <Softwareplattform> (2)
- UGC (2)
- Universal Dependencies (2)
- Verb (2)
- Web (2)
- annotation guidelines (2)
- lexical semantics (2)
- semantic similarity (2)
- Active Learning (1)
- Active learning (1)
- Adjektiv (1)
- Affirmativer Polaritätsausdruck (1)
- Affixoid (1)
- Akademischer Grad (1)
- Ambiguität (1)
- Angewandte Linguistik (1)
- Annotation guidelines (1)
- Annotation of causal language (1)
- Benennung (1)
- British National Corpus (1)
- CELEX (1)
- Computerunterstützte Kommunikation (1)
- Crowdsourcing (1)
- Data Augmentation (1)
- Data Mining (1)
- Datenbanksystem (1)
- Dependenzgrammatik (1)
- Deutsches Referenzkorpus (DeReKo) (1)
- Disambiguation (1)
- Disambiguierung (1)
- Dokumentverarbeitung (1)
- Effects (1)
- Eigengruppe (1)
- Fehleranalyse (1)
- Formale Semantik (1)
- Forschungs- und Lehrkorpus Gesprochenes Deutsch (FOLK) (1)
- Forschungsdaten (1)
- Frame semantics (1)
- Framing-Effekt (1)
- Fremdgruppe (1)
- Graphisches Symbol (1)
- Integer Linear Program (1)
- Kausalität (1)
- Kompositum (1)
- Kompositum <Wortbildung> (1)
- Konstruktionsgrammatik (1)
- Kontrastive Linguistik (1)
- Kontrastive Syntax (1)
- Labeling approach (1)
- Lexical Database (1)
- Lexical Semantics (1)
- MLSA (1)
- Machine learning (1)
- Modalverb (1)
- Morphemanalyse (1)
- Morphologie (1)
- Named Entity Recognition (1)
- Naming (1)
- Natural language processing (1)
- Nominalphrase (1)
- Opinion Inference (1)
- Opinion Mining (1)
- Oral history (1)
- Parser (1)
- Part-of-Speech-Tagging (1)
- Polarity Shifter (1)
- Polaritätsprofil (1)
- Politik (1)
- Politiker (1)
- Politische Rede (1)
- Pronomen (1)
- Prädikatives Adjektiv (1)
- Präsident (1)
- Rhetorik (1)
- SALSA (1)
- SALSA corpus (1)
- Satz (1)
- Satzende (1)
- Schriftsprache (1)
- SemEval (1)
- Semantics (1)
- Semantik (1)
- Semantisches Netz (1)
- SentiFrameNet (1)
- Sign-Based Construction Grammar (1)
- Smiley (1)
- Soziale Software (1)
- Sprachanalyse (1)
- Sprachgebrauch (1)
- Sprachstatistik (1)
- Sprachtypologie (1)
- Subjectivity (1)
- Supervised Classification (1)
- Syntaktische Analyse (1)
- Textlinguistik (1)
- Thematische Relation (1)
- Titling (1)
- Treebanks (1)
- Tweet (1)
- User Generated Content (1)
- Valences (1)
- Valenz <Linguistik> (1)
- Verbalaggression (1)
- Vergleich <Rhetorik> (1)
- WSD (1)
- Wissensextraktion (1)
- World Wide Web (1)
- Worthäufigkeit (1)
- Wortliste (1)
- abusive comparisons (1)
- abusive emojis (1)
- abusive remarks (1)
- abusive words (1)
- adjectives (1)
- ambiguous words (1)
- arbitrary scripts (1)
- argument structure (1)
- causal tagger (1)
- clusivity (1)
- corpus creation (1)
- deep-level morphological analyses (1)
- derivation (1)
- diary omission (1)
- distributional semantics (1)
- first person plural pronouns (1)
- framing (1)
- fuck (1)
- global structural information (1)
- gold standard corpus (1)
- gradable adjectives (1)
- identity groups (1)
- implicit abuse (1)
- implicitly abusive comparisons (1)
- implicitly abusive language (1)
- instructional imperatives (1)
- lexicography (1)
- lexicon (1)
- lexicon generation (1)
- modality (1)
- morphological analyses (1)
- morphology (1)
- naming (1)
- negation content words (1)
- negation modeling (1)
- null complementation (1)
- oral history corpora (1)
- polarity sensitive items (1)
- polarity shifter (1)
- political text analysis (1)
- predicative adjectives (1)
- pronouns (1)
- rhetorical device (1)
- scalar rhetoric (1)
- selection of textual sources (1)
- semantic role labeling (1)
- semantische Analyse (1)
- sentence boundary detection (1)
- sentiment (1)
- sentiment polarity (1)
- separation of adjectives (1)
- spoken language (1)
- spoken language transcripts (1)
- spoken vs. written (1)
- stance (1)
- subjectivity (1)
- treebank (1)
- treebanks (1)
- wir (1)
- word embeddings (1)
- word structure (1)
- word trees (1)
- word-sense disambiguation (1)
Publication state
- Veröffentlichungsversion (54)
- Zweitveröffentlichung (8)
- Postprint (2)
Review state
- Peer-Review (64)
Publisher
- Association for Computational Linguistics (10)
- European Language Resources Association (9)
- The Association for Computational Linguistics (7)
- German Society for Computational Linguistics & Language Technology und Friedrich-Alexander-Universität Erlangen-Nürnberg (4)
- European Language Resources Association (ELRA) (3)
- European language resources association (ELRA) (3)
- Springer (2)
- Asian Federation of Natural Language Processing (1)
- Austrian Academy of Sciences (1)
- Austrian academy of sciences (1)
We address the task of distinguishing implicitly abusive sentences about identity groups (“Muslims contaminate our planet”) from other group-related negative polar sentences (“Muslims despise terrorism”). Implicitly abusive language consists of utterances whose abusiveness is not conveyed by abusive words (e.g. “bimbo” or “scum”). So far, the detection of such utterances could not be addressed properly, since existing datasets displaying a high degree of implicit abuse are fairly biased. Following the recently proposed strategy of tackling implicit abuse by separately addressing its different subtypes, we present a new, focused and less biased dataset that consists of the subtype of atomic negative sentences about identity groups. For this task, we model components that each address one facet of such implicit abuse, i.e. depiction as perpetrators, aspectual classification and non-conformist views. The approach generalizes across different identity groups and languages.
This paper presents a compositional annotation scheme to capture the clusivity properties of personal pronouns in context, that is, their ability to construct and manage in-groups and out-groups by including/excluding the audience and/or non-speech-act participants in reference to groups that also include the speaker. We apply and test our schema on pronoun instances in speeches taken from the German parliament. The speeches cover the time period from 2017 to 2021 and comprise manual annotations for 3,126 sentences. We achieve high inter-annotator agreement for our new schema, with a Cohen’s κ in the range of 89.7-93.2 and a percentage agreement of > 96%. Our exploratory analysis of in-/exclusive pronoun use in the parliamentary setting provides some face validity for our new schema. Finally, we present baseline experiments for automatically predicting clusivity in political debates, with promising results for many referential constellations, yielding an overall micro F1 of 84.9% across all pronouns.
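The agreement figures reported above (Cohen’s κ and percentage agreement) can be reproduced from raw annotation tables. The following is a minimal sketch with invented toy labels ("incl"/"excl"), not the actual parliamentary annotations:

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same instances."""
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    # Observed agreement: fraction of instances where both annotators agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected chance agreement, from each annotator's label distribution.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)

# Toy annotations (illustrative only, not data from the paper).
ann1 = ["incl", "incl", "excl", "incl", "excl", "excl"]
ann2 = ["incl", "incl", "excl", "excl", "excl", "excl"]
kappa = cohens_kappa(ann1, ann2)
pct = sum(a == b for a, b in zip(ann1, ann2)) / len(ann1)
```

In a real evaluation the same computation would run over all 3,126 annotated sentences; `sklearn.metrics.cohen_kappa_score` implements the identical formula.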
This article discusses the main linguistic phenomena that make user-generated texts from the web and social media difficult to analyse, and proposes a set of annotation guidelines for their treatment within the Universal Dependencies (UD) framework of syntactic analysis. Given the increasing number of treebanks featuring user-generated content on the one hand, and its somewhat inconsistent treatment in these resources on the other, the aim of this article is twofold: (1) to provide a condensed yet comprehensive overview of such treebanks, based on the available literature, along with their main features and a comparative analysis of their annotation criteria, and (2) to propose a set of tentative UD-based annotation guidelines that promote a consistent treatment of the particular phenomena found in these types of texts. The overarching goal is to provide a common framework for researchers interested in developing similar resources in UD, thus promoting cross-linguistic consistency, a principle that has always been central to the spirit of UD.
Who is we? Disambiguating the referents of first person plural pronouns in parliamentary debates
(2021)
This paper investigates the use of first person plural pronouns as a rhetorical device in political speeches. We present an annotation schema for disambiguating pronoun references and use our schema to create an annotated corpus of debates from the German Bundestag. We then use our corpus to learn to automatically resolve pronoun referents in parliamentary debates. We explore the use of data augmentation with weak supervision to further expand our corpus and report preliminary results.
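Weak supervision of the kind mentioned above is commonly implemented as simple labelling rules that either assign a referent class or abstain. The sketch below uses invented rules and label names (COUNTRY, PARTY, UNIVERSAL) that are purely illustrative and not the paper’s actual annotation schema:

```python
import re

# Hypothetical weak-labelling rules for the German pronoun "wir" ("we").
# Rules are tried in order; each maps a context pattern to a referent label.
RULES = [
    (r"\bwir\b.*\bin Deutschland\b", "COUNTRY"),   # "we in Germany"
    (r"\bwir\b.*\b(Fraktion|Partei)\b", "PARTY"),  # party/faction "we"
    (r"\bwir alle\b", "UNIVERSAL"),                # generic/universal "we"
]

def weak_label(sentence):
    """Return the first matching rule's label, or None (abstain)."""
    for pattern, label in RULES:
        if re.search(pattern, sentence, flags=re.IGNORECASE):
            return label
    return None

examples = [
    "Wir alle tragen Verantwortung.",
    "Wir als Fraktion lehnen das ab.",
    "Wir in Deutschland brauchen Reformen.",
    "Wir haben gestern dazu beraten.",
]
labels = [weak_label(s) for s in examples]
```

Sentences on which all rules abstain (the last example) stay unlabelled; only the confidently matched ones would be added to the training data as augmentation.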
Implicitly abusive language – What does it actually look like and why are we not getting there?
(2021)
Abusive language detection is an emerging field in natural language processing that has recently received a large amount of attention. Still, the success of automatic detection is limited. In particular, the detection of implicitly abusive language, i.e. abusive language that is not conveyed by abusive words (e.g. “dumbass” or “scum”), is not working well. In this position paper, we explain why existing datasets make learning implicit abuse difficult and what needs to change in the design of such datasets. Arguing for a divide-and-conquer strategy, we present a list of subtypes of implicitly abusive language and formulate research tasks and questions for future research.
We propose to use abusive emojis, such as the “middle finger” or “face vomiting”, as a proxy for learning a lexicon of abusive words. Since it represents extralinguistic information, a single emoji can co-occur with different forms of explicitly abusive utterances. We show that our approach generates a lexicon that matches the performance of the most advanced lexicon induction method in cross-domain classification of abusive microposts. That method, in contrast, depends on manually annotated seed words and expensive lexical resources for bootstrapping (e.g. WordNet). We demonstrate that the same emojis can also be used effectively in languages other than English. Finally, we show that emojis can be exploited for classifying mentions of ambiguous words, such as “fuck” and “bitch”, into generally abusive and merely profane usages.
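One way the emoji-as-proxy idea can be operationalized is through co-occurrence statistics between words and abusive emojis. The sketch below scores words by pointwise mutual information (PMI) with emoji presence over an invented toy corpus; the emoji set, posts, and PMI threshold are illustrative assumptions, not the paper’s actual method:

```python
import math
from collections import Counter

# Toy data: the emoji set and microposts are invented for illustration.
ABUSIVE_EMOJIS = {"🖕", "🤮"}
posts = [
    "you are a total loser 🖕",
    "what a loser 🤮",
    "have a great day 😊",
    "great match today 😊",
    "such a loser",
]

word_count = Counter()
word_with_emoji = Counter()
for post in posts:
    tokens = post.split()
    has_emoji = any(t in ABUSIVE_EMOJIS for t in tokens)
    for t in tokens:
        if t in ABUSIVE_EMOJIS:
            continue
        word_count[t] += 1
        if has_emoji:
            word_with_emoji[t] += 1

n_posts = len(posts)
n_emoji_posts = sum(any(t in ABUSIVE_EMOJIS for t in p.split()) for p in posts)

def pmi(word):
    """PMI between a word and the presence of an abusive emoji in a post."""
    p_joint = word_with_emoji[word] / n_posts
    if p_joint == 0:
        return float("-inf")  # never co-occurs with an abusive emoji
    p_word = word_count[word] / n_posts
    p_emoji = n_emoji_posts / n_posts
    return math.log2(p_joint / (p_word * p_emoji))

# Keep words positively associated with abusive emojis, strongest first.
lexicon = sorted((w for w in word_count if pmi(w) > 0), key=pmi, reverse=True)
```

On real data, stopword filtering and frequency thresholds would be needed: function words that happen to appear in emoji-bearing posts (like "a" here) also receive positive PMI.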
We examine the task of detecting implicitly abusive comparisons (e.g. “Your hair looks like you have been electrocuted”). Implicitly abusive comparisons are abusive comparisons in which abusive words (e.g. “dumbass” or “scum”) are absent. We detail the process of creating a novel dataset for this task via crowdsourcing, which includes several measures to obtain a sufficiently representative and unbiased set of comparisons. We also present classification experiments with a range of linguistic features that help us better understand the mechanisms underlying abusive comparisons.
I’ve got a construction looks funny – representing and recovering non-standard constructions in UD
(2020)
The UD framework defines guidelines for cross-lingual syntactic analysis in the framework of dependency grammar, with the aim of providing a consistent treatment across languages that not only supports multilingual NLP applications but also facilitates typological studies. Until now, the UD framework has mostly focussed on bilexical grammatical relations. In this paper, we propose to add a constructional perspective and discuss several examples of spoken-language constructions that occur in multiple languages and challenge the current use of basic and enhanced UD relations. The examples include cases where the surface relations are deceptive, and syntactic amalgams that either involve unconnected subtrees or structures with multiply-headed dependents. We argue that a unified treatment of constructions across languages will increase the consistency of the UD annotations and thus the quality of the treebanks for linguistic analysis.