This paper presents a survey on hate speech detection. Given the steadily growing body of social media content, the amount of online hate speech is also increasing. Due to the massive scale of the web, methods that automatically detect hate speech are required. Our survey describes key areas that have been explored to automatically recognize these types of utterances using natural language processing. We also discuss limits of those approaches.
The Manatee corpus management system on which the Sketch Engine is built is efficient, but unable to harness the power of today’s multiprocessor machines. We describe a new, compatible implementation of Manatee, which we are developing in the Go language, and report on the performance gains that we obtained.
Our paper describes an experiment aimed at assessing lexical coverage in web corpora in comparison with traditional ones for two closely related Slavic languages from the lexicographers’ perspective. The preliminary results show that web corpora should not be considered “inferior”, but rather “different”.
We present an event-related potentials (ERP) study that addresses the question of how pieces of information pertaining to semantic roles and event structure interact with each other and with the verb’s meaning. Specifically, our study investigates German verb-final clauses with verbs of motion such as fliegen ‘fly’ and schweben ‘float, hover,’ which are indeterminate with respect to agentivity and event structure. Agentivity was tested by manipulating the animacy of the subject noun phrase and event structure by selecting a goal adverbial, which makes the event telic, or a locative adverbial, which leads to an atelic reading. On the clause-initial subject, inanimates evoked an N400 effect vis-à-vis animates. On the adverbial phrase in the atelic (locative) condition, inanimates showed an N400 in comparison to animates. The telic (goal) condition exhibited an amplitude similar to that of the inanimate-atelic condition. Finally, at the verbal lexeme, the inanimate condition elicited an N400 effect against the animate condition in the telic (goal) contexts. In the atelic (locative) condition, items with animates evoked an N400 effect compared to inanimates. The combined set of findings suggests that clause-initial animacy is not sufficient for agent identification in German, which seems to be completed only at the verbal lexeme in our experiment. Here, non-agents (inanimates) changing their location in a goal-directed way and agents (animates) lacking this property are dispreferred, and this challenges the assumption that change of (locational) state is generally a defining characteristic of the patient role. Besides this main finding that sheds new light on role prototypicality, our data seem to indicate effects that, in our view, are related to complexity, i.e., minimality. Inanimate subjects or goal arguments increase processing costs since they have role or event structure restrictions that animate subjects or locative modifiers lack.
Creating CorCenCC (Corpws Cenedlaethol Cymraeg Cyfoes - The National Corpus of Contemporary Welsh)
(2017)
CorCenCC is an interdisciplinary and multi-institutional project that is creating a large-scale, open-source corpus of contemporary Welsh. CorCenCC will be the first ever large-scale corpus to represent spoken, written and electronically mediated Welsh (compiling an initial data set of 10 million Welsh words), with a functional design informed, from the outset, by representatives of all anticipated academic and community user groups.
In the NLP literature, adapting a parser to new text with properties different from the training data is commonly referred to as domain adaptation. In practice, however, the differences between texts from different sources often reflect a mixture of domain and genre properties, and it is by no means clear what impact each of those has on statistical parsing. In this paper, we investigate how differences between articles in a newspaper corpus relate to the concepts of genre and domain and how they influence parsing performance of a transition-based dependency parser. We do this by applying various similarity measures for data point selection and testing their adequacy for creating genre-aware parsing models.