OPUS 4 | Search

Practice Report. A blended learning approach to teaching NLP for a DH public (2017)

This paper reports on current practice in a staged approach to introducing NLP principles and techniques to students of information science (IIM) and of international communication and translation (ICT) as part of their curricula. As most of these students are largely unfamiliar with computer science or, in the case of IIM students, with linguistics, we consider them comparable to students of the humanities. We follow a blended learning strategy with lectures, online materials, tutorials, and screencasts. In the first two terms, we focus on linguistics and its formalisation; NLP tools and applications are then introduced from the third term on. The lectures are combined with tutorials and, since the summer term of 2017, with a set of screencasts.

The two sides of prediction error in reading: on the relationship between eye movements and the N400 in sentence processing (2017)

Kretzschmar, Franziska ; Alday, Phillip M.

Taking typography to experimental testing: On the influence of serifs, fonts and justification on eye movements in text reading (2017)

Jarosch, Julian ; Schlesewsky, Matthias ; Füssel, Stephan ; Kretzschmar, Franziska

Typography and individual experience in digital reading: Do readers’ eye movements adapt to poor justification? (2017)

Jarosch, Julian ; Schlesewsky, Matthias ; Füssel, Stephan ; Kretzschmar, Franziska

When readers pay attention to the left: A concurrent eyetracking-fMRI investigation on the neuronal correlates of regressive eye movements during reading (2017)

Weiß, Anna Fiona ; Kretzschmar, Franziska ; Nagels, Arne ; Schlesewsky, Matthias ; Bornkessel-Schlesewsky, Ina ; Tune, Sarah

Evaluating the Morphological Compositionality of Polarity (2017)

Ruppenhofer, Josef ; Steiner, Petra ; Wiegand, Michael

Unknown words are a challenge for any NLP task, including sentiment analysis. Here, we evaluate the extent to which the sentiment polarity of complex words can be predicted from their morphological make-up. We do this for German, as it has very productive processes of derivation and compounding, and many German hapax words, which are likely to bear sentiment, are morphologically complex. We present results of supervised classification experiments on new datasets with morphological parses and polarity annotations.

Towards Bootstrapping a Polarity Shifter Lexicon using Linguistic Features (2017)

Schulder, Marc ; Wiegand, Michael ; Ruppenhofer, Josef ; Roth, Benjamin

We present a major step towards the creation of the first high-coverage lexicon of polarity shifters. In this work, we bootstrap a lexicon of verbs by exploiting various linguistic features. Polarity shifters, such as ‘abandon’, are similar to negations (e.g. ‘not’) in that they move the polarity of a phrase towards its inverse, as in ‘abandon all hope’. While there exist lists of negation words, creating comprehensive lists of polarity shifters is far more challenging due to their sheer number. On a sample of manually annotated verbs we examine a variety of linguistic features for this task. Then we build a supervised classifier to increase coverage. We show that this approach drastically reduces the annotation effort while ensuring a high-precision lexicon. We also show that our acquired knowledge of verbal polarity shifters improves phrase-level sentiment analysis.

A Survey on Hate Speech Detection using Natural Language Processing (2017)

Schmidt, Anna ; Wiegand, Michael

This paper presents a survey on hate speech detection. Given the steadily growing body of social media content, the amount of online hate speech is also increasing. Due to the massive scale of the web, methods that automatically detect hate speech are required. Our survey describes key areas that have been explored to automatically recognize these types of utterances using natural language processing. We also discuss limits of those approaches.

What do we need to know about an unknown word when parsing German (2017)

Do, Bich-Ngoc ; Rehbein, Ines ; Frank, Anette

We propose a new type of subword embedding designed to provide more information about unknown compounds, a major source of OOV words in German. We present an extrinsic evaluation where we use the compound embeddings as input to a neural dependency parser and compare the results to those obtained with other types of embeddings. Our evaluation shows that adding compound embeddings yields a significant improvement of 2% LAS over using word embeddings when no POS information is available. When adding POS embeddings to the input, however, the effect levels out. This suggests that it is not the missing information about the semantics of the unknown words that causes problems for parsing German, but the lack of morphological information for unknown words. To augment our evaluation, we also test the new embeddings in a language modelling task that requires both syntactic and semantic information.

Universal Dependencies are hard to parse – or are they? (2017)

Rehbein, Ines ; Steen, Julius ; Do, Bich-Ngoc ; Frank, Anette

Universal Dependency (UD) annotations, despite their usefulness for cross-lingual tasks and semantic applications, are not optimised for statistical parsing. In this paper, we ask what exactly causes the decrease in parsing accuracy when training a parser on UD-style annotations and whether the effect is similarly strong for all languages. We conduct a series of experiments where we systematically modify individual annotation decisions taken in the UD scheme and show that this results in increased accuracy for most, but not all, languages. We show that the encoding in the UD scheme, in particular the decision to encode content words as heads, causes an increase in dependency length for nearly all treebanks and an increase in arc direction entropy for many languages, and we evaluate the effect this has on parsing accuracy.

35 search hits