OPUS 4 | Search

8 search hits

1 to 8

Sort by

Authorship attribution with convolutional neural networks and POS-eliding (2017)

Hitschler, Julian ; van den Berg, Esther ; Rehbein, Ines

We use a convolutional neural network to perform authorship identification on a very homogeneous dataset of scientific publications. In order to investigate the effect of domain biases, we obscure words below a certain frequency threshold, retaining only their POS-tags. This procedure improves test performance due to better generalization on unseen data. Using our method, we are able to predict the authors of scientific publications in the same discipline at levels well above chance.

CMC Corpora in DeReKo (2017)

Lüngen, Harald ; Kupietz, Marc

We introduce three types of corpora of computer-mediated communication that have recently been compiled at the Institute for the German Language or curated from an external project and included in DeReKo, the German Reference Corpus, namely Wikipedia (discussion) corpora, the Usenet news corpus, and the Dortmund Chat Corpus. The data and corpora have been converted to I5, the TEI customization to represent texts in DeReKo, and are researchable via the web-based IDS corpus research interfaces and in the case of Wikipedia and chat also downloadable from the IDS repository and download server, respectively.

Corpus linguistic exploration of modern proverb use and proverb patterns (2017)

Steyer, Kathrin

In my talk, I present an empirical approach to detecting and describing proverbs as frozen sentences with specific functions in current language use. We have developed this approach in the EU project ‘SprichWort’ (based on the German Reference Corpus). The first chapter illustrates selected aspects of our complex, iterative procedure to validate proverb candidates. Based on our corpus-driven lexpan methodology of slot analysis I then discuss semantic restrictions of proverb patterns. Furthermore, I show different degrees of proverb quality ranging from genuine proverbs to non-proverb realizations of the same abstract pattern. On the one hand, the corpus validation reveals that proverbs are definitely perceived and used as relatively fixed entities and often as sentences. On the other hand, proverbs are not only interpreted as an interesting unique phenomenon but also as part of the whole lexicon, embedded in networks of different lexical items.

Das Verhältnis von Pronomen und Determinativen - eine deutsche Spezialität? (2017)

Ballweg, Joachim

Detecting annotation noise in automatically labelled data (2017)

Rehbein, Ines ; Ruppenhofer, Josef

We introduce a method for error detection in automatically annotated text, aimed at supporting the creation of high-quality language resources at affordable cost. Our method combines an unsupervised generative model with human supervision from active learning. We test our approach on in-domain and out-of-domain data in two languages, in AL simulations and in a real world setting. For all settings, the results show that our method is able to detect annotation errors with high precision and high recall.

Eine Kur für die Ellipse. Zu nicht-kanonischen Äußerungsformen in der IDS-Grammatik (2017)

Zifonun, Gisela

EuReCo - Joining Forces for a European Reference Corpus as a sustainable base for cross-linguistic research (2017)

Kupietz, Marc ; Witt, Andreas ; Bański, Piotr ; Tufiş, Dan ; Cristea, Dan ; Váradi, Tamás

In this paper we discuss the opportunities, prerequisites, possible applications and implications of a virtually joint corpus based on existing national, reference or other large corpora and their host institutions.

Modeling the properties of German phrasal compounds within a usage-based constructional approach (2017)

Hein, Katrin

This paper discusses phrasal compounds in German (e.g.“Man-muss-doch-überalles-reden-können”-Credo, ‘one-should-be-able-to-talk-about-everything motto’). It provides the first empirically based investigation and description of this wordformation type within the theoretical framework of construction grammar. While phrasal compounds pose a problem for “traditional” generative approaches, I argue that a usage-based constructional model (e.g. Langacker 1987; Goldberg 2006) which takes into consideration aspects of frequency provides a suitable approach to modeling and explaining their properties. For this purpose, a large inventory of phrasal compounds was extracted from the German Reference Corpus (DeReKo) and modeled as pairings of form and meaning at different levels of specificity and abstractness within a bottom-up process. Overall, this paper not only presents a new and original approach to phrasal compounds, but also offers interesting perspectives for dealing with composition in general.

1 to 8

Open Access

Refine

Author

Year of publication

Document Type

Language

Has Fulltext

Is part of the Bibliography

Keywords

Publicationstate

Reviewstate

Publisher

8 search hits