OPUS 4 | 400 Language, Linguistics

400 Language (135)
401 Philosophy of language, theory of language (2)
402 Miscellany
403 Dictionaries, encyclopedias
404 Special topics (1)
405 Serial publications
406 Organisations, management
407 Education, research, related topics (1)
408 Treatment by groups of people
409 Geographic and person-related treatment

2 search hits

Authorship attribution with convolutional neural networks and POS-eliding (2017)

Hitschler, Julian ; van den Berg, Esther ; Rehbein, Ines

We use a convolutional neural network to perform authorship identification on a very homogeneous dataset of scientific publications. In order to investigate the effect of domain biases, we obscure words below a certain frequency threshold, retaining only their POS-tags. This procedure improves test performance due to better generalization on unseen data. Using our method, we are able to predict the authors of scientific publications in the same discipline at levels well above chance.
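The POS-eliding step described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the toy English sentences with Penn Treebank tags, and the threshold value are all assumptions. The abstract does not state how the threshold was chosen or which data the frequencies were counted on; here they are assumed to come from the training data only, so that unseen test words are always elided.

```python
from collections import Counter
from typing import Iterable, List, Set, Tuple

# A sentence is a list of (word, POS tag) pairs; the tagset is an assumption.
TaggedSentence = List[Tuple[str, str]]


def build_vocab(corpus: Iterable[TaggedSentence], min_freq: int) -> Set[str]:
    """Collect words occurring at least `min_freq` times in the (training) corpus."""
    counts = Counter(word for sentence in corpus for word, _ in sentence)
    return {word for word, count in counts.items() if count >= min_freq}


def elide(sentence: TaggedSentence, vocab: Set[str]) -> List[str]:
    """Keep frequent words; replace rare or unseen words by their POS tag."""
    return [word if word in vocab else pos for word, pos in sentence]


if __name__ == "__main__":
    # Toy training data (hypothetical), tagged with Penn Treebank tags.
    train = [
        [("We", "PRP"), ("use", "VBP"), ("a", "DT"), ("network", "NN")],
        [("We", "PRP"), ("train", "VBP"), ("a", "DT"), ("model", "NN")],
    ]
    vocab = build_vocab(train, min_freq=2)  # hypothetical threshold
    print(elide(train[0], vocab))  # ['We', 'VBP', 'a', 'NN']
```

Under this sketch, the elided sequences rather than the raw text would be fed to the classifier. Because rare, often topic-specific words are reduced to their tags, the model presumably has to rely more on stylistic cues than on domain vocabulary.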

A harmonised testsuite for POS tagging of German social media data (2018)

Rehbein, Ines ; Ruppenhofer, Josef ; Zimmermann, Victor

We present a testsuite for POS tagging German web data. Our testsuite provides the original raw text as well as the gold tokenisations and is annotated for parts of speech. The testsuite includes a new dataset for German tweets, with a current size of 3,940 tokens. To increase the size of the data, we harmonised the annotations in already existing web corpora, based on the Stuttgart-Tübingen Tag Set. The current version of the corpus has an overall size of 48,344 tokens of web data, around half of it from Twitter. We also present experiments showing how different experimental setups (training set size, additional out-of-domain training data, self-training) influence the accuracy of the taggers. All resources and models will be made publicly available to the research community.
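Harmonising existing corpora can be viewed as relabelling every token of each source corpus with a tag from one shared, STTS-based tagset while leaving the gold tokenisation unchanged. The sketch below illustrates that step under stated assumptions: the mapping table is a hypothetical placeholder, not the authors' actual mapping, and HST, ADR and URL are assumed social-media extension tags for STTS. Raising an error on unmapped tags is a design choice that makes gaps in a mapping visible instead of silently keeping incompatible labels.

```python
from typing import Dict, List, Set, Tuple

# A token is a (word, tag) pair; only the tag is changed, never the word.
Token = Tuple[str, str]

# Hypothetical source-to-target mapping for one corpus. The source labels are
# invented for illustration; HST/ADR/URL are assumed STTS social-media extensions.
EXAMPLE_MAPPING: Dict[str, str] = {
    "HASHTAG": "HST",
    "USERNAME": "ADR",
    "WEBLINK": "URL",
}


def harmonise(tokens: List[Token], mapping: Dict[str, str], target_tagset: Set[str]) -> List[Token]:
    """Relabel each token with the shared tagset; fail loudly on anything unmapped."""
    harmonised = []
    for word, tag in tokens:
        new_tag = mapping.get(tag, tag)  # tags absent from the mapping are kept as-is
        if new_tag not in target_tagset:
            raise ValueError(f"no target tag for {tag!r} (token {word!r})")
        harmonised.append((word, new_tag))
    return harmonised


if __name__ == "__main__":
    # Small subset of STTS plus the assumed extension tags.
    tagset = {"ADV", "NN", "PPER", "VVFIN", "HST", "ADR", "URL"}
    tweet = [("@anna", "USERNAME"), ("es", "PPER"), ("regnet", "VVFIN"), ("#hamburg", "HASHTAG")]
    print(harmonise(tweet, EXAMPLE_MAPPING, tagset))
    # [('@anna', 'ADR'), ('es', 'PPER'), ('regnet', 'VVFIN'), ('#hamburg', 'HST')]
```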