Authorship attribution with convolutional neural networks and POS-eliding
We use a convolutional neural network to perform authorship identification on a very homogeneous dataset of scientific publications. In order to investigate the effect of domain biases, we obscure words below a certain frequency threshold, retaining only their POS tags. This procedure improves test performance due to better generalization on unseen data. Using our method, we are able to predict the authors of scientific publications in the same discipline at levels well above chance.
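The frequency-based eliding step described above can be sketched as follows. This is a minimal illustration, not the authors' code. It assumes the text is already POS-tagged as `(word, tag)` pairs with Penn Treebank-style tags and that word frequencies are counted on the training data only. The helper names `build_vocab` and `elide`, the toy sentences, and the `min_freq` value are hypothetical.

```python
from collections import Counter


def build_vocab(tagged_docs, min_freq):
    """Return the word forms that occur at least min_freq times in the training docs."""
    counts = Counter(word for doc in tagged_docs for word, _tag in doc)
    return {word for word, count in counts.items() if count >= min_freq}


def elide(tagged_doc, vocab):
    """Keep frequent words; replace every other word with its POS tag."""
    return [word if word in vocab else tag for word, tag in tagged_doc]


# Hypothetical toy training data, already POS-tagged.
train = [
    [("We", "PRP"), ("use", "VBP"), ("a", "DT"), ("network", "NN")],
    [("We", "PRP"), ("use", "VBP"), ("a", "DT"), ("corpus", "NN")],
]
vocab = build_vocab(train, min_freq=2)  # "network" and "corpus" occur once each
print(elide(train[0], vocab))  # ['We', 'use', 'a', 'NN']
```

Because the vocabulary comes from the training data, rare or unseen words in test documents also collapse to their tags. The elided token sequences would then be the input to the convolutional classifier. This sketch uses the raw tag as the replacement token. A real pipeline may want to mark tags, for example as `<NN>`, so they cannot collide with identical word forms.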
Author: | Julian Hitschler, Esther van den Berg, Ines Rehbein |
---|---|
URN: | urn:nbn:de:bsz:mh39-80252 |
URL: | http://aclweb.org/anthology/W17-4907 |
DOI: | https://doi.org/10.18653/v1/W17-4907 |
ISBN: | 978-1-945626-99-9 |
Parent Title (English): | Proceedings of the Workshop on Stylistic Variation (EMNLP 2017). September 8, 2017, Copenhagen, Denmark |
Publisher: | The Association for Computational Linguistics |
Place of publication: | Stroudsburg, PA, USA |
Document Type: | Part of a Book |
Language: | English |
Year of first Publication: | 2017 |
Date of Publication (online): | 2018/10/02 |
Publication state: | Published version |
Review state: | Peer-reviewed |
Tag: | Part-of-speech tagging |
GND Keyword: | Authorship; Computational linguistics |
First Page: | 53 |
Last Page: | 58 |
DDC classes: | 400 Language / 400 Language, Linguistics |
Open Access?: | Yes |
Leibniz Classification: | Language, Linguistics |
Program areas: | Digital Linguistics |