Verstehen und Motivieren: semantische Fluchtpunkte deutscher und italienischer Lexeme mit -log-
(2017)
We use a convolutional neural network to perform authorship identification on a very homogeneous dataset of scientific publications. In order to investigate the effect of domain biases, we obscure words below a certain frequency threshold, retaining only their POS-tags. This procedure improves test performance due to better generalization on unseen data. Using our method, we are able to predict the authors of scientific publications in the same discipline at levels well above chance.
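The masking step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `mask_rare_words`, the `min_freq` parameter, the case-insensitive counting, and the toy POS-tagged input are all assumptions, since the abstract does not specify the threshold, the tagset, or the preprocessing.

```python
from collections import Counter


def mask_rare_words(tagged_docs, min_freq):
    """Replace every word rarer than min_freq with its POS tag.

    tagged_docs: list of documents, each a list of (word, pos_tag) pairs.
    min_freq: hypothetical threshold; the abstract does not give a value.
    """
    # Count word frequencies over the whole collection (case-insensitive,
    # which is an assumption of this sketch).
    counts = Counter(word.lower() for doc in tagged_docs for word, _ in doc)
    return [
        [word if counts[word.lower()] >= min_freq else tag for word, tag in doc]
        for doc in tagged_docs
    ]


# Toy input: two already POS-tagged "documents" (made-up data).
docs = [
    [("the", "DET"), ("corpus", "NOUN"), ("is", "VERB"), ("large", "ADJ")],
    [("the", "DET"), ("parser", "NOUN"), ("is", "VERB"), ("fast", "ADJ")],
]
masked = mask_rare_words(docs, min_freq=2)
```

In this toy case the topic-bearing words ("corpus", "parser", "large", "fast") occur only once and are reduced to their tags, while the frequent function words survive. That is the intended effect: a classifier trained on the masked text can rely less on domain vocabulary.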
We introduce a method for error detection in automatically annotated text, aimed at supporting the creation of high-quality language resources at affordable cost. Our method combines an unsupervised generative model with human supervision from active learning. We test our approach on in-domain and out-of-domain data in two languages, in active learning (AL) simulations and in a real-world setting. For all settings, the results show that our method is able to detect annotation errors with high precision and high recall.
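One way the active learning step could pick instances for human review is sketched below. The abstract does not state the selection criterion, so this sketch assumes entropy-based uncertainty sampling, a common active learning strategy. The function names and the posterior values are hypothetical stand-ins for the output of the generative model.

```python
import math


def entropy(probs):
    """Shannon entropy (natural log) of one label distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_for_review(posteriors, k):
    """Return indices of the k instances whose label posterior is most uncertain."""
    ranked = sorted(
        range(len(posteriors)),
        key=lambda i: entropy(posteriors[i]),
        reverse=True,
    )
    return ranked[:k]


# Made-up posteriors over three labels for three annotated tokens.
posteriors = [
    [0.98, 0.01, 0.01],  # model is confident: unlikely to need review
    [0.40, 0.35, 0.25],  # model is unsure: good candidate for review
    [0.50, 0.50, 0.00],  # two labels tie
]
queried = select_for_review(posteriors, k=2)
```

Under these assumptions, a human annotator would check the selected instances, and the corrected labels would be fed back to the model before the next round of selection.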