Language Independent Named Entity Recognition using Distant Supervision

  • While good results have been achieved for named entity recognition (NER) in supervised settings, little or no labelled data is available for low-resource languages and less-studied domains. As NER is a crucial preprocessing step for many natural language processing tasks, finding a way to overcome this lack of data remains of great interest. We propose a distant supervision approach to NER that is both language- and domain-independent, in which we automatically generate labelled training data using gazetteers previously extracted from Wikipedia. We test our approach on English, German and Estonian data sets and introduce several successful methods to reduce the noise in the generated training data. The tested models beat baseline systems, and our results show that distant supervision can be a promising approach for NER when no labelled data is available. For English, we also compare the distant supervision model against a supervised model on a different test set and show that it generalizes better within the same domain of news texts.
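
The core idea of the abstract, projecting gazetteer entries onto raw text to obtain labelled training data, can be illustrated with a minimal sketch. This is not the authors' implementation: the toy gazetteer, the BIO tagging scheme, the greedy longest-match strategy, and the names `GAZETTEER` and `distant_label` are all assumptions made for illustration. The abstract does not describe the noise-reduction methods in detail, so they are not shown here.

```python
from typing import Dict, List, Tuple

# Hypothetical toy gazetteer mapping token sequences to entity types.
# In the paper the gazetteers are extracted from Wikipedia; these
# entries are invented for illustration only.
GAZETTEER: Dict[Tuple[str, ...], str] = {
    ("Angela", "Merkel"): "PER",
    ("Berlin",): "LOC",
    ("European", "Union"): "ORG",
}

# Longest gazetteer entry, measured in tokens.
MAX_LEN = max(len(entry) for entry in GAZETTEER)


def distant_label(tokens: List[str]) -> List[str]:
    """Assign BIO tags by greedy longest match against the gazetteer."""
    tags = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest candidate span first, then shorter ones.
        for n in range(min(MAX_LEN, len(tokens) - i), 0, -1):
            label = GAZETTEER.get(tuple(tokens[i:i + n]))
            if label is not None:
                tags[i] = "B-" + label
                for j in range(i + 1, i + n):
                    tags[j] = "I-" + label
                i += n
                matched = True
                break
        if not matched:
            i += 1
    return tags


if __name__ == "__main__":
    sentence = "Angela Merkel visited Berlin".split()
    for token, tag in zip(sentence, distant_label(sentence)):
        print(token, tag)
```

Labels produced this way are noisy: ambiguous names are tagged regardless of context, and entities missing from the gazetteer are tagged `O`. This is the kind of noise that the methods mentioned in the abstract aim to reduce.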

Author: Julia Dembowski, Michael Wiegand, Dietrich Klakow
Parent Title (German): Human Language Technologies as a Challenge for Computer Science and Linguistics. Proceedings of the 8th Language & Technology Conference, November 17-19, 2017, Poznań, Poland
Publisher: Fundacja Uniwersytetu im. Adama Mickiewicza
Place of publication: Poznań
Editor: Zygmunt Vetulani, Patrick Paroubek
Document Type: Conference Proceeding
Year of first Publication: 2017
Date of Publication (online): 2019/03/19
GND Keyword: Computational linguistics; Information extraction; Machine learning; Name; Text mining
First Page: 68
Last Page: 72
DDC classes: 400 Language / 400 Language, linguistics
Open Access?: yes
Licence (German): Urheberrechtlich geschützt (protected by copyright)