TY  - CHAP
U1  - Conference publication
A1  - Dembowski, Julia
A1  - Wiegand, Michael
A1  - Klakow, Dietrich
ED  - Vetulani, Zygmunt
ED  - Paroubek, Patrick
T1  - Language Independent Named Entity Recognition using Distant Supervision
T2  - Human Language Technologies as a Challenge for Computer Science and Linguistics. Proceedings of the 8th Language & Technology Conference, November 17-19, 2017, Poznań, Poland
N2  - While good results have been achieved for named entity recognition (NER) in supervised settings, it remains a problem that for low resource languages and less studied domains little or no labelled data is available. As NER is a crucial preprocessing step for many natural language processing tasks, finding a way to overcome this deficit in data remains of great interest. We propose a distant supervision approach to NER that is both language and domain independent where we automatically generate labelled training data using gazetteers that we previously extracted from Wikipedia. We test our approach on English, German and Estonian data sets and contribute further by introducing several successful methods to reduce the noise in the generated training data. The tested models beat baseline systems and our results show that distant supervision can be a promising approach for NER when no labelled data is available. For the English model we also show that the distant supervision model is better at generalizing within the same domain of news texts by comparing it against a supervised model on a different test set.
KW  - Machine learning
KW  - Information Extraction
KW  - Computational linguistics
KW  - Text Mining
KW  - Name
Y1  - 2017
U6  - https://nbn-resolving.org/urn:nbn:de:bsz:mh39-86198
UN  - https://nbn-resolving.org/urn:nbn:de:bsz:mh39-86198
UR  - http://ltc.amu.edu.pl/a2017/
SN  - 978-83-64864-94-0
SB  - 978-83-64864-94-0
SP  - 68
EP  - 72
PB  - Fundacja Uniwersytetu im. Adama Mickiewicza
CY  - Poznań
ER  - 