TY  - CPAPER
U1  - Conference publication
A1  - Schwarz, Pia
ED  - Luz de Araujo, Pedro Henrique
ED  - Baumann, Andreas
ED  - Gromann, Dagmar
ED  - Krenn, Brigitte
ED  - Roth, Benjamin
ED  - Wiegand, Michael
T1  - Semiautomatic data generation for academic Named Entity Recognition in German text corpora
T2  - Proceedings of the 20th Conference on Natural Language Processing (KONVENS 2024). September 10-13, 2024
N2  - An NER model is trained to recognize three types of entities in academic contexts: person, organization, and research area. Training data is generated semiautomatically from newspaper articles with the help of word lists for the individual entity types, an off-the-shelf NE recognizer, and an LLM. Experiments fine-tuning a BERT model with different strategies of post-processing the automatically generated data result in several NER models achieving overall F1 scores of up to 92.45%.
KW  - Named Entity Recognition
KW  - German
KW  - Corpus
KW  - Large language model
KW  - Computational linguistics
KW  - data generation
KW  - text corpora
KW  - academic Named Entity Recognition
KW  - named entity
KW  - BERT model
Y1  - 2024
U6  - https://nbn-resolving.org/urn:nbn:de:bsz:mh39-128423
UN  - https://nbn-resolving.org/urn:nbn:de:bsz:mh39-128423
UR  - https://aclanthology.org/2024.konvens-main.20
SP  - 173
EP  - 181
PB  - Association for Computational Linguistics
CY  - Wien
ER  -