TY - CHAP
U1 - Conference publication
A1 - Ruppenhofer, Josef
A1 - Rehbein, Ines
A1 - Flinz, Carolina
ED - Calzolari, Nicoletta
ED - Béchet, Frédéric
ED - Blache, Philippe
ED - Choukri, Khalid
ED - Cieri, Christopher
ED - Declerck, Thierry
ED - Goggi, Sara
ED - Isahara, Hitoshi
ED - Maegaard, Bente
ED - Mariani, Joseph
ED - Mazo, Hélène
ED - Moreno, Asuncion
ED - Odijk, Jan
ED - Piperidis, Stelios
T1 - Fine-grained Named Entity Annotations for German Biographic Interviews
T2 - Proceedings of the 12th International Conference on Language Resources and Evaluation (LREC), May 11-16, 2020, Palais du Pharo, Marseille, France
N2 - We present a fine-grained NER annotation scheme with 30 labels and apply it to German data. Building on the OntoNotes 5.0 NER inventory, our scheme is adapted for a corpus of transcripts of biographic interviews by adding categories for AGE and LAN(guage) as well as label classes for various numeric and temporal expressions. Applying the scheme to the spoken data as well as to a collection of teaser tweets from newspaper sites, we can confirm its generality for both domains, while also achieving good inter-annotator agreement. We also show empirically how our inventory relates to the well-established 4-category NER inventory by re-annotating a subset of the GermEval 2014 NER coarse-grained dataset with our fine-grained label inventory. Finally, we use a BERT-based system to establish some baselines for NER tagging on our two new datasets. Global results in in-domain testing are quite high on the two datasets, close to what was achieved for the coarse inventory on the CoNLL 2003 data. Cross-domain testing produces much lower results due to the severe domain differences.
KW - Named Entity Recognition
KW - spoken language
KW - German
KW - oral history corpora
KW - Corpus
KW - Name
KW - Annotation
KW - Automatic speech recognition
KW - Oral history
Y1 - 2020
U6 - https://nbn-resolving.org/urn:nbn:de:bsz:mh39-98652
UN - https://nbn-resolving.org/urn:nbn:de:bsz:mh39-98652
UR - http://www.lrec-conf.org/proceedings/lrec2020/index.html#4605
SN - 979-10-95546-34-4
SB - 979-10-95546-34-4
SP - 4605
EP - 4614
PB - European Language Resources Association
CY - Paris
ER -