TY  - CHAP
U1  - Conference publication
A1  - Müller, Mark-Christoph
A1  - Ghosh, Sucheta
A1  - Wittig, Ulrike
A1  - Rey, Maja
ED  - Demner-Fushman, Dina
ED  - Cohen, Kevin Bretonnel
ED  - Ananiadou, Sophia
ED  - Tsujii, Junichi
T1  - Word-level alignment of paper documents with their electronic full-text counterparts
T2  - Proceedings of the 20th Workshop on Biomedical Language Processing. June 11, 2021
N2  - We describe a simple procedure for the automatic creation of word-level alignments between printed documents and their respective full-text versions. The procedure is unsupervised, uses standard, off-the-shelf components only, and reaches an F-score of 85.01 in the basic setup and up to 86.63 when using pre- and post-processing. Potential areas of application are manual database curation (incl. document triage) and biomedical expression OCR.
KW  - Computational linguistics
KW  - Full text
KW  - Optical character recognition
KW  - XML
KW  - Alignment
KW  - biomedical language processing
KW  - word-level alignment
KW  - manual database curation
KW  - document triage
Y1  - 2021
UN  - https://nbn-resolving.org/urn:nbn:de:bsz:mh39-110839
SN  - 978-1-954085-40-4
SB  - 978-1-954085-40-4
U6  - https://doi.org/10.18653/v1/2021.bionlp-1.19
DO  - https://doi.org/10.18653/v1/2021.bionlp-1.19
SP  - 168
EP  - 179
PB  - Association for Computational Linguistics
CY  - Stroudsburg, Pennsylvania
ER  -