
Word-level alignment of paper documents with their electronic full-text counterparts

  • We describe a simple procedure for the automatic creation of word-level alignments between printed documents and their respective full-text versions. The procedure is unsupervised, uses standard, off-the-shelf components only, and reaches an F-score of 85.01 in the basic setup and up to 86.63 when using pre- and post-processing. Potential areas of application are manual database curation (incl. document triage) and biomedical expression OCR.
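To make the reported F-score concrete, the following is a minimal sketch of word-level alignment and its evaluation. It is not the authors' procedure: the abstract does not name the components used, so Python's standard `difflib.SequenceMatcher` stands in as an assumed off-the-shelf aligner, and the function names `align_words` and `f_score` as well as the toy token lists are hypothetical.

```python
from difflib import SequenceMatcher


def align_words(ocr_tokens, text_tokens):
    """Return (i, j) index pairs for tokens that match exactly in both sequences.

    Assumption: exact-match alignment via difflib, used here only for illustration.
    """
    matcher = SequenceMatcher(a=ocr_tokens, b=text_tokens, autojunk=False)
    pairs = []
    for block in matcher.get_matching_blocks():
        # Each block describes a run of `size` identical tokens starting
        # at position block.a in the OCR side and block.b in the full text.
        for k in range(block.size):
            pairs.append((block.a + k, block.b + k))
    return pairs


def f_score(predicted, gold):
    """Harmonic mean of precision and recall over alignment pairs."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


# Toy example: the OCR output misreads "enzyme" as "enzyrne".
ocr = ["The", "enzyrne", "was", "purified"]
full_text = ["The", "enzyme", "was", "purified"]
gold = [(0, 0), (1, 1), (2, 2), (3, 3)]

predicted = align_words(ocr, full_text)
print(predicted)
print(round(f_score(predicted, gold), 4))
```

In this toy case the misread word stays unaligned, so precision is perfect while recall drops. Closing gaps of this kind is a plausible role for pre- and post-processing steps, though the abstract does not say what those steps are.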

Metadata
Author:Mark-Christoph Müller, Sucheta Ghosh, Ulrike Wittig, Maja Rey
URN:urn:nbn:de:bsz:mh39-110839
DOI:https://doi.org/10.18653/v1/2021.bionlp-1.19
ISBN:978-1-954085-40-4
Parent Title (English):Proceedings of the 20th Workshop on Biomedical Language Processing. June 11, 2021
Publisher:Association for Computational Linguistics
Place of publication:Stroudsburg, Pennsylvania
Editor:Dina Demner-Fushman, Kevin Bretonnel Cohen, Sophia Ananiadou, Junichi Tsujii
Document Type:Conference Proceeding
Language:English
Year of first Publication:2021
Date of Publication (online):2022/06/10
Publishing Institution:Leibniz-Institut für Deutsche Sprache (IDS)
Publicationstate:Published version
Reviewstate:Peer-Review
Tag:biomedical language processing; document triage; manual database curation; word-level alignment
GND Keyword:Ausrichten <Technik>; Computerlinguistik; Optische Zeichenerkennung; Volltext; XML
First Page:168
Last Page:179
DDC classes:400 Language / 400 Language, Linguistics
Open Access?:yes
Linguistics-Classification:Computational linguistics
Licence (English):Creative Commons - Attribution 4.0 International