
Word-level alignment of paper documents with their electronic full-text counterparts

  • We describe a simple procedure for the automatic creation of word-level alignments between printed documents and their respective full-text versions. The procedure is unsupervised, uses standard, off-the-shelf components only, and reaches an F-score of 85.01 in the basic setup and up to 86.63 when using pre- and post-processing. Potential areas of application are manual database curation (incl. document triage) and biomedical expression OCR.
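To make the notions of word-level alignment and F-score concrete, the following is a minimal sketch only. It is not the procedure from the paper: it assumes both documents are already tokenized, and it swaps in Python's standard `difflib.SequenceMatcher` as a stand-in aligner. The function names `align_tokens` and `f_score` and the sample tokens are hypothetical, chosen for illustration.

```python
from difflib import SequenceMatcher


def align_tokens(ocr_tokens, text_tokens):
    """Pair up positions of identical tokens in the two sequences.

    Assumption: exact string matching via difflib, used here only as a
    stand-in for whatever aligner the actual procedure employs.
    """
    matcher = SequenceMatcher(a=ocr_tokens, b=text_tokens, autojunk=False)
    pairs = []
    for block in matcher.get_matching_blocks():
        # Each block covers `size` consecutive matching tokens; the final
        # block is a zero-length sentinel and contributes nothing.
        for offset in range(block.size):
            pairs.append((block.a + offset, block.b + offset))
    return pairs


def f_score(predicted, gold):
    """Harmonic mean of precision and recall over alignment pairs."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


# Hypothetical sample: the OCR output misreads "enzyme" as "enzyrne".
ocr_tokens = ["the", "enzyrne", "binds", "ATP"]
text_tokens = ["the", "enzyme", "binds", "ATP"]
gold_pairs = [(0, 0), (1, 1), (2, 2), (3, 3)]

predicted_pairs = align_tokens(ocr_tokens, text_tokens)
score = f_score(predicted_pairs, gold_pairs)
```

Exact matching leaves the misrecognized token unaligned, so recall drops while precision stays high. Handling such near-matches is the kind of gap that pre- and post-processing steps would be expected to address.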

Metadata
Author:Mark-Christoph Müller, Sucheta Ghosh, Ulrike Wittig, Maja Rey
URN:urn:nbn:de:bsz:mh39-110839
DOI:https://doi.org/10.18653/v1/2021.bionlp-1.19
ISBN:978-1-954085-40-4
Parent Title (English):Proceedings of the 20th Workshop on Biomedical Language Processing. June 11, 2021
Publisher:Association for Computational Linguistics
Place of publication:Stroudsburg, Pennsylvania
Editor:Dina Demner-Fushman, Kevin Bretonnel Cohen, Sophia Ananiadou, Junichi Tsujii
Document Type:Conference Proceeding
Language:English
Year of first Publication:2021
Date of Publication (online):2022/06/10
Publishing Institution:Leibniz-Institut für Deutsche Sprache (IDS)
Publicationstate:Published version
Reviewstate:Peer-Review
Tag:biomedical language processing; document triage; manual database curation; word-level alignment
GND Keyword:Ausrichten <Technik>; Computerlinguistik; Optische Zeichenerkennung; Volltext; XML
First Page:168
Last Page:179
DDC classes:400 Language / 400 Language, Linguistics
Open Access?:yes
Linguistics-Classification:Computational linguistics
Licence (English):Creative Commons - Attribution 4.0 International