Reconstructing manual information extraction with DB-to-document backprojection: Experiments in the life science domain

  • We introduce a novel scientific document processing task for making previously inaccessible information in printed paper documents available to automatic processing. We describe our data set of scanned documents and data records from the biological database SABIO-RK, provide a definition of the task, and report findings from preliminary experiments. Rigorous evaluation proved challenging due to lack of gold-standard data and a difficult notion of correctness. Qualitative inspection of results, however, showed the feasibility and usefulness of the task.

Metadata
Author:Mark-Christoph Müller, Sucheta Ghosh, Maja Rey, Ulrike Wittig, Wolfgang Müller, Michael Strube
URN:urn:nbn:de:bsz:mh39-110854
DOI:https://doi.org/10.18653/v1/2020.sdp-1.9
ISBN:978-1-952148-70-5
Parent Title (English):Proceedings of the First Workshop on Scholarly Document Processing. Online, November 19, 2020
Publisher:Association for Computational Linguistics
Place of publication:Stroudsburg, Pennsylvania
Editor:Muthu Kumar Chandrasekaran, Anita de Waard, Guy Feigenblat, Dayne Freitag, Tirthankar Ghosal, Eduard Hovy, Petr Knoth, David Konopnicki, Philipp Mayr, Robert M. Patton, Michal Shmueli-Scheuer
Document Type:Conference Proceeding
Language:English
Year of first Publication:2020
Date of Publication (online):2022/06/10
Publishing Institution:Leibniz-Institut für Deutsche Sprache (IDS)
Publication state:Published version
Review state:Peer-reviewed
Tag:SABIO-RK; automatic processing; document processing; life science; manual information extraction
GND Keyword:Computational linguistics; Data analysis; Experiment; Information extraction; Qualitative content analysis; Written document
First Page:81
Last Page:90
DDC classes:400 Language / 400 Language, Linguistics
Open Access?:yes
Linguistics-Classification:Computational linguistics
Licence (English):Creative Commons - Attribution 4.0 International