Reconstructing manual information extraction with DB-to-document backprojection: Experiments in the life science domain
- We introduce a novel scientific document processing task for making previously inaccessible information in printed paper documents available to automatic processing. We describe our data set of scanned documents and data records from the biological database SABIO-RK, provide a definition of the task, and report findings from preliminary experiments. Rigorous evaluation proved challenging due to lack of gold-standard data and a difficult notion of correctness. Qualitative inspection of results, however, showed the feasibility and usefulness of the task.
Author: | Mark-Christoph Müller, Sucheta Ghosh, Maja Rey, Ulrike Wittig, Wolfgang Müller, Michael Strube |
---|---|
URN: | urn:nbn:de:bsz:mh39-110854 |
DOI: | https://doi.org/10.18653/v1/2020.sdp-1.9 |
ISBN: | 978-1-952148-70-5 |
Parent Title (English): | Proceedings of the First Workshop on Scholarly Document Processing. Online, November 19, 2020 |
Publisher: | Association for Computational Linguistics |
Place of publication: | Stroudsburg, Pennsylvania |
Editor: | Muthu Kumar Chandrasekaran, Anita de Waard, Guy Feigenblat, Dayne Freitag, Tirthankar Ghosal, Eduard Hovy, Petr Knoth, David Konopnicki, Philipp Mayr, Robert M. Patton, Michal Shmueli-Scheuer |
Document Type: | Conference Proceeding |
Language: | English |
Year of first Publication: | 2020 |
Date of Publication (online): | 2022/06/10 |
Publishing Institution: | Leibniz-Institut für Deutsche Sprache (IDS) |
Publicationstate: | Published version |
Reviewstate: | Peer-Review |
Tag: | SABIO-RK; automatic processing; document processing; life science; manual information extraction |
GND Keyword: | Computational linguistics; Data analysis; Experiment; Information Extraction; Qualitative content analysis; Document |
First Page: | 81 |
Last Page: | 90 |
DDC classes: | 400 Language / 400 Language, Linguistics |
Open Access?: | yes |
Linguistics-Classification: | Computational linguistics |