Refine
Year of publication
- 2020 (2) (remove)
Document Type
Language
- English (2)
Has Fulltext
- yes (2)
Is part of the Bibliography
- no (2)
Keywords
- Computerlinguistik (2)
- API (1)
- Data Science (1)
- Datenanalyse (1)
- Experiment (1)
- Information Extraction (1)
- Neurolinguistisches Programmieren (1)
- Python <Programmiersprache> (1)
- Qualitative Inhaltsanalyse (1)
- SABIO-RK (1)
Publicationstate
Reviewstate
- Peer-Review (2)
Publisher
We introduce a novel scientific document processing task for making previously inaccessible information in printed paper documents available to automatic processing. We describe our data set of scanned documents and data records from the biological database SABIO-RK, provide a definition of the task, and report findings from preliminary experiments. Rigorous evaluation proved challenging due to lack of gold-standard data and a difficult notion of correctness. Qualitative inspection of results, however, showed the feasibility and usefulness of the task.
pyMMAX2 is an API for processing MMAX2 stand-off annotation data in Python. It provides a lightweight basis for the development of code which opens up the Java- and XML-based ecosystem of MMAX2 for more recent, Python-based NLP and data science methods. While pyMMAX2 is pure Python, and most functionality is implemented from scratch, the API re-uses the complex implementation of the essential business logic for MMAX2 annotation schemes by interfacing with the original MMAX2 Java libraries. pyMMAX2 is available for download at http://github.com/nlpAThits/pyMMAX2.