Refine
Year of publication
- 2022 (1)
Document Type
Language
- English (1)
Has Fulltext
- yes (1)
Is part of the Bibliography
- yes (1)
Keywords
- Multi-modality (1) (remove)
Publicationstate
- Veröffentlichungsversion (1) (remove)
Reviewstate
- Peer-Review (1)
Publisher
We present a simple tool for extracting text and markup information from printouts of (not only) scientific documents. While the heavy-lifting OCR is done by off-the-shelf tesseract, our focus is on detection, extraction, and basic categorization of color-highlighted text sections, as well as on providing a framework for downstream processing of extraction results. The tool can be useful for document analysis tasks that must, or benefit from being able to, use printed paper.