Robust extraction of marked-up text sections from scientific document printouts (2022)

We present a simple tool for extracting text and markup information from printouts of scientific and other documents. The heavy lifting of OCR is done by off-the-shelf Tesseract; our focus is on the detection, extraction, and basic categorization of color-highlighted text sections, and on providing a framework for downstream processing of the extraction results. The tool can be useful for document analysis tasks that must use printed paper, or that benefit from being able to.

Open Access