OPUS 4 | Search

6 search hits

Reconstructing manual information extraction with DB-to-document backprojection: Experiments in the life science domain (2020)

Müller, Mark-Christoph ; Ghosh, Sucheta ; Rey, Maja ; Wittig, Ulrike ; Müller, Wolfgang ; Strube, Michael

We introduce a novel scientific document processing task for making previously inaccessible information in printed paper documents available to automatic processing. We describe our data set of scanned documents and data records from the biological database SABIO-RK, provide a definition of the task, and report findings from preliminary experiments. Rigorous evaluation proved challenging due to the lack of gold-standard data and a hard-to-define notion of correctness. Qualitative inspection of results, however, showed the feasibility and usefulness of the task.

Transparent, efficient, and robust word embedding access with WOMBAT (2018)

Müller, Mark-Christoph ; Strube, Michael

We present WOMBAT, a Python tool which supports NLP practitioners in accessing word embeddings from code. WOMBAT addresses common research problems, including unified access, scaling, and robust and reproducible preprocessing. Code that uses WOMBAT for accessing word embeddings is not only cleaner, more readable, and easier to reuse, but also much more efficient than code using standard in-memory methods: a Python script using WOMBAT for evaluating seven large word embedding collections (8.7M embedding vectors in total) on a simple SemEval sentence similarity task involving 250 raw sentence pairs completes in under ten seconds end-to-end on a standard notebook computer.

Improving extractive dialogue summarization by utilizing human feedback (2007)

Mieskes, Margot ; Müller, Christoph ; Strube, Michael

Automatic summarization systems are usually trained and evaluated in a particular domain with fixed data sets. When such a system is to be applied to slightly different input, labor- and cost-intensive annotations have to be created to retrain the system. We deal with this problem by providing users with a GUI which allows them to correct automatically produced imperfect summaries. The corrected summary in turn is added to the pool of training data. The performance of the system is expected to improve as it adapts to the new domain.

Multi-level annotation in MMAX (2003)

Müller, Mark-Christoph ; Strube, Michael

We present a light-weight tool for the annotation of linguistic data on multiple levels. It is based on the simplification of annotations to sets of markables having attributes and standing in certain relations to each other. We describe the main features of the tool, emphasizing its simplicity, customizability and versatility.

Applying co-training to reference resolution (2002)

Müller, Mark-Christoph ; Rapp, Stefan ; Strube, Michael

In this paper, we investigate the practical applicability of Co-Training for the task of building a classifier for reference resolution. We are concerned with the question of whether Co-Training can significantly reduce the amount of manual labeling work and still produce a classifier with acceptable performance.

Annotating anaphoric and bridging relations with MMAX (2001)

Müller, Mark-Christoph ; Strube, Michael

We present a tool for the annotation of anaphoric and bridging relations in a corpus of written texts. Based on differences as well as similarities between these phenomena, we define an annotation scheme. We then implement the scheme within an annotation tool and demonstrate its use.
