OPUS 4 | 430 German

430 German (130)
431 Writing systems and phonology of German (1)
432 Etymology of German (20)
433 German dictionaries (51)
435 German grammar (111)
437 Variants of German (121)
438 Standard German usage (27)
439 Other Germanic languages (40)

7 search hits

Extracting domain knowledge from tables of contents (2010)

Knowledge in textual form is always presented as visually and hierarchically structured units of text, which is particularly true in the case of academic texts. One research hypothesis of the ongoing project "Knowledge ordering in texts - text structure and structure visualisations as sources of natural ontologies" is that the textual structure of academic texts effectively mirrors essential parts of the knowledge structure that is built up in the text. The structuring of a modern dissertation (e.g. in the form of an automatically generated table of contents, or toc), for example, represents a compromise between the requirements of the text type and the methodological and conceptual structure of its subject matter. The aim of the project is to examine how visual-hierarchical structuring systems are constructed, how knowledge structures are encoded in them, and how they can be exploited to automatically derive ontological knowledge for navigation, archiving, or search tasks. The idea of extracting domain concepts and semantic relations mainly from the structural and linguistic information gathered from tables of contents represents a novel approach to ontology learning.

A text-technological approach to automatic discourse analysis of complex texts (2006)

Hilbert, Mirco ; Lobin, Henning ; Bärenfänger, Maja ; Lüngen, Harald ; Puskás, Csilla

This paper describes the development of a relational discourse parsing architecture for text documents of a complex text type, namely scientific articles. To achieve this goal, several different linguistic knowledge sources and auxiliary analyses on different linguistic levels are necessary.

Grammatische Restringierung von Dateninhalten in SGML/XML (1999)

Lobin, Henning

From Open Source to Open Information. Collaborative Methods in Creating XML-based Markup Languages (2000)

Rehm, Georg ; Lobin, Henning

Ontology Extraction for Index Generation (2004)

Gottschalg-Duque, Cláudio ; Lobin, Henning

The administration of electronic publications in the Information Era brings together old and new problems, especially those related to Information Retrieval and Automatic Knowledge Extraction. This article presents an Information Retrieval System that uses Natural Language Processing and an ontology to index the texts of a collection. We describe a system that constructs a domain-specific ontology, starting from the syntactic and semantic analyses of the texts that compose the collection. First the texts are tokenized, then a robust syntactic analysis is performed, and subsequently the semantic analysis is carried out in conformity with a knowledge-representation metalanguage based on a basic ontology composed of 47 classes. The automatically extracted ontology generates richer domain-specific knowledge. Through its semantic net, it provides the right conditions for users to find, more efficiently and quickly, the terms suited to querying the texts. A prototype of this system was built and used to index a collection of 221 electronic texts on Information Science written in Brazilian Portuguese. Instead of relying on statistical theories, we propose a robust Information Retrieval System that uses cognitive theories, allowing greater efficiency in answering users' queries.

A Discourse-structured Blog Corpus for German: Challenges of Compilation and Annotation (2016)

Suarez, Holger Grumt ; Karlova-Bourbonus, Natali ; Lobin, Henning

The present paper reports the first results of the compilation and annotation of a blog corpus for German. The main aim of the project is the representation of the blog discourse structure and the relations between its elements (blog posts, comments) and participants (bloggers, commentators). The data included in the corpus were manually collected from the scientific blog portal SciLogs. The feature catalogue for the corpus annotation includes three types of information that are directly or indirectly provided in the blog or can be construed by means of statistical analysis or computational tools. At this point, only directly available information (e.g. title of the blog post, name of the blogger, etc.) has been annotated. We believe our blog corpus can be of interest for the general study of blog structure and related research questions, as well as for the development of NLP methods and techniques (e.g. for authorship detection).

Using OWL ontologies in discourse parsing (2007)

Bärenfänger, Maja ; Hilbert, Mirco ; Lobin, Henning ; Lüngen, Harald
