Refine
Year of publication
- 2011 (3) (remove)
Document Type
- Part of a Book (3) (remove)
Language
- English (3)
Has Fulltext
- yes (3)
Is part of the Bibliography
- no (3)
Keywords
- Computerlinguistik (2)
- Datenstruktur (1)
- Deutsch (1)
- Dokumentenverarbeitung (1)
- Maschinelles Lernen (1)
- Natürliche Sprache (1)
- Ontologie <Wissensverarbeitung> (1)
- Parsing (1)
- Semantische Relation (1)
- Semantisches Netz (1)
Reviewstate
Publisher
- Springer (3)
Discourse parsing of complex text types such as scientific research articles requires the analysis of an input document on linguistic and structural levels that go beyond traditionally employed lexical discourse markers. This chapter describes a text-technological approach to discourse parsing. Discourse parsing with the aim of providing a discourse structure is seen as the addition of a new annotation layer for input documents marked up on several linguistic annotation levels. The discourse parser generates discourse structures according to the Rhetorical Structure Theory. An overview of the knowledge sources and components for parsing scientific joumal articles is given. The parser’s core consists of cascaded applications of the GAP, a Generic Annotation Parser. Details of the chart parsing algorithm are provided, as well as a short evaluation in terms of comparisons with reference annotations from our corpus and with recently developed Systems with a similar task.
In this contribution, we discuss and compare alternative options of modelling the entities and relations of wordnet-like resources in the Web Ontology Language OWL. Based on different modelling options, we developed three models of representing wordnets in OWL, i.e. the instance model, the dass model, and the metaclass model. These OWL models mainly differ with respect to the ontological Status of lexical units (word senses) and the synsets. While in the instance model lexical units and synsets are represented as individuals, in the dass model they are represented as classes; both model types can be encoded in the dialect OWL DL. As a third alternative, we developed a metaclass model in OWL FULL, in which lexical units and synsets are defined as metaclasses, the individuals of which are classes themselves. We apply the three OWL models to each of three wordnet-style resources: (1) a subset of the German wordnet GermaNet, (2) the wordnet-style domain ontology TermNet, and (3) GermaTermNet, in which TermNet technical terms and GermaNet synsets are connected by means of a set of “plug-in” relations. We report on the results of several experiments in which we evaluated the performance of querying and processing these different models: (1) A comparison of all three OWL models (dass, instance, and metaclass model) of TermNet in the context of automatic text-to-hypertext conversion, (2) an investigation of the potential of the GermaTermNet resource by the example of a wordnet-based semantic relatedness calculation.
Researchers in many disciplines, sometimes working in close cooperation, have been concerned with modeling textual data in order to account for texts as the prime information unit of written communication. The list of disciplines includes computer science and linguistics as well as more specialized disciplines like computational linguistics and text technology. What many of these efforts have in common is the aim to model textual data by means of abstract data types or data structures that support at least the semi-automatic processing of texts in any area of written communication.