Refine
Document Type
- Part of a Book (3)
- Article (1)
- Book (1)
Is part of the Bibliography
- no (5)
Keywords
- XML (2)
- Computerlinguistik (1)
- Datenstruktur (1)
- Discourse parsing (1)
- Discourse relations (1)
- Document structure (1)
- Dokumentenverarbeitung (1)
- Informationsstruktur (1)
- Instanz <Informatik> (1)
- Künstliche Intelligenz (1)
Publicationstate
- Postprint (4)
Reviewstate
- (Verlags)-Lektorat (2)
- Peer-Review (1)
Publisher
- Springer (5) (remove)
Situiertheit
(1993)
Die Extensible Markup Language (XML), eine vereinfachte Version der Standard Generalized Markup Language (SGML), wurde für den Austausch strukturierter Daten im Internet entwickelt. Informationen können damit nicht nur in einem einheitlichen, medienunabhängigen Format strukturiert werden, sondern die Strukturierungsprinzipien selbst sind auch durch ein formales Regelwerk, eine Grammatik, beschreibbar. Erst so werden weitergehende Verarbeitungsprozesse wie geleitete Dateneingaben, Datenkonvertierung, flexibles Navigieren und Viewing der Daten möglich. Neben der elementaren Informationsmodellierung ist mit der Meta-Strukturierung durch sog. Architekturen ein neuer Aspekt hinzugekommen: die objektorientierte Schichtung von Struktur-Grammatiken. Das vorliegende Buch stellt beide Strukturierungstechniken - elementar und architektonisch - erstmalig in zusammenhängender Form dar. Es wendet sich an Leser, die sich detailliert und praxisorientiert mit den Möglichkeiten der SGML-basierten Informationsmodellierung auseinandersetzen wollen.
Discourse parsing of complex text types such as scientific research articles requires the analysis of an input document on linguistic and structural levels that go beyond traditionally employed lexical discourse markers. This chapter describes a text-technological approach to discourse parsing. Discourse parsing with the aim of providing a discourse structure is seen as the addition of a new annotation layer for input documents marked up on several linguistic annotation levels. The discourse parser generates discourse structures according to the Rhetorical Structure Theory. An overview of the knowledge sources and components for parsing scientific joumal articles is given. The parser’s core consists of cascaded applications of the GAP, a Generic Annotation Parser. Details of the chart parsing algorithm are provided, as well as a short evaluation in terms of comparisons with reference annotations from our corpus and with recently developed Systems with a similar task.
This chapter addresses the requirements and linguistic foundations of automatic relational discourse analysis of complex text types such as scientific journal articles. It is argued that besides lexical and grammatical discourse markers, which have traditionally been employed in discourse parsing, cues derived from the logical and generical document structure and the thematic structure of a text must be taken into account. An approach to modelling such types of linguistic information in terms of XML-based multi-layer annotations and to a text-technological representation of additional knowledge sources is presented. By means of quantitative and qualitative corpus analyses, cues and constraints for automatic discourse analysis can be derived. Furthermore, the proposed representations are used as the input sources for discourse parsing. A short overview of the projected parsing architecture is given.
Researchers in many disciplines, sometimes working in close cooperation, have been concerned with modeling textual data in order to account for texts as the prime information unit of written communication. The list of disciplines includes computer science and linguistics as well as more specialized disciplines like computational linguistics and text technology. What many of these efforts have in common is the aim to model textual data by means of abstract data types or data structures that support at least the semi-automatic processing of texts in any area of written communication.