Refine
Year of publication
- 2010 (2) (remove)
Document Type
- Part of a Book (2)
Language
- English (2)
Has Fulltext
- yes (2)
Is part of the Bibliography
- no (2)
Keywords
- Annotation (1)
- Discourse parsing (1)
- Discourse relations (1)
- Diskursanalyse (1)
- Document structure (1)
- Generic Document Structure (1)
- Korpus <Linguistik> (1)
- Linguistic annotations (1)
- Logical Document Structure (1)
- Text technology (1)
Publicationstate
- Postprint (2)
Reviewstate
This study examines what kind of cues and constraints for discourse interpretation can be derived from the logical and generic document structure of complex texts by the example of scientific journal articles. We performed statistical analysis on a corpus of scientific articles annotated on different annotations layers within the framework of XML-based multi-layer annotation. We introduce different discourse segment types that constrain the textual domains in which to identify rhetorical relation spans, and we show how a canonical sequence of text type structure categories is derived from the corpus annotations. Finally, we demonstrate how and which text type structure categories assigned to complex discourse segments of the type “block” statistically constrain the occurrence of rhetorical relation types.
This chapter addresses the requirements and linguistic foundations of automatic relational discourse analysis of complex text types such as scientific journal articles. It is argued that besides lexical and grammatical discourse markers, which have traditionally been employed in discourse parsing, cues derived from the logical and generical document structure and the thematic structure of a text must be taken into account. An approach to modelling such types of linguistic information in terms of XML-based multi-layer annotations and to a text-technological representation of additional knowledge sources is presented. By means of quantitative and qualitative corpus analyses, cues and constraints for automatic discourse analysis can be derived. Furthermore, the proposed representations are used as the input sources for discourse parsing. A short overview of the projected parsing architecture is given.