Discourse segmentation is the division of a text into minimal discourse segments, which form the leaves of the trees used to represent discourse structures. A definition of elementary discourse segments in German is provided by adapting widely used segmentation principles for English minimal units, while considering punctuation, morphology, syntax, and aspects of the logical document structure of a complex text type, namely scientific articles. The algorithm and implementation of a discourse segmenter based on these principles are presented, as well as an evaluation of test runs.
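To make the idea of rule-based segmentation concrete, the following is a minimal sketch and not the segmenter described in the abstract. It assumes only two cues, sentence-final punctuation and a comma followed by a subordinating conjunction. The function name `segment` and the conjunction list `SUBORDINATORS` are hypothetical; morphology, syntax, and logical document structure, which the abstract names as further knowledge sources, are not modeled here.

```python
import re

# Hypothetical, deliberately tiny list of German subordinating conjunctions.
# The cue inventory of the actual segmenter is not given in the abstract.
SUBORDINATORS = {"weil", "obwohl", "während", "nachdem", "wenn", "da"}

# A comma plus whitespace, followed by one of the conjunctions as a whole word.
CLAUSE_BOUNDARY = re.compile(
    r",\s+(?=(?:%s)\b)" % "|".join(sorted(SUBORDINATORS))
)

# Whitespace that follows sentence-final punctuation.
SENTENCE_BOUNDARY = re.compile(r"(?<=[.!?])\s+")


def segment(text: str) -> list[str]:
    """Split text into candidate minimal discourse segments."""
    segments = []
    # Step 1: split into sentences at ., ! or ? followed by whitespace.
    for sentence in SENTENCE_BOUNDARY.split(text.strip()):
        if not sentence:
            continue
        # Step 2: split each sentence at commas that introduce a subordinate clause.
        parts = CLAUSE_BOUNDARY.split(sentence)
        segments.extend(p.strip() for p in parts if p.strip())
    return segments


example = segment("Wir bleiben zu Hause, weil es regnet. Morgen fahren wir.")
```

Punctuation alone overgenerates and undergenerates (abbreviations, enumerations, and embedded clauses are all mishandled), which is presumably why a definition of elementary segments draws on more than one linguistic level.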
Researchers in many disciplines, sometimes working in close cooperation, have been concerned with modeling textual data in order to account for texts as the prime information unit of written communication. The list of disciplines includes computer science and linguistics as well as more specialized disciplines like computational linguistics and text technology. What many of these efforts have in common is the aim to model textual data by means of abstract data types or data structures that support at least the semi-automatic processing of texts in any area of written communication.
In dependency-syntactic systems such as those of Engel (1982), Hudson (1984), Schubert (1987), Mel'čuk (1988), or Starosta (1988), generally only words can govern other words or phrases. Although this assumption is workable in practice, it leads to a number of syntax-theoretical shortcomings that make fully developed dependency grammars appear inadequate compared with competing grammatical theories. The aim of this contribution is to show why more complex units must also be granted the capacity to govern, and to provide a suitable formal instrument for this purpose in the concept of the 'complex element'.
Discourse parsing of complex text types such as scientific research articles requires the analysis of an input document on linguistic and structural levels that go beyond traditionally employed lexical discourse markers. This chapter describes a text-technological approach to discourse parsing. Discourse parsing with the aim of providing a discourse structure is seen as the addition of a new annotation layer for input documents marked up on several linguistic annotation levels. The discourse parser generates discourse structures according to Rhetorical Structure Theory. An overview of the knowledge sources and components for parsing scientific journal articles is given. The parser’s core consists of cascaded applications of the GAP, a Generic Annotation Parser. Details of the chart parsing algorithm are provided, as well as a short evaluation in terms of comparisons with reference annotations from our corpus and with recently developed systems with a similar task.
This study examines what kinds of cues and constraints for discourse interpretation can be derived from the logical and generic document structure of complex texts, using scientific journal articles as an example. We performed statistical analysis on a corpus of scientific articles annotated on different annotation layers within the framework of XML-based multi-layer annotation. We introduce different discourse segment types that constrain the textual domains in which to identify rhetorical relation spans, and we show how a canonical sequence of text type structure categories is derived from the corpus annotations. Finally, we demonstrate which text type structure categories assigned to complex discourse segments of the type “block” statistically constrain the occurrence of rhetorical relation types, and how they do so.
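One simple way to picture a statistical constraint of this kind is a table of relative frequencies of relation types per text type structure category. The sketch below is an illustration under stated assumptions, not the analysis performed in the study: the function name `relation_distribution` is hypothetical, and the category and relation labels in `toy` are invented placeholders rather than corpus data.

```python
from collections import Counter, defaultdict


def relation_distribution(observations):
    """Return relative frequencies of relation types per category.

    `observations` is an iterable of (category, relation) pairs, one per
    annotated block. The result maps each category to a dict that maps
    each relation type to its share among that category's observations.
    """
    counts = defaultdict(Counter)
    for category, relation in observations:
        counts[category][relation] += 1
    return {
        category: {rel: n / sum(rels.values()) for rel, n in rels.items()}
        for category, rels in counts.items()
    }


# Hypothetical toy observations (invented labels, NOT data from the corpus).
toy = [
    ("method", "ELABORATION"),
    ("method", "ELABORATION"),
    ("method", "SEQUENCE"),
    ("results", "EVIDENCE"),
]
dist = relation_distribution(toy)
```

A category whose distribution is concentrated on a few relation types constrains interpretation more strongly than one with a flat distribution. Whether an observed skew is significant would require a statistical test, which this sketch does not include.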
This contribution outlines the possibilities that the Extensible Markup Language (XML) opens up in the context of eLearning and Web Based Training (WBT). Existing eLearning offerings suffer from various problems that can be avoided by using XML-based learning objects. Starting from the current state of the project MiLCA (Medienintensive Lehrmodule in der Computerlinguistik-Ausbildung), the contribution also gives an outlook on future technical possibilities of computer-supported learning.