Text parsing of a complex genre

  • This paper presents a text parsing component designed to be part of a system that assists students in academic reading and writing. The parser automatically adds a relational discourse structure annotation to a scientific article that a user wants to explore. The discourse structure is defined in an XML format and is based on Rhetorical Structure Theory. The parser's architecture comprises pre-processing components that supply an input text with XML annotations on different linguistic and structural layers. In the first version these layers are syntactic tagging, lexical discourse marker tagging, logical document structure, and segmentation into elementary discourse segments. The algorithm is based on the shift-reduce parser by Marcu (2000) and is controlled by reduce operations that are constrained by linguistic conditions derived from an XML-encoded discourse marker lexicon. The constraints are formulated over multiple annotation layers of the same text.
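
The shift-reduce control loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the names `Node`, `MARKER_LEXICON`, `licensed_relation`, and `parse` are hypothetical, the two-entry lexicon is invented for illustration, and the single condition used here (the right-hand span opens with a known discourse marker) stands in for the much richer constraints over multiple XML annotation layers that the paper describes. When no input remains, the sketch falls back to a generic `JOINT` relation, which is also an assumption.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class Node:
    """A discourse span: a leaf segment or two spans joined by a relation."""
    text: str
    relation: Optional[str] = None
    children: Tuple["Node", ...] = ()

# Hypothetical toy lexicon: discourse marker -> relation name.
MARKER_LEXICON = {"because": "CAUSE", "but": "CONTRAST"}

def licensed_relation(left: Node, right: Node) -> Optional[str]:
    """Return a relation if the right span opens with a known marker.

    `left` is unused in this toy condition; real constraints would
    inspect both spans and several annotation layers.
    """
    words = right.text.split()
    if not words:
        return None
    return MARKER_LEXICON.get(words[0].lower().strip(",.;"))

def parse(segments):
    """Shift segments onto a stack; reduce the top two spans when licensed."""
    if not segments:
        raise ValueError("at least one segment is required")
    stack = []
    queue = [Node(s) for s in segments]
    while queue or len(stack) > 1:
        if len(stack) >= 2:
            rel = licensed_relation(stack[-2], stack[-1])
            # Reduce if the lexicon licenses it, or if nothing is left to shift.
            if rel is not None or not queue:
                right = stack.pop()
                left = stack.pop()
                merged = Node(left.text + " " + right.text,
                              rel or "JOINT", (left, right))
                stack.append(merged)
                continue
        stack.append(queue.pop(0))  # shift the next segment
    return stack[0]

tree = parse([
    "The parser failed",
    "because the lexicon was empty",
    "but the tests passed",
])
print(tree.relation, tree.children[0].relation)
```

In this sketch the first two segments are reduced as soon as the marker "because" licenses a relation, and the resulting span is then combined with the third segment. A lexicon-driven design of this kind keeps the parsing loop fixed while the linguistic knowledge lives in data.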

Metadata
Author:Harald Lüngen, Maja Baerenfaenger, Mirco Hilbert, Henning Lobin, Csilla Puskás
URN:urn:nbn:de:bsz:mh39-84
URL:http://www.uni-giessen.de/germanistik/ascl/dfg-projekt/pdfs/242_elpub2006.published-version.pdf
ISBN:978-954-16-0040-5
Parent Title (English):ELPUB 2006. Digital Spectrum: Integrating Technology and Culture - Proceedings of the 10th International Conference on Electronic Publishing held in Bansko. ELPUB 2006, Bansko, Bulgaria, June 14-16
Publisher:Foi-Commerce
Place of publication:Sofia
Editor:Milena Dobreva, Bob Martens
Document Type:Conference Proceeding
Language:English
Year of first Publication:2006
Tag:XML applications; discourse parsing; rhetorical structure; text parsing
GND Keyword:Text analysis ; Discourse analysis ; Computational linguistics
Pagenumber:10
First Page:247
Last Page:256
Dewey Decimal Classification:400 Language / 410 Linguistics
Open Access?:yes
Linguistics-Classification:Computational linguistics
Licence (German):German copyright law (UrhG) applies