TY  - CHAP
U1  - Conference publication
A1  - Langer, Hagen
A1  - Lüngen, Harald
A1  - Bayerl, Petra Saskia
T1  - Text type structure and logical document structure
T2  - Proceedings of the ACL-workshop on discourse annotation
N2  - Most research on automated categorization of documents has concentrated on the assignment of one or many categories to a whole text. However, new applications, e.g. in the area of the Semantic Web, require a richer and more fine-grained annotation of documents, such as detailed thematic information about the parts of a document. Hence we investigate the automatic categorization of text segments of scientific articles with XML markup into 16 topic types from a text type structure schema. A corpus of 47 linguistic articles was provided with XML markup on different annotation layers representing text type structure, logical document structure, and grammatical categories. Six different feature extraction strategies were applied to this corpus and combined in various parametrizations in different classifiers. The aim was to explore the contribution of each type of information, in particular the logical structure features, to the classification accuracy. The results suggest that some of the topic types of our hierarchy are successfully learnable, while the features from the logical structure layer had no particular impact on the results.
KW  - Computational linguistics ; Text type
Y1  - 2004
U6  - https://nbn-resolving.org/urn:nbn:de:bsz:mh39-92
UN  - https://nbn-resolving.org/urn:nbn:de:bsz:mh39-92
VL  - 2004
ER  -