A hybrid approach to statistical and semantical analysis of web documents
This paper describes a new approach to improving the analysis and categorization of web documents, using statistical methods for template-based clustering as well as semantical analysis based on terminological ontologies. A domain-specific environment serves as proof of concept. To demonstrate the widespread practical benefit of our approach, we outline a combined mathematical and semantical framework for information retrieval on internet resources.
Author: | Roman Schneider, Thomas Gottron |
---|---|
URN: | urn:nbn:de:bsz:mh39-39947 |
ISBN: | 978-0-88986-801-4 |
Parent Title (German): | Internet and Multimedia Systems and Applications (EuroIMSA 2009) |
Publisher: | Acta Press |
Place of publication: | Calgary, AB |
Editor: | Madjid Merabti |
Document Type: | Conference Proceeding |
Language: | German |
Year of first Publication: | 2009 |
Date of Publication (online): | 2015/08/17 |
Publication state: | Secondary publication |
Review state: | (Publisher's) editorial review |
Tag: | IR; template detection; terminological ontology |
GND Keyword: | Information retrieval; Online resource; Semantic analysis |
First Page: | 115 |
Last Page: | 120 |
DDC classes: | 400 Language / 400 Language, Linguistics / 400 Language |
Open Access?: | yes |
Leibniz-Classification: | Language, Linguistics |
Linguistics-Classification: | Computational linguistics |
Licence (German): | Protected by copyright |