A hybrid approach to statistical and semantical analysis of web documents
- This paper describes a new approach to improving the analysis and categorization of web documents, combining statistical methods for template-based clustering with semantic analysis based on terminological ontologies. A domain-specific environment serves as a proof of concept. To demonstrate the broad practical benefit of our approach, we outline a combined mathematical and semantic framework for information retrieval on internet resources.
| Author: | Roman Schneider, Thomas Gottron |
|---|---|
| URN: | urn:nbn:de:bsz:mh39-39947 |
| ISBN: | 978-0-88986-801-4 |
| Parent Title (German): | Internet and Multimedia Systems and Applications (EuroIMSA 2009) |
| Publisher: | Acta Press |
| Place of publication: | Calgary, AB |
| Editor: | Madjid Merabti |
| Document Type: | Conference Proceeding |
| Language: | German |
| Year of first Publication: | 2009 |
| Date of Publication (online): | 2015/08/17 |
| Publication state: | Secondary publication |
| Review state: | (Publisher's) editorial review |
| Tag: | IR; template detection; terminological ontology |
| GND Keyword: | Information Retrieval; Online-Ressource; Semantische Analyse |
| First Page: | 115 |
| Last Page: | 120 |
| DDC classes: | 400 Language / 400 Language, Linguistics / 400 Language |
| Open Access?: | yes |
| Leibniz-Classification: | Language, Linguistics |
| Linguistics-Classification: | Computational linguistics |
| Licence (German): | Protected by copyright |


