400 Language
This paper describes a new approach to improving the analysis and categorization of web documents, using statistical methods for template-based clustering together with semantic analysis based on terminological ontologies. A domain-specific environment serves as a proof of concept. To demonstrate the broad practical benefit of our approach, we outline a combined mathematical and semantic framework for information retrieval on internet resources.
Grammis is a web-based information system on German grammar, hosted by the Institute for the German Language (IDS). It is human-oriented and features different theoretical perspectives on grammar. The terminology component of grammis is currently being redesigned so that this theoretical diversity plays a more prominent role in the data model. The redesign also opens opportunities for implementing some machine-oriented features. In this paper, we present the redesign of both the data model and the knowledge base. We explore how adding machine-oriented features to the data model affects the knowledge base, in particular how it shifts some of the textual complexity into the data model. We show that our resource can easily be ported to a SKOS-XL representation, which makes it available for data science, knowledge-based NLP applications, and Linked Open Data (LOD) in the context of digital humanities.