Refine
Year of publication
Document Type
- Part of a Book (11)
- Article (8)
- Conference Proceeding (6)
- Book (3)
- Other (3)
Is part of the Bibliography
- no (31) (remove)
Keywords
- Deutsch (17)
- Grammatik (12)
- Computerlinguistik (5)
- Korpus <Linguistik> (4)
- Datenbank (3)
- Grammis (3)
- Informationssystem (3)
- Internet (3)
- grammis (3)
- Computer-Mediated Communication (2)
Publicationstate
- Veröffentlichungsversion (17)
- Zweitveröffentlichung (2)
- Postprint (1)
Reviewstate
- (Verlags)-Lektorat (16)
- (Verlags-)Lektorat (1)
- Peer-Review (1)
- Peer-review (1)
Publisher
- Institut für Deutsche Sprache (7)
- Narr (5)
- Universitätsverlag Rhein-Ruhr (3)
- Springer (2)
- Acta Press (1)
- Berlin [u.a.] (1)
- Bibliothek der Universität Konstanz (1)
- European Language Resources Association (ELRA) (1)
- GSCL (1)
- Halem (1)
This paper describes a new approach to improve the analysis and categorization of web documents using statistical methods for template based clustering as well as semantical analysis based on terminological ontologies. A domain-specific environment serves for prove of concept. In order to demonstrate the widespread practical benefit of our approach, we outline a combined mathematical and semantical framework for information retrieval on internet resources.
E-VALBU: Advanced SQL/XML processing of dictionary data using an object-relational XML database
(2008)
Contemporary practical lexicography uses a wide range of advanced technological aids,most prominently database systems for the administration of dictionary content. Since XML has become a de facto standard for the coding of lexicographic articles, integrated markup functionality – such as query, update, or transformation of instances – is of particular importance. Even the multi-channel distribution of dictionary data benefits from powerful XML database services. Exemplified by E-VALBU, the most comprehensive electronic dictionary on German verb valency, we outline an integrated approach for advanced XML storing and processing within an object-relational database, and for a public retrieval frontend using Web Services and AJAX technology.
Linguistic query systems are special purpose IR applications. We present a novel state-of-the-art approach for the efficient exploitation of very large linguistic corpora, combining the advantages of relational database management systems (RDBMS) with the functional MapReduce programming model. Our implementation uses the German DEREKO reference corpus with multi-layer
linguistic annotations and several types of text-specific metadata, but the proposed strategy is language-independent and adaptable to large-scale multilingual corpora.
Contemporary studies on the characteristics of natural language benefit enormously from the increasing amount of linguistic corpora. Aside from text and speech corpora, corpora of computer-mediated communication (CMC) position themselves between orality and literacy, and beyond that provide insight into the impact of “new”, mainly internet-based media on language behaviour. In this paper, we present an empirical attempt to work with annotated CMC corpora for the explanation of linguistic phenomena. In concrete terms, we implement machine learning algorithms to produce decision trees that reveal rules and tendencies about the use of genitive markers in German.
Das Online-Wortschatz-Informationssystem Deutsch (OWID) ist ein digitales Wörterbuchportal des Instituts für Deutsche Sprache. Alle darin zusammengeführten lexikografischen Daten sind auf XML-Basis feingranular strukturiert. Speicherung, Verwaltung und Retrieval dieser Daten übernimmt das Orade-basierte Electronic Dictionary Administration System (EDAS). Der vorliegende Beitrag erläutert die XML-basierte Modellierung der Daten, XML-spezifische Fragen der Speicherung, sowie das Retrieval mit XPath und SQL/XML.
Linguistic query systems are special purpose IR applications. We present a novel state-of-the-art approach for the efficient exploitation of very large linguistic corpora, combining the advantages of relational database management systems (RDBMS) with the functional MapReduce programming model. Our implementation uses the German DEREKO reference corpus with multi-layer linguistic annotations and several types of text-specific metadata, but the proposed strategy is language-independent and adaptable to large-scale multilingual corpora.
The main objective of this article is to describe the current activities at the Mannheim Institute for German Language regarding the implementation of a domain-specific ontology for German grammar. We differentiate ontology bases from ontology management Systems, point out the benefits of database-driven Solutions, and go Step by Step through all phases of the ontology lifecycle. In Order to demonstrate the practical use of our approach, we outline the interface between our ontology and the grammis web Information System, and compare the ontology-based retrieval mechanism with traditional full text search.