OPUS 4 | Search

A hybrid approach to statistical and semantical analysis of web documents (2009)

This paper describes a new approach to improving the analysis and categorization of web documents, using statistical methods for template-based clustering as well as semantic analysis based on terminological ontologies. A domain-specific environment serves as proof of concept. To demonstrate the widespread practical benefit of our approach, we outline a combined mathematical and semantic framework for information retrieval on internet resources.

On valence-binding grammars (1978)

Frosch, Helmut

The valence of a verb determines the number, and the syntactic class, of those expressions that must co-occur with it in a sentence. Definitions of "valence-term" and "valence-boundness" are provided whereby the precise conditions are formulated that a valence-binding grammar must satisfy. These conditions are exemplified in the framework of a simple categorial grammar, in which various reductions of the general notions can be carried out.

Satzklammer, Informationsklammer und geteilte Nominalphrasen (2006)

Ballweg, Joachim

The paper first examines the German sentence bracket (Satzklammer) from the perspective of information distribution. After showing that it functions as an information bracket (Informationsklammer), it investigates how the sentence bracket interacts with the parts of split noun phrases. Two interesting findings emerge: • the sentence bracket and the NP parts support each other in forming the information bracket; in particular, the parts of the split NP can bear accent; • the parts of the split NP can by themselves play the role of an information bracket, which makes topicalization of the past participle (Partizip II) possible.

Der Schikoree im Frigidär: Die Neuregelung der deutschen Rechtschreibung (1995)

Augst, Gerhard ; Heller, Klaus

The Naproche Project. Controlled Natural Language Proof Checking of Mathematical Texts (2010)

Cramer, Marcos ; Fisseni, Bernhard ; Koepke, Peter ; Kühlwein, Daniel ; Schröder, Bernhard ; Veldman, Jip

This paper discusses the semi-formal language of mathematics and presents the Naproche CNL, a controlled natural language for mathematical authoring. Proof Representation Structures, an adaptation of Discourse Representation Structures, are used to represent the semantics of texts written in the Naproche CNL. We discuss how the Naproche CNL can be used in formal mathematics, and present our prototypical Naproche system, a computer program for parsing texts in the Naproche CNL and checking the proofs in them for logical correctness.

Herstellen von Dominanz im Gespräch durch Dominantsetzen von Perspektiven (1999)

Keim, Inken

The concept of dominance refers to social relations that either rest on already established power relations or attempt to establish them. Dominance in conversation can manifest itself in particular interactional properties, e.g. in persistently claiming the right to speak, in consistently steering topic and perspective, in controlling the partners' activities, or in blocking the initiatives of others. In what follows I concentrate on one of these possibilities: establishing dominance by making perspectives dominant. By consistently making one's own perspective on a thematic object, or on aspects of it, the dominant one, it is possible to establish dominance over the other interlocutors, at least with respect to that object.

Towards a syntactically motivated analysis of modifiers in German (2016)

Rehbein, Ines ; Hirschmann, Hagen

The Stuttgart-Tübingen Tagset (STTS) is a widely used POS annotation scheme for German which provides 54 different tags for the analysis on the part of speech level. The tagset, however, does not distinguish between adverbs and different types of particles used for expressing modality, intensity, graduation, or to mark the focus of the sentence. In the paper, we present an extension to the STTS which provides tags for a more fine-grained analysis of modification, based on a syntactic perspective on parts of speech. We argue that the new classification not only enables us to do corpus-based linguistic studies on modification, but also improves statistical parsing. We give proof of concept by training a data-driven dependency parser on data from the TiGer treebank, providing the parser a) with the original STTS tags and b) with the new tags. Results show an improved labelled accuracy for the new, syntactically motivated classification.

Berührungspunkte zwischen Rechtswissenschaft und Linguistik. Statement zur Arbeit der Arbeitsgruppe 2: Rechtsinterne Begründungsstrukturen (1982)

Wimmer, Rainer

Wissenschaftliche Kommunikation und Alltagskommunikation im Lichte einer linguistisch begründeten Sprachkritik (1982)

Wimmer, Rainer

Wissenschaftliche und didaktische Grammatik. Bericht aus der Arbeitsgruppe 2 (1983)

Wimmer, Rainer

From Proof Texts to Logic. Discourse Representation Structures for Proof Texts in Mathematics (2009)

Veldman, Jip ; Fisseni, Bernhard ; Schröder, Bernhard ; Koepke, Peter

We present an extension to Discourse Representation Theory that can be used to analyze mathematical texts written in the commonly used semi-formal language of mathematics (or at least a subset of it). Moreover, we describe an algorithm that can be used to check the resulting Proof Representation Structures for their logical validity and adequacy as a proof.

ProofML - eine Annotationssprache für natürlichsprachige mathematische Beweise (2005)

Fisseni, Bernhard

Mathematical texts are written in natural language (possibly with portions in formula notation), not in a formal language. ProofML is a file format that allows such texts to be annotated so that a logical structure is assigned to the natural-language structure.

Something Empirical about Focus (2004)

Fisseni, Bernhard

Interface-Agenten zur Steuerung in komplexen Umgebungen (1993)

Lobin, Henning ; Milde, Jan-Torsten

Integration multimodaler Mensch-Maschine-Kommunikation durch Interface-Agenten (1993)

Lobin, Henning ; Milde, Jan-Torsten ; Gastner, Rainer

Extracting domain knowledge from tables of contents (2010)

Lüngen, Harald ; Lobin, Henning

Knowledge in textual form is always presented as visually and hierarchically structured units of text, which is particularly true in the case of academic texts. One research hypothesis of the ongoing project "Knowledge ordering in texts - text structure and structure visualisations as sources of natural ontologies" is that the textual structure of academic texts effectively mirrors essential parts of the knowledge structure that is built up in the text. The structuring of a modern dissertation thesis (e.g. in the form of an automatically generated table of contents - tocs), for example, represents a compromise between requirements of the text type and the methodological and conceptual structure of its subject-matter. The aim of the project is to examine how visual-hierarchical structuring systems are constructed, how knowledge structures are encoded in them, and how they can be exploited to automatically derive ontological knowledge for navigation, archiving, or search tasks. The idea to extract domain concepts and semantic relations mainly from the structural and linguistic information gathered from tables of contents represents a novel approach to ontology learning.

From Open Source to Open Information. Collaborative Methods in Creating XML-based Markup Languages (2000)

Rehm, Georg ; Lobin, Henning

Extending the possibilities for collaborative work with TEI/XML through the usage of a wiki system (2013)

Entrup, Bastian ; Binder, Frank ; Lobin, Henning

This paper presents and discusses an integrated project-specific working environment for editing TEI/XML-files and linking entities of interest to a dedicated wiki system. This working environment has been specifically tailored to the workflow in our interdisciplinary digital humanities project GeoBib. It addresses some challenges that arose while working with person-related data and geographical references in a growing collection of TEI/XML-files. While our current solution provides some essential benefits, we also discuss several critical issues and challenges that remain.

Ontology Extraction for Index Generation (2004)

Gottschalg-Duque, Cláudio ; Lobin, Henning

The administration of electronic publications in the Information Era brings together old and new problems, especially those related to Information Retrieval and Automatic Knowledge Extraction. This article presents an Information Retrieval System that uses Natural Language Processing and an ontology to index the texts of a collection. We describe a system that constructs a domain-specific ontology, starting from the syntactic and semantic analyses of the texts that compose the collection. First the texts are tokenized, then a robust syntactic analysis is performed, and subsequently the semantic analysis is carried out in conformity with a knowledge representation metalanguage based on a basic ontology composed of 47 classes. The automatically extracted ontology generates richer domain-specific knowledge. Through its semantic net, it provides the right conditions for the user to find, more efficiently and quickly, the terms suited to querying the texts. A prototype of this system was built and used to index a collection of 221 electronic texts on Information Science written in Brazilian Portuguese. Instead of relying on statistical theories, we propose a robust Information Retrieval System that uses cognitive theories, allowing greater efficiency in answering users' queries.

A Discourse-structured Blog Corpus for German: Challenges of Compilation and Annotation (2016)

Suarez, Holger Grumt ; Karlova-Bourbonus, Natali ; Lobin, Henning

The present paper reports the first results of the compilation and annotation of a blog corpus for German. The main aim of the project is the representation of the blog discourse structure and the relations between its elements (blog posts, comments) and participants (bloggers, commentators). The data included in the corpus were manually collected from the scientific blog portal SciLogs. The feature catalogue for the corpus annotation includes three types of information that are directly or indirectly provided in the blog or can be construed by means of statistical analysis or computational tools. At this point, only directly available information (e.g. title of the blog post, name of the blogger etc.) has been annotated. We believe our blog corpus can be of interest for the general study of blog structure or related research questions as well as for the development of NLP methods and techniques (e.g. for authorship detection).


131 search hits