Refine
Year of publication
Document Type
- Conference Proceeding (131) (remove)
Has Fulltext
- yes (131)
Is part of the Bibliography
- no (131) (remove)
Keywords
- Korpus <Linguistik> (48)
- Deutsch (36)
- Annotation (14)
- Computerlinguistik (13)
- Auszeichnungssprache (11)
- Computerunterstützte Lexikographie (8)
- Head-driven phrase structure grammar (8)
- Gesprochene Sprache (7)
- XML (7)
- HPSG (6)
Publicationstate
- Veröffentlichungsversion (110)
- Zweitveröffentlichung (18)
- Postprint (3)
Reviewstate
- (Verlags)-Lektorat (131) (remove)
Publisher
- European Language Resources Association (ELRA) (12)
- CSLI Publications (5)
- Nisaba (5)
- Association for Computational Linguistics (4)
- Extreme Markup Languages Conference (4)
- European Language Resources Association (3)
- Evangelische Akademie Loccum (3)
- Ivane Javakhishvili Tbilisi State University (3)
- University of Birmingham (3)
- University of Illinois (3)
This paper describes a new approach to improve the analysis and categorization of web documents using statistical methods for template based clustering as well as semantical analysis based on terminological ontologies. A domain-specific environment serves for prove of concept. In order to demonstrate the widespread practical benefit of our approach, we outline a combined mathematical and semantical framework for information retrieval on internet resources.
On valence-binding grammars
(1978)
The valence of a verb determines the number, and the syntactic class, of those expressions that must co-occur with it in a sentence. Definitions of "valence-term" and "valence-boundness" are provided whereby the precise conditions are formulated that a valence-binding grammar must satisfy. These conditions are exemplified in the framework of a simple categorial grammar, in which various reductions of the general notions can be carried out.
Der Beitrag befasst sich zunächst mit der Satzklammer des Deutschen aus der Perspektive der Informationsverteilung. Nachdem gezeigt ist, dass sie als Informationsklammer fungiert, wird ihre Interaktion mit den Teilen gespaltener Nominalphrasen untersucht. Dabei zeigen sich zwei interessante Befunde:
• die Satzklammer und die NP-Teile unterstützen sich bei der Informationsklammerbildung; insbesondere können die Spalt-NP-Teile Akzent tragen;
• die Spalt-NP-Teile können alleine die Rolle einer Informationsklammer spielen, wodurch eine Topikalisierung des Partizips II möglich wird.
This paper discusses the semi-formal language of mathematics and presents the Naproche CNL, a controlled natural language for mathematical authoring. Proof Representation Structures, an adaptation of Discourse Representation Structures, are used to represent the semantics of texts written in the Naproche CNL. We discuss how the Naproche CNL can be used in formal mathematics, and present our prototypical Naproche system, a computer program for parsing texts in the Naproche CNL and checking the proofs in them for logical correctness.
Das Konzept von Dominanz bezieht sich auf soziale Beziehungen, die entweder auf bereits etablierten Machtverhältnissen basieren oder solche herzustellen versuchen. Dominanz im Gespräch kann sich in bestimmten Interaktionseigenschaften manifestieren, z.B. in der ständigen Beanspruchung von Rederecht, der konsistenten thematischen und perspektivischen Steuerung, der Kontrolle von Partneraktivitäten oder dem Verhindern von Initiativen anderer u.ä..
Im Folgenden werde ich mich auf eine der Möglichkeiten konzentrieren, auf das Herstellen von Dominanz durch das Dominantsetzen von Perspektiven. Durch das konsistente Dominantsetzen der eigenen Perspektive auf einen thematischen Gegenstand oder Aspekte davon ist es möglich, zumindest in Bezug auf diesen Gegenstand Dominanz über die anderen Gesprächspartner zu etablieren.
The Stuttgart-Tübingen Tagset (STTS) is a widely used POS annotation scheme for German which provides 54 different tags for the analysis on the part of speech level. The tagset, however, does not distinguish between adverbs and different types of particles used for expressing modality, intensity, graduation, or to mark the focus of the sentence. In the paper, we present an extension to the STTS which provides tags for a more fine-grained analysis of modification, based on a syntactic perspective on parts of speech. We argue that the new classification not only enables us to do corpus-based linguistic studies on modification, but also improves statistical parsing. We give proof of concept by training a data-driven dependency parser on data from the TiGer treebank, providing the parser a) with the original STTS tags and b) with the new tags. Results show an improved labelled accuracy for the new, syntactically motivated classification.
From Proof Texts to Logic. Discourse Representation Structures for Proof Texts in Mathematics
(2009)
We present an extension to Discourse Representation Theory that can be used to analyze mathematical texts written in the commonly used semi-formal language of mathematics (or at least a subset of it). Moreover, we describe an algorithm that can be used to check the resulting Proof Representation Structures for their logical validity and adequacy as a proof.
Knowledge in textual form is always presented as visually and hierarchically structured units of text, which is particularly true in the case of academic texts. One research hypothesis of the ongoing project Knowledge ordering in texts - text structure and structure visualisations as sources of natural ontologies1 is that the textual structure of academic texts effectively mirrors essential parts of the knowledge structure that is built up in the text. The structuring of a modern dissertation thesis (e.g. in the form of an automatically generated table of contents - toes), for example, represents a compromise between requirements of the text type and the methodological and conceptual structure of its subject-matter. The aim of the project is to examine how visual-hierarchical structuring systems are constructed, how knowledge structures are encoded in them, and how they can be exploited to automatically derive ontological knowledge for navigation, archiving, or search tasks. The idea to extract domain concepts and semantic relations mainly from the structural and linguistic information gathered from tables of contents represents a novel approach to ontology learning.
From Open Source to Open Information. Collaborative Methods in Creating XML-based Markup Languages
(2000)
Extending the possibilities for collaborative work with TEI/XML through the usage of a wiki system
(2013)
This paper presents and discusses an integrated project-specific working environment for editing TEI/XML-files and linking entities of interest to a dedicated wiki system. This working environment has been specifically tailored to the workflow in our interdisciplinary digital humanities project GeoBib. It addresses some challenges that arose while working with person-related data and geographical references in a growing collection of TEI/XML-files. While our current solution provides some essential benefits, we also discuss several critical issues and challenges that remain.
The administration of electronic publication in the Information Era congregates old and new problems, especially those related with Information Retrieval and Automatic Knowledge Extraction. This article presents an Information Retrieval System that uses Natural Language Processing and Ontology to index collection’s texts. We describe a system that constructs a domain specific ontology, starting from the syntactic and semantic analyses of the texts that compose the collection. First the texts are tokenized, then a robust syntactic analysis is made, subsequently the semantic analysis is accomplished in conformity with a metalanguage of knowledge representation, based on a basic ontology composed of 47 classes. The ontology, automatically extracted, generates richer domain specific knowledge. It propitiates, through its semantic net, the right conditions for the user to find with larger efficiency and agility the terms adapted for the consultation to the texts. A prototype of this system was built and used for the indexation of a collection of 221 electronic texts of Information Science written in Portuguese from Brazil. Instead of being based in statistical theories, we propose a robust Information Retrieval System that uses cognitive theories, allowing a larger efficiency in the answer to the users queries.
The present paper reports the first results of the compilation and annotation of a blog corpus for German. The main aim of the project is the representation of the blog discourse structure and relations between its elements (blog posts, comments) and participants (bloggers, commentators). The data included in the corpus were manually collected from the scientific blog portal SciLogs. The feature catalogue for the corpus annotation includes three types of information which is directly or indirectly provided in the blog or can be construed by means of statistical analysis or computational tools. At this point, only directly available information (e.g. title of the blog post, name of the blogger etc.) has been annotated. We believe, our blog corpus can be of interest for the general study of blog structure or related research questions as well as for the development of NLP methods and techniques (e.g. for authorship detection).