
XML document grammars, specified as DTDs or, more recently, as XML Schemas, define the syntactic properties of a class of documents, but they usually lack a formal semantic model of the subject domain onto which document structures can be mapped. Using the table as an example, this contribution shows how semantic networks can be used for this task. The concrete implementation is based on the Topic Maps standard in combination with XPath expressions that point from the semantic network into the document instance or into an XML schema.

Introduction: Modeling, Learning and Processing of Text-Technological Data Structures (2011)

Mehler, Alexander ; Kühnberger, Kai-Uwe ; Lobin, Henning ; Lüngen, Harald ; Storrer, Angelika ; Witt, Andreas

Researchers in many disciplines, sometimes working in close cooperation, have been concerned with modeling textual data in order to account for texts as the prime information unit of written communication. The list of disciplines includes computer science and linguistics as well as more specialized disciplines like computational linguistics and text technology. What many of these efforts have in common is the aim to model textual data by means of abstract data types or data structures that support at least the semi-automatic processing of texts in any area of written communication.

Processing Text-Technological Resources in Discourse Parsing (2011)

Lobin, Henning ; Lüngen, Harald ; Hilbert, Mirco ; Bärenfänger, Maja

Discourse parsing of complex text types such as scientific research articles requires the analysis of an input document on linguistic and structural levels that go beyond traditionally employed lexical discourse markers. This chapter describes a text-technological approach to discourse parsing. Discourse parsing with the aim of providing a discourse structure is seen as the addition of a new annotation layer for input documents marked up on several linguistic annotation levels. The discourse parser generates discourse structures according to the Rhetorical Structure Theory. An overview of the knowledge sources and components for parsing scientific journal articles is given. The parser’s core consists of cascaded applications of the GAP, a Generic Annotation Parser. Details of the chart parsing algorithm are provided, as well as a short evaluation in terms of comparisons with reference annotations from our corpus and with recently developed systems with a similar task.

Präsentation, Transformation und Analyse: Verarbeitung XML-basierter japanischer Dialoge (2001)

Sasaki, Felix ; Witt, Andreas

This contribution addresses the use of XML for the annotation, presentation, transformation, and analysis of Japanese dialogue data. For each of these purposes, different text-technological standards are described and implemented.

Lexikonexpansion: Vom XML-annotiertem Stammformenlexikon zum Vollformenlexikon (2001)

Pönninghaus, Jens ; Witt, Andreas

This contribution presents a text-technological component for expanding an XML-annotated stem-form lexicon, which is based on entries from a standard dictionary. The expansion was implemented in the Document Style Semantics and Specification Language (DSSSL). Its result is a full-form lexicon that is likewise represented in XML.

Compilation and Annotation of the Discourse-structured Blog Corpus for German (2016)

Grumt Suárez, Holger ; Karlova-Bourbonus, Natali ; Lobin, Henning

The present paper reports the first results of the compilation and annotation of a blog corpus for German. The main aim of the project is the representation of the blog discourse structure and relations between its elements (blog posts, comments) and participants (bloggers, commentators). The data included in the corpus were manually collected from the scientific blog portal SciLogs. The feature catalogue for the corpus annotation includes three types of information that are directly or indirectly provided in the blog or can be construed by means of statistical analysis or computational tools. At this point, only directly available information (e.g., title of the blog post, name of the blogger) has been annotated. We believe our blog corpus can be of interest for the general study of blog structure or related research questions as well as for the development of NLP methods and techniques (e.g., for authorship detection).

A Discourse-structured Blog Corpus for German: Challenges of Compilation and Annotation (2016)

Grumt Suárez, Holger ; Karlova-Bourbonus, Natali ; Lobin, Henning

The present paper reports the first results of the compilation and annotation of a blog corpus for German. The main aim of the project is the representation of the blog discourse structure and relations between its elements (blog posts, comments) and participants (bloggers, commentators). The data included in the corpus were manually collected from the scientific blog portal SciLogs. The feature catalogue for the corpus annotation includes three types of information that are directly or indirectly provided in the blog or can be construed by means of statistical analysis or computational tools. At this point, only directly available information (e.g., title of the blog post, name of the blogger) has been annotated. We believe our blog corpus can be of interest for the general study of blog structure or related research questions as well as for the development of NLP methods and techniques (e.g., for authorship detection).

Lesen, Schreiben, Erzählen. Kommunikative Kulturtechniken im digitalen Zeitalter (2013)

The digitization of media and the networking of communication decisively shape our everyday lives: they influence the communicative cultural techniques of reading, writing, and storytelling. The authors examine this phenomenon from the perspectives of linguistics, literary studies, history, and political science, and analyze its significance for the production, organization, and reception of our cultural knowledge.

Texttechnologie - eine neue Perspektive der Computerlinguistik (2002)

Lobin, Henning

Aktuelle und künftige technische Rahmenbedingungen digitaler Medien für die Wissenschaftskommunikation (2017)

Lobin, Henning

This article examines what social media offerings look like today and how they are likely to develop over the coming years. The focus is on the development of the technical infrastructure and its influence on the various aspects of scholarly communication. Particular emphasis is placed, on the one hand, on the effects of automation, which in science communication means the development of specific scores and altmetrics, and, on the other hand, on the establishment of novel channels for conveying scientific topics.

Textsorte "Wissenschaftliche Präsentation". Textlinguistische Bemerkungen zu einer komplexen Kommunikationsform (2007)

Lobin, Henning


172 search hits