Refine
Year of publication
Document Type
- Conference Proceeding (11)
- Part of a Book (8)
- Article (3)
Language
- English (22) (remove)
Has Fulltext
- yes (22)
Keywords
- Diskursanalyse (5)
- Korpus <Linguistik> (5)
- Ontologie <Wissensverarbeitung> (4)
- Digital Humanities (3)
- Parser (3)
- Annotation (2)
- Computerlinguistik (2)
- Computerunterstützte Kommunikation (2)
- Deutsch (2)
- Gefangenenliteratur (2)
Publicationstate
- Zweitveröffentlichung (12)
- Postprint (5)
- Veröffentlichungsversion (4)
Reviewstate
- (Verlags)-Lektorat (15)
- Peer-Review (3)
Publisher
- Springer (3)
- ICCC Press (2)
- de Gruyter (2)
- ACM (1)
- Benjamins (1)
- EPFL/UNIL (1)
- Foi-Commerce (1)
- Foris (1)
- Gesellschaft für Linguistische Datenverarbeitung (1)
- Institut für Kognitionswissenschaft Universität Osnabrück (1)
The present paper reports the first results of the compilation and annotation of a blog corpus for German. The main aim of the project is the representation of the blog discourse structure and relations between its elements (blog posts, comments) and participants (bloggers, commentators). The data included in the corpus were manually collected from the scientific blog portal SciLogs. The feature catalogue for the corpus annotation includes three types of information which is directly or indirectly provided in the blog or can be construed by means of statistical analysis or computational tools. At this point, only directly available information (e.g. title of the blog post, name of the blogger etc.) has been annotated. We believe, our blog corpus can be of interest for the general study of blog structure or related research questions as well as for the development of NLP methods and techniques (e.g. for authorship detection).
This paper presents challenges and opportunities resulting from the application of geographical information systems (GIS) in the (digital) humanities. First, we provide an overview of the intersection and interaction between geography (and cartography), and the humanities. Second, the “GeoBib” project is used as a case study to exemplify challenges for such collaborative, interdisciplinary projects, both for the humanists and the geoscientists. Finally, we conclude with an outlook on further applications of GIS in the humanities, and the potential scientific benefit for both sides, humanities and geosciences.
From Open Source to Open Information. Collaborative Methods in Creating XML-based Markup Languages
(2000)
Researchers in many disciplines, sometimes working in close cooperation, have been concerned with modeling textual data in order to account for texts as the prime information unit of written communication. The list of disciplines includes computer science and linguistics as well as more specialized disciplines like computational linguistics and text technology. What many of these efforts have in common is the aim to model textual data by means of abstract data types or data structures that support at least the semi-automatic processing of texts in any area of written communication.
Discourse segmentation is the division of a text into minimal discourse segments, which form the leaves in the trees that are used to represent discourse structures. A definition of elementary discourse segments in German is provided by adapting widely used segmentation principles for English minimal units, while considering punctuation, morphology, sytax, and aspects of the logical document structure of a complex text type, namely scientific articles. The algorithm and implementation of a discourse segmenter based on these principles is presented, as well an evaluation of test runs.
Knowledge in textual form is always presented as visually and hierarchically structured units of text, which is particularly true in the case of academic texts. One research hypothesis of the ongoing project Knowledge ordering in texts - text structure and structure visualisations as sources of natural ontologies1 is that the textual structure of academic texts effectively mirrors essential parts of the knowledge structure that is built up in the text. The structuring of a modern dissertation thesis (e.g. in the form of an automatically generated table of contents - toes), for example, represents a compromise between requirements of the text type and the methodological and conceptual structure of its subject-matter. The aim of the project is to examine how visual-hierarchical structuring systems are constructed, how knowledge structures are encoded in them, and how they can be exploited to automatically derive ontological knowledge for navigation, archiving, or search tasks. The idea to extract domain concepts and semantic relations mainly from the structural and linguistic information gathered from tables of contents represents a novel approach to ontology learning.
This chapter addresses the requirements and linguistic foundations of automatic relational discourse analysis of complex text types such as scientific journal articles. It is argued that besides lexical and grammatical discourse markers, which have traditionally been employed in discourse parsing, cues derived from the logical and generical document structure and the thematic structure of a text must be taken into account. An approach to modelling such types of linguistic information in terms of XML-based multi-layer annotations and to a text-technological representation of additional knowledge sources is presented. By means of quantitative and qualitative corpus analyses, cues and constraints for automatic discourse analysis can be derived. Furthermore, the proposed representations are used as the input sources for discourse parsing. A short overview of the projected parsing architecture is given.
A text parsing component designed to be part of a system that assists students in academic reading an writing is presented. The parser can automatically add a relational discourse structure annotation to a scientific article that a user wants to explore. The discourse structure employed is defined in an XML format and is based the Rhetorical Structure Theory. The architecture of the parser comprises pre-processing components which provide an input text with XML annotations on different linguistic and structural layers. In the first version these are syntactic tagging, lexical discourse marker tagging, logical document structure, and segmentation into elementary discourse segments. The algorithm is based on the shift-reduce parser by Marcu (2000) and is controlled by reduce operations that are constrained by linguistic conditions derived from an XML-encoded discourse marker lexicon. The constraints are formulated over multiple annotation layers of the same text.