The concept of text coherence was developed for linear text, i.e. text with sequentially organized content. The present article examines to what extent this concept can be applied to hypertext. Following the introduction (section 1), I will define different aspects of text coherence (section 2). I will then explain the importance of the sequential order of text constituents for coherence-building, as explored by empirical studies on text comprehension (section 3). Section 4 discusses how hypertext-specific forms of reading affect the processes of coherence-building and coherence-design. Section 5 explores how the new challenges of hypertext comprehension may be met by hypertext-specific coherence cues. Section 6 provides a summary and an outlook.
Grammis is a web-based information system on German grammar, hosted by the Institute for the German Language (IDS). It is human-oriented and features different theoretical perspectives on grammar. Currently, the terminology component of grammis is being redesigned so that this theoretical diversity plays a more prominent role in the data model. This also opens opportunities for implementing some machine-oriented features. In this paper, we present the redesign of both the data model and the knowledge base. We explore how the addition of machine-oriented features to the data model impacts the knowledge base; in particular, how this addition shifts some of the textual complexity into the data model. We show that our resource can easily be ported to a SKOS-XL representation, which makes it available for data science, knowledge-based NLP applications, and LOD in the context of digital humanities.
In this contribution, we discuss and compare alternative options for modelling the entities and relations of wordnet-like resources in the Web Ontology Language OWL. Based on different modelling options, we developed three models for representing wordnets in OWL, i.e. the instance model, the class model, and the metaclass model. These OWL models mainly differ with respect to the ontological status of lexical units (word senses) and synsets. While in the instance model lexical units and synsets are represented as individuals, in the class model they are represented as classes; both model types can be encoded in the dialect OWL DL. As a third alternative, we developed a metaclass model in OWL Full, in which lexical units and synsets are defined as metaclasses, the individuals of which are classes themselves. We apply the three OWL models to each of three wordnet-style resources: (1) a subset of the German wordnet GermaNet, (2) the wordnet-style domain ontology TermNet, and (3) GermaTermNet, in which TermNet technical terms and GermaNet synsets are connected by means of a set of “plug-in” relations. We report on the results of several experiments in which we evaluated the performance of querying and processing these different models: (1) a comparison of all three OWL models (class, instance, and metaclass model) of TermNet in the context of automatic text-to-hypertext conversion, (2) an investigation of the potential of the GermaTermNet resource using the example of a wordnet-based semantic relatedness calculation.
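The difference between the three modelling options can be illustrated with a minimal sketch that writes each option as a small set of RDF-style triples in plain Python. This is not the authors' encoding: the `ex:` names (`ex:Synset`, `ex:syn_Haus`) and the helper functions are hypothetical, and only the standard terms `rdf:type`, `rdfs:subClassOf`, and `owl:Class` are taken from the RDF/OWL vocabularies.

```python
# Standard RDF/OWL vocabulary terms, written as prefixed names.
RDF_TYPE = "rdf:type"
RDFS_SUBCLASS_OF = "rdfs:subClassOf"
OWL_CLASS = "owl:Class"

# Hypothetical name for the general synset concept (an assumption for illustration).
SYNSET = "ex:Synset"


def instance_model(synset_id):
    """Instance model: the synset is an individual of the class ex:Synset."""
    return {(synset_id, RDF_TYPE, SYNSET)}


def class_model(synset_id):
    """Class model: the synset is itself a class, placed below ex:Synset."""
    return {
        (synset_id, RDF_TYPE, OWL_CLASS),
        (synset_id, RDFS_SUBCLASS_OF, SYNSET),
    }


def metaclass_model(synset_id):
    """Metaclass model: ex:Synset is a class of classes, so its
    individuals (the synsets) are classes themselves."""
    return {
        (SYNSET, RDFS_SUBCLASS_OF, OWL_CLASS),
        (synset_id, RDF_TYPE, SYNSET),
    }


if __name__ == "__main__":
    for build in (instance_model, class_model, metaclass_model):
        print(build.__name__)
        for triple in sorted(build("ex:syn_Haus")):
            print("  ", " ".join(triple))
```

In this sketch, the instance and class models keep individuals and classes apart, whereas the metaclass model treats the same resource as both an individual (of `ex:Synset`) and a class, which is why it falls into OWL Full.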
In this paper, a method for measuring synchronic corpus (dis-)similarity put forward by Kilgarriff (2001) is adapted and extended to identify trends and correlated changes in diachronic text data, using the Corpus of Historical American English (Davies 2010a) and the Google Ngram Corpora (Michel et al. 2010a). This paper shows that this fully data-driven method, which extracts word types that have undergone the most pronounced change in frequency in a given period of time, is computationally very cheap and that it allows interpretations of diachronic trends that are both intuitively plausible and motivated from the perspective of information theory. Furthermore, it demonstrates that the method is able to identify correlated linguistic changes and diachronic shifts that can be linked to historical events. Finally, it can help to improve diachronic POS tagging and complement existing NLP approaches. This indicates that the approach can facilitate an improved understanding of diachronic processes in language change.
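The idea of extracting the word types whose frequency changed most between two periods can be sketched as follows. The abstract does not state the exact measure, so this sketch assumes a per-word chi-square contribution against pooled frequencies, in the spirit of Kilgarriff's corpus comparison; the function names and the toy counts are hypothetical and not taken from the paper.

```python
def chi_square_contributions(counts_a, counts_b):
    """Per-word chi-square contribution for two word-frequency tables.

    Assumption: both tables are non-empty dicts mapping word -> raw count.
    Expected counts come from pooling the two corpora, so a word with the
    same relative frequency in both periods scores 0.
    """
    n_a = sum(counts_a.values())
    n_b = sum(counts_b.values())
    scores = {}
    for word in set(counts_a) | set(counts_b):
        o_a = counts_a.get(word, 0)
        o_b = counts_b.get(word, 0)
        total = o_a + o_b
        if total == 0:
            continue  # word listed with zero counts in both periods
        e_a = total * n_a / (n_a + n_b)  # expected count in period A
        e_b = total * n_b / (n_a + n_b)  # expected count in period B
        scores[word] = (o_a - e_a) ** 2 / e_a + (o_b - e_b) ** 2 / e_b
    return scores


def top_changed(counts_a, counts_b, k=3):
    """Return the k words with the largest contribution (ties broken alphabetically)."""
    scores = chi_square_contributions(counts_a, counts_b)
    return sorted(scores, key=lambda w: (-scores[w], w))[:k]


if __name__ == "__main__":
    # Toy counts, invented for illustration only.
    early = {"the": 50, "telegraph": 10}
    late = {"the": 50, "internet": 10}
    print(top_changed(early, late, k=2))
```

Because only two frequency tables and one pass over the vocabulary are needed, a measure of this kind is cheap to compute, which matches the abstract's remark about computational cost.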
Frimer et al. (2015) claim that there is a linear relationship between the level of prosocial language and the level of public disapproval of US Congress. A re-analysis demonstrates that this relationship is the result of a misspecified model that does not account for first-order autocorrelated disturbances. A Stata script to reproduce all presented results is available as an appendix.
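A common first check for first-order autocorrelated disturbances is the Durbin-Watson statistic computed on the residuals of a simple linear regression. The following is a minimal sketch in plain Python, not the Stata script mentioned above; the function names are hypothetical, and whether the re-analysis uses this particular diagnostic is an assumption.

```python
def ols_fit(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def residuals(x, y, intercept, slope):
    """Observed minus fitted values, in time order."""
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


def durbin_watson(res):
    """Durbin-Watson statistic: sum of squared successive differences
    divided by the sum of squared residuals.

    Values near 2 suggest no first-order autocorrelation; values well
    below 2 suggest positive autocorrelation, well above 2 negative.
    """
    num = sum((res[t] - res[t - 1]) ** 2 for t in range(1, len(res)))
    den = sum(e ** 2 for e in res)
    return num / den


if __name__ == "__main__":
    # Residuals that stay on one side for several periods (positive autocorrelation).
    print(durbin_watson([1, 1, 1, -1, -1, -1]))
    # Residuals that flip sign every period (negative autocorrelation).
    print(durbin_watson([1, -1, 1, -1]))
```

If the statistic is far from 2, standard errors from plain OLS are unreliable, and a model that accounts for the autocorrelation (for example a regression with AR(1) errors) is the usual next step.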