In the project SemDok (Generic document structures in linearly organised texts), funded by the German Research Foundation (DFG), a discourse parser for a complex text type (scientific articles, for example) is being developed. Discourse parsing (henceforth DP) according to Rhetorical Structure Theory (RST) (Mann and Taboada, 2005; Marcu, 2000) deals with automatically assigning a text a tree structure in which discourse segments and the rhetorical relations between them, such as Concession, are marked. To identify the combinable segments, declarative rules are employed. These rules describe linguistic and structural cues and constraints on possible combinations by referring to different XML annotation layers of the input text and to external knowledge bases such as a discourse marker lexicon, a lexico-semantic ontology (later to be combined with a domain ontology), and an ontology of rhetorical relations. In our text-technological environment, the obvious choice of formalism to represent such ontologies is OWL (Smith et al., 2004). In this paper, we describe two OWL ontologies and how they are consulted by the discourse parser to solve certain tasks within DP. The first ontology is a taxonomy of rhetorical relations which was developed in the project. The second is an OWL version of GermaNet, the model of which we designed together with our project partners.
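The idea of consulting a taxonomy of rhetorical relations can be illustrated with a minimal sketch. It is not the SemDok ontology and does not use OWL: the relation names other than Concession, the parent links, and the helper functions `ancestors` and `is_subsumed_by` are all hypothetical, and a plain dictionary stands in for the class hierarchy that a parser might query.

```python
# Hypothetical child -> parent links standing in for a relation taxonomy.
# Only "Concession" is named in the abstract; everything else is assumed.
PARENT = {
    "Concession": "Contrastive",
    "Antithesis": "Contrastive",
    "Contrastive": "RhetoricalRelation",
    "Elaboration": "RhetoricalRelation",
}


def ancestors(relation):
    """Return all more general relations, from nearest to most general."""
    chain = []
    while relation in PARENT:
        relation = PARENT[relation]
        chain.append(relation)
    return chain


def is_subsumed_by(relation, general):
    """True if `relation` equals `general` or is a descendant of it."""
    return relation == general or general in ancestors(relation)


if __name__ == "__main__":
    # A rule stated for the general class also covers its subtypes.
    print(ancestors("Concession"))
    print(is_subsumed_by("Concession", "Contrastive"))
    print(is_subsumed_by("Elaboration", "Contrastive"))
```

A subsumption check of this kind would let a rule written for a general relation class apply to its more specific subtypes as well, which is one plausible use of such a taxonomy.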
Weltansichten aus sprachlicher und rechtlicher Perspektive. Zur Ontisierung von Konzepten des Rechts [Worldviews from a linguistic and a legal perspective. On the ontologisation of legal concepts]
(2008)
Vorwort [Preface]
(2008)
One problem of data-driven answer extraction in open-domain factoid question answering is that the class distribution of the labeled training data is heavily imbalanced. In an ordinary training set, there are far more incorrect answers than correct answers. The class imbalance is thus inherent to the classification task. It has a detrimental effect on the performance of classifiers trained by standard machine learning algorithms, which usually have a heavy bias towards the majority class, i.e. the class that occurs most often in the training set. In this paper, we propose a method to tackle class imbalance by applying a form of cost-sensitive learning, which we argue is preferable to sampling. We present a simple but effective way of estimating the misclassification costs on the basis of the class distribution. This approach offers three benefits. Firstly, it maintains the class distribution of the labeled training data. Secondly, this form of meta-learning can be applied to a wide range of common learning algorithms. Thirdly, it can be easily implemented with the help of state-of-the-art machine learning software.
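As a rough illustration of estimating misclassification costs from the class distribution, the sketch below uses inverse class frequency. The abstract does not give the paper's actual estimator, so this formula is an assumption, as are the function names `estimate_costs` and `instance_weights` and the toy 95/5 label split.

```python
from collections import Counter


def estimate_costs(labels):
    """Assumed estimator: cost of misclassifying class c is
    total / (number_of_classes * count_of_c), so rare classes cost more."""
    counts = Counter(labels)
    total = len(labels)
    n_classes = len(counts)
    return {c: total / (n_classes * k) for c, k in counts.items()}


def instance_weights(labels, costs):
    """Attach each instance's class cost as its training weight.
    The training set itself is left unchanged (no resampling)."""
    return [costs[y] for y in labels]


# Toy data: far more incorrect than correct answer candidates.
labels = ["incorrect"] * 95 + ["correct"] * 5
costs = estimate_costs(labels)
weights = instance_weights(labels, costs)

if __name__ == "__main__":
    for cls, cost in sorted(costs.items()):
        print(cls, round(cost, 4))
```

Under this assumed scheme, each class contributes the same total weight to training, while the class distribution of the data is preserved. Many learners accept per-instance or per-class weights, which is one way such costs could be passed on.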