Although policy makers show growing interest in higher education issues (especially on an international scale), there is still a lack of theoretically well-grounded comparative analyses of higher education policy. Even broadly discussed topics in higher education research, such as the potential convergence of European higher education systems in the course of the Bologna Process, suffer from a thin empirical and comparative basis. This paper addresses these problems by examining theoretical questions concerning the domestic impact of the Bologna Process and the role national factors play in determining its effects on cross-national policy convergence. It develops a distinct theoretical approach for the systematic and comparative analysis of cross-national policy convergence. In doing so, it draws on insights from related research areas, namely the literature on Europeanization and studies of cross-national policy convergence.
One problem of data-driven answer extraction in open-domain factoid question answering is that the class distribution of labeled training data is fairly imbalanced. In an ordinary training set, there are far more incorrect answers than correct answers. Class imbalance is thus inherent to the classification task. It degrades the performance of classifiers trained by standard machine learning algorithms, which usually have a heavy bias towards the majority class, i.e. the class which occurs most often in the training set. In this paper, we propose a method to tackle class imbalance by applying a form of cost-sensitive learning, which is preferable to sampling. We present a simple but effective way of estimating the misclassification costs on the basis of the class distribution. This approach offers three benefits. Firstly, it maintains the distribution of the classes of the labeled training data. Secondly, this form of meta-learning can be applied to a wide range of common learning algorithms. Thirdly, this approach can be easily implemented with the help of state-of-the-art machine learning software.
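The idea of deriving misclassification costs from the class distribution can be illustrated with a minimal Python sketch. The abstract does not state the exact cost formula, so the inverse-frequency weighting below is an assumption, and the function names `estimate_costs` and `min_cost_label` are hypothetical. The training labels are left untouched (no resampling), and the cost-sensitive decision is applied on top of whatever class probabilities a base classifier produces.

```python
from collections import Counter


def estimate_costs(labels):
    """Estimate per-class misclassification costs from the class distribution.

    Assumed heuristic (not taken from the paper): the cost of misclassifying
    an instance of class c is N / (k * count(c)), so rare classes cost more.
    """
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}


def min_cost_label(probs, costs):
    """Return the label with the lowest expected misclassification cost.

    probs maps each class to the probability estimated by any base classifier;
    predicting `pred` incurs cost for every other class that might be the truth.
    """
    def expected_cost(pred):
        return sum(p * costs[c] for c, p in probs.items() if c != pred)

    return min(probs, key=expected_cost)


# Toy training set: far more incorrect than correct answer candidates.
labels = ["incorrect"] * 90 + ["correct"] * 10
costs = estimate_costs(labels)

# A candidate the base classifier considers only 20% likely to be correct.
probs = {"incorrect": 0.8, "correct": 0.2}
prediction = min_cost_label(probs, costs)
```

With these toy numbers, missing a correct answer is costed nine times higher than a false alarm, so the decision threshold shifts towards the minority class without altering the training data.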
In German and other European languages, demonstratives can form the antecedent of relative clauses or function as the determiner of such an antecedent. Constructions of this kind show peculiarities of form and meaning: on the one hand, there are demonstratives that cannot, or can only marginally, be combined with appositive relative clauses; on the other hand, there are demonstratives that either do not permit restrictive relative clauses or combine with them only in special, non-deictic and non-phoric meanings. At least some of these peculiarities appear to point to more general, cross-linguistic constraints. In particular, the compatibility of demonstratives with restrictive relative clauses tends to correlate with the deictic strength of the demonstrative: distance-marking, and in this sense deictically strong, demonstratives tend to exclude restrictive relative clauses, whereas distance-neutral demonstratives, or those that can be used non-deictically, generally permit them. Constraints of this kind are illustrated with data from German, French and Swedish.
In the "post-initial position" (Nacherstposition) between a prefield constituent and the finite verb, German allows certain uninflected items (such as allerdings, wiederum, also, nun, nämlich, beispielsweise) that behave differently from focus particles. These are adverbial connectors which, in addition to their relational function, take on the information-structural task of marking a topic shift in this position, and only in this position. Only a small class of scalar items, the typical stepchildren of focus particle research (zumindest, höchstens, wenigstens, among others), can alternatively mark topic and focus here. With its specific form-function correlation, the post-initial position of adverbial connectors constitutes a "construction" in the construction-grammar sense that cannot be derived fully compositionally.
In this paper, we present a suite of flexible UIMA-based components for information retrieval research which have been successfully used (and re-used) in several projects in different application domains. Implementing the whole system as UIMA components is beneficial for configuration management, component reuse, implementation costs, analysis and visualization.
Current Natural Language Processing (NLP) systems feature high-complexity processing pipelines that require the use of components at different levels of linguistic and application-specific processing. These components often have to interface with external libraries, e.g. for machine learning and information retrieval, as well as with tools for human annotation and visualization. At the UKP Lab, we are working on the Darmstadt Knowledge Processing Software Repository (DKPro) (Gurevych et al., 2007a; Müller et al., 2008) to create a highly flexible, scalable and easy-to-use toolkit that allows rapid creation of complex NLP pipelines for semantic information processing on demand. The DKPro repository consists of several main parts created to serve the purposes of different NLP application areas.
Introduction
(2008)
In this paper we investigate the coverage of the two knowledge sources WordNet and Wikipedia for the task of bridging resolution. We report on an annotation experiment which yielded pairs of bridging anaphors and their antecedents in spoken multi-party dialog. Manual inspection of the two knowledge sources showed that, with some interesting exceptions, Wikipedia is superior to WordNet when it comes to the coverage of information necessary to resolve the bridging anaphors in our data set. We further describe a simple procedure for the automatic extraction of the required knowledge from Wikipedia by means of an API, and discuss some of the implications of the procedure’s performance.