Refine
Year of publication
- 2008 (34)
Document Type
- Conference Proceeding (34)
Is part of the Bibliography
- no (34)
Keywords
- German (8)
- Corpus <Linguistics> (7)
- Annotation (4)
- Automatic language analysis (4)
- Computational linguistics (3)
- Information Extraction (3)
- Long-term archiving (3)
- Metadata (3)
- Computer-assisted lexicography (2)
- Data set (2)
Publication state
- Published version (16)
- Postprint (2)
- Secondary publication (2)
Review state
- Peer review (9)
- (Publisher's) editorial review (7)
- Publisher's editorial review (1)
Publisher
- European Language Resources Association (ELRA) (6)
- ELRA (3)
- University of Oulu (3)
- European Language Resources Association (2)
- Berlin-Brandenburgische Akademie der Wissenschaften (1)
- CSLI (1)
- EURALEX (1)
- INRIA (1)
- Institut Universitari de Linguistica Aplicada, Universitat Pompeu Fabra (1)
- Institut Universitari de Linguistica Aplicada, Universitat Pompeu Fabra: (1)
This paper presents three electronic collections of polarity items: (i) negative polarity items in Romanian, (ii) negative polarity items in German, and (iii) positive polarity items in German. The presented collections are part of a linguistic resource on lexical units with highly idiosyncratic occurrence patterns. The motivation for collecting and documenting polarity items was to provide a solid empirical basis for linguistic investigations of these expressions. Our database provides general information about the collected items, specifies their syntactic properties, and describes the environment that licenses a given item. For each licensing context, examples from various corpora and the Internet are introduced. Finally, the type of polarity (negative or positive) and the class (superstrong, strong, weak or open) associated with a given item are specified. Our database is encoded in XML and is available via the Internet, offering dynamic and flexible access.
The authors present a multilingual electronic database of lexical items with idiosyncratic occurrence patterns. Currently, our database consists of: (1) a collection of 444 bound words in German; (2) a collection of 77 bound words in English; (3) a collection of 58 negative polarity items in Romanian; (4) a collection of 84 negative polarity items in German; and (5) a collection of 52 positive polarity items in German. The database is encoded in XML and is available via the Internet, offering dynamic and flexible access.
One of the most popular techniques used in HPSG-based studies to describe linguistic phenomena is the raising mechanism. Besides ordinary raising verbs and adjectives, this tool has been applied to verbal complexes and discontinuous constituents, among other phenomena. This paper discusses a new application of raising within the HPSG paradigm, drawing on data from the prepositional domain. We analyze the linguistic properties of German word combinations consisting of a preposition, a noun, and another preposition (such as auf Grund von (‘by virtue of’)), and argue that raising is the most appropriate method for describing the crucial syntactic features typical of these expressions. The objective of this paper is thus to demonstrate the efficiency of the raising mechanism as used in HPSG and to emphasize the importance of designing a satisfactory uniform theory of raising within this grammar framework.
In this paper the authors briefly outline editing functions which use methods from computational linguistics and take the structures of natural languages into consideration. Such functions could reduce errors and better support writers in realizing their communicative goals. However, linguistic methods have limits, and there are various aspects software developers have to take into account to avoid creating a solution looking for a problem: Language-aware functions could be powerful tools for writers, but writers must not be forced to adapt to their tools.
One problem of data-driven answer extraction in open-domain factoid question answering is that the class distribution of labeled training data is fairly imbalanced. In an ordinary training set, there are far more incorrect answers than correct answers. The class imbalance is thus inherent to the classification task. It has a detrimental effect on the performance of classifiers trained by standard machine learning algorithms, which usually have a heavy bias towards the majority class, i.e. the class which occurs most often in the training set. In this paper, we propose a method to tackle class imbalance by applying a form of cost-sensitive learning, which is preferable to sampling. We present a simple but effective way of estimating the misclassification costs on the basis of the class distribution. This approach offers three benefits. Firstly, it maintains the distribution of the classes of the labeled training data. Secondly, this form of meta-learning can be applied to a wide range of common learning algorithms. Thirdly, this approach can be easily implemented with the help of state-of-the-art machine learning software.
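The abstract above does not state how the misclassification costs are derived from the class distribution, so the following is only a minimal sketch under an assumed inverse-frequency heuristic: each class c receives the cost N / (K * n_c), where N is the number of training examples, K the number of classes, and n_c the number of examples of class c. The function names `estimate_costs` and `min_cost_decision` and the toy label counts are hypothetical and are not taken from the paper.

```python
from collections import Counter


def estimate_costs(labels):
    """Estimate a misclassification cost per class from the class distribution.

    Assumption (not from the paper): cost(c) = N / (K * n_c), so that
    misclassifying a rare class is penalised more heavily than a frequent one.
    """
    counts = Counter(labels)
    total = len(labels)
    n_classes = len(counts)
    return {cls: total / (n_classes * n) for cls, n in counts.items()}


def min_cost_decision(probabilities, costs):
    """Return the class whose prediction minimises the expected cost.

    `probabilities` maps each class to the classifier's estimated probability.
    Predicting `pred` incurs cost(true) whenever the true class differs from it.
    """
    def expected_cost(pred):
        return sum(p * costs[true]
                   for true, p in probabilities.items() if true != pred)

    return min(probabilities, key=expected_cost)


# Toy training set: far more incorrect than correct answer candidates.
labels = ["incorrect"] * 95 + ["correct"] * 5
costs = estimate_costs(labels)

# A candidate with only 20% estimated probability of being correct is still
# accepted, because missing a correct answer is far more costly.
decision = min_cost_decision({"correct": 0.2, "incorrect": 0.8}, costs)
```

In this sketch the training data itself is left untouched, in line with the first benefit named in the abstract; only the decision rule changes, which is why such a wrapper can in principle sit on top of any classifier that outputs class probabilities.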
The authors describe two data sets submitted to the database of MWE evaluation resources: (1) cranberry expressions in English and (2) cranberry expressions in German. The first package contains a collection of 444 cranberry words in German (CWde.txt) and a collection of the corresponding cranberry expressions (CCde.txt). The second package consists of a collection of 77 cranberry words in English (CWen.txt) and a collection of the corresponding cranberry expressions (CCen.txt). The data included in these packages was extracted from the Collection of Distributionally Idiosyncratic Items (CoDII), an electronic linguistic resource of lexical items with idiosyncratic occurrence patterns. Each package contains a readme file, and can be downloaded from multiword.wiki.sourceforge.net/Resources.
This work proposes opinion frames as a representation of discourse-level associations which arise from related opinion topics. We illustrate how opinion frames help gather more information and also assist disambiguation. Finally, we present the results of our experiments to detect these associations.
This work proposes opinion frames as a representation of discourse-level associations that arise from related opinion targets and that are common in task-oriented meeting dialogs. We define the opinion frames and explain their interpretation. Additionally, we present an annotation scheme that realizes the opinion frames and show, via human annotation studies, that these can be reliably identified.
Diskurswörterbuch
(2008)
After a brief discussion of the term discourse, discourse is related to the tasks of a discourse dictionary. The paper then develops the subject of discourse lexicography, which is a lexicographic presentation of discourse vocabulary, of the net of its semantic relations, and of the societal and historical circumstances of the usage people have made of it. This background is useful for the presentation of two types of discourse dictionaries. On the one hand, they are based on the same primary conception. On the other hand, they are adapted to the respective discourse constellations. The first example is the result of a project on the early post-war period and presents the already existing discourse dictionary of this project. The content of this dictionary is the vocabulary of three different groups, which participate in one discourse and specifically represent its main item. Since this dictionary also exists in an electronic version, the concept is illustrated with examples taken from this version. The second example refers to an ongoing project on the 1967/68 protest period. The vocabulary of this discourse makes up a set of several single discourse items, while these items constitute the leading subject of the discourse of 1967/68: democracy. Thus, the task of the lexicographic description of a complex discourse like this is, not least, to assign the discourse vocabulary to the single discourses and to describe the different usages relating to these single discourses. The paper ends with a draft of a lexicographic program based on the discourse dictionary type.