Refine
Year of publication
- 2008 (62) (remove)
Document Type
- Conference Proceeding (29)
- Part of a Book (16)
- Article (13)
- Doctoral Thesis (3)
- Working Paper (1)
Language
- English (62) (remove)
Is part of the Bibliography
- no (62)
Keywords
- Deutsch (15)
- Korpus <Linguistik> (9)
- Annotation (5)
- Automatische Sprachanalyse (4)
- Computerlinguistik (4)
- Computerunterstützte Lexikographie (4)
- Englisch (4)
- Gesprochene Sprache (4)
- Mehrsprachigkeit (4)
- Computerunterstützte Lexikografie (3)
Publicationstate
- Veröffentlichungsversion (33)
- Postprint (7)
- Zweitveröffentlichung (4)
Reviewstate
Publisher
- European Language Resources Association (ELRA) (7)
- de Gruyter (4)
- ELRA (3)
- University of Oulu (3)
- Academia (2)
- Benjamins (2)
- European Language Resources Association (2)
- Aisthesis (1)
- BBAW (1)
- CSLI (1)
This paper presents three electronic collections of polarity items: (i) negative polarity items in Romanian, (ii) negative polarity items in German, and (iii) positive polarity items in German. The presented collections are a part of a linguistic resource on lexical units with highly idiosyncratic occurrence patterns. The motivation for collecting and documenting polarity items was to provide a solid empirical basis for linguistic investigations of these expressions. Our databe provides general information about the collected items, specifies their syntactic properties, and describes the environment that licenses a given item. For each licensing context, examples from various corpora and the Internet are introduced. Finally, the type of polarity (negative or positive) and the class (superstrong, strong, weak or open) associated with a given item is speci ed. Our database is encoded in XML and is available via the Internet, offering dynamic and exible access.
The authors present a multilingual electronic database of lexical items with idiosyncratic occurrence patterns. Currently, our database consists of: (1) a collection of 444 bound words in German; (2) a collection of 77 bound words in English; (3) a collection of 58 negative polarity items in Romanian; (4) a collection of 84 negative polarity items in German; and (5) a collection of 52 positive polarity items in German. The database is encoded in XML and is available via the Internet, offering dynamic and flexible access.
One of the most popular techniques used in HPSG-based studies to describe linguistic phenomena is the raising mechanism. Besides ordinary raising verbs or adjectives, this tool has been applied for handling verbal complexes and discontinuous constituents, among other phenomena. In this paper, a new application for raising within the HPSG paradigm will be discussed, thereby investigating data from the prepositional domain. We will analyze linguistic properties of word combinations in German consisting of a preposition, a noun, and another preposition (such as auf Grund von (‘by virtue of’)), thus arguing that raising is the most appropriate method for satisfactorily describing the crucial syntactic features which are typical for those expressions. The objective of this paper is thus to demonstrate the efficiency of the raising mechanism as used in HPSG, and therefore, to emphasize the importance of designing a satisfactory uniform theory of raising within this grammar framework.
We present SPLICR, the Web-based Sustainability Platform for Linguistic Corpora and Resources. The system is aimed at people who work in Linguistics or Computational Linguistics: a comprehensive database of metadata records can be explored in order to find language resources that could be appropriate for one’s spe cific research needs. SPLICR also provides a graphical interface that enables users to query and to visualise corpora. The project in which the system is developed aims at sustainably archiving the ca. 60 language resources that have been constructed in three collaborative research centres. Our project has two primary goals: (a) To process and to archive sustainably the resources so that they are still available to the research community in five, ten, or even 20 years time. (b) To enable researchers to query the resources both on the level of their metadata as well as on the level of linguistic annotations. In more general terms, our goal is to enable solutions that leverage the interoperability, reusability, and sustainability of heterogeneous collec- tions of language resources.
This paper presents the results of a joint effort of a group of multimodality researchers and tool developers to improve the interoperability between several tools used for the annotation and analysis of multimodality. Each of the tools has specific strengths so that a variety of differ-ent tools, working on the same data, can be desirable for project work. However this usually re-quires tedious conversion between formats. We propose a common exchange format for multi-modal annotation, based on the annotation graph (AG) formalism, which is supported by import and export routines in the respective tools. In the current version of this format the common de-nominator information can be reliably exchanged between the tools, and additional information can be stored in a standardized way.
Lexical chaining has become an important part of many NLP tasks. However, the goodness of a chaining process and hence its annotation output depends on the quality of the chaining resource. Therefore, a framework for chaining is needed which integrates divergent resources in order to balance their deficits and to compare their strengths and weaknesses. In this paper we present an application that incorporates the framework of a meta model of lexical chaining exemplified on three resources and its generalized exchange format.
Although there is a growing interest of policy makers in higher education issues (especially on an international scale), there is still a lack of theoretically well-grounded comparative analyses of higher education policy. Even broadly discussed topics in higher education research like the potential convergence of European higher education systems in the course of the Bologna Process suffer from a thin empirical and comparative basis. This paper aims to deal with these problems by addressing theoretical questions concerning the domestic impact of the Bologna Process and the role national factors play in determining its effects on cross-national policy convergence. It develops a distinct theoretical approach for the systematic and comparative analysis of cross-national policy convergence. In doing so, it relies upon insights from related research areas — namely literature on Europeanization as well as studies dealing with cross-national policy convergence.
In this paper the authors briefly outline editing functions which use methods from computational linguistics and take the structures of natural languages into consideration. Such functions could reduce errors and better support writers in realizing their communicative goals. However, linguistic methods have limits, and there are various aspects software developers have to take into account to avoid creating a solution looking for a problem: Language-aware functions could be powerful tools for writers, but writers must not be forced to adapt to their tools.
One problem of data-driven answer extraction in open-domain factoid question answering is that the class distribution of labeled training data is fairly imbalanced. In an ordinary training set, there are far more incorrect answers than correct answers. The class-imbalance is, thus, inherent to the classification task. It has a deteriorating effect on the performance of classifiers trained by standard machine learning algorithms. They usually have a heavy bias towards the majority class, i.e. the class which occurs most often in the training set. In this paper, we propose a method to tackle class imbalance by applying some form of cost-sensitive learning which is preferable to sampling. We present a simple but effective way of estimating the misclassification costs on the basis of class distribution. This approach offers three benefits. Firstly, it maintains the distribution of the classes of the labeled training data. Secondly, this form of meta-learning can be applied to a wide range of common learning algorithms. Thirdly, this approach can be easily implemented with the help of state-of-the-art machine learning software.
The authors describe two data sets submitted to the database of MWE evaluation resources: (1) cranberry expressions in English and (2) cranberry expressions in German. The first package contains a collection of 444 cranberry words in German (CWde.txt) and a collection of the corresponding cranberry expressions (CCde.txt). The second package consists of a collection of 77 cranberry words in English (CWen.txt) and a collection of the corresponding cranberry expressions (CCen.txt). The data included in these packages was extracted from the Collection of Distributionally Idiosyncratic Items (CoDII), an electronic linguistic resource of lexical items with idiosyncratic occurrence patterns. Each package contains a readme file, and can be downloaded from multiword.wiki.sourceforge.net/Resources.
Digital Text Collections, Linguistic Research Data, and Mashups: Notes on the Legal Situation
(2008)
Comprehensive data repositories are an essential part of practically all research carried out in the digital humanities nowadays. For example, library science, literary studies, and computational and corpus linguistics strongly depend on online archives that are highly sustainable and that contain not only digitized texts but also audio and video data as well as additional information such as metadata and arbitrary annotations. Current Web technologies, especially those that are related to what is commonly referred to as the Web 2.0, provide a number of novel functions such as multiuser editing or the inclusion of third-party content and applications that are also highly attractive for research applications in the areas mentioned above. Hand in hand with this development goes a high degree of legal uncertainty. The special nature of the data entails that, in quite a few cases, there are multiple holders of personal rights (mostly copyright) to different layers of data that often have different origins. This article discusses the legal problems of multiple authorships in private, commercial, and research environments. We also introduce significant differences between European and U.S. law with regard to the handling of this kind of data for scientific purposes.
This work proposes opinion frames as a representation of discourse-level associations which arise from related opinion topics. We illustrate how opinion frames help gather more information and also assist disambiguation. Finally we present the results of our experiments to detect these associations.
This work proposes opinion frames as a representation of discourse-level associations that arise from related opinion targets and which are common in task-oriented meeting dialogs. We define the opinion frames and explain their interpretation. Additionally we present an annotation scheme that realizes the opinion frames and via human annotation studies, we show that these can be reliably identified.
Discourse metaphors
(2008)
The article introduces the notion of discourse metaphor, relatively stable metaphorical mappings that function as a key framing device within a particular discourse over a certain period of time. Discourse metaphors are illustrated by case studies from three lines of research: on the cultural imprint of metaphors, on the negotiation of metaphors and on cross-linguistic occurrence. The source concepts of discourse metaphors refer to phenomenologically salient real or fictitious objects that are part of interactional space (i.e., can be pointed at, like MACHINES or HOUSES) and/or occupy an important place in cultural imagination. Discourse metaphors change both over time and across the discourses where they are used. The implications of focussing on different types of source domains for our thinking about the embodiment and sociocultural situatedness of metaphor is discussed, with particular reference to recent developments in Conceptual Metaphor Theory. Research on discourse suggests that situatedness is a crucial factor in the functioning and dynamics of metaphor.
E-VALBU: Advanced SQL/XML processing of dictionary data using an object-relational XML database
(2008)
Contemporary practical lexicography uses a wide range of advanced technological aids,most prominently database systems for the administration of dictionary content. Since XML has become a de facto standard for the coding of lexicographic articles, integrated markup functionality – such as query, update, or transformation of instances – is of particular importance. Even the multi-channel distribution of dictionary data benefits from powerful XML database services. Exemplified by E-VALBU, the most comprehensive electronic dictionary on German verb valency, we outline an integrated approach for advanced XML storing and processing within an object-relational database, and for a public retrieval frontend using Web Services and AJAX technology.
As many popular text genres such as blogs or news contain opinions by multiple sources and about multiple targets, finding the sources and targets of subjective expressions becomes an important sub-task for automatic opinion analysis systems. We argue that while automatic semantic role labeling systems (ASRL) have an important contribution to make, they cannot solve the problem for all cases. Based on the experience of manually annotating opinions, sources, and targets in various genres, we present linguistic phenomena that require knowledge beyond that of ASRL systems. In particular, we address issues relating to the attribution of opinions to sources; sources and targets that are realized as zero-forms; and inferred opinions. We also discuss in some depth that for arguing attitudes we need to be able to recover propositions and not only argued-about entities. A recurrent theme of the discussion is that close attention to specific discourse contexts is needed to identify sources and targets correctly.
This paper is a project report of the lexicographic Internet portal OWID, an Online Vocabulary Information System of German which is being built at the Institute of German Language in Mannheim (IDS). Overall, the contents of the portal and its technical approaches will be presented. The lexical database is structured in a granular way which allows to extend possible search options for lexicographers. Against the background of current research on using electronic dictionaries, the project OWID is also working on first ideas of useradapted access and user-adapted views of the lexicographic data. Due to the fact that the portal OWID comprises dictionaries which are available online it is possible to change the design and functions of the website easily (in comparison to printed dictionaries). Ideas of implementing user-adapted views of the lexicographic data will be demonstrated by using an example taken from one of the dictionaries of the portal, namely elexiko.