Refine
Year of publication
- 2009 (9) (remove)
Document Type
- Conference Proceeding (4)
- Article (2)
- Part of a Book (2)
- Contribution to a Periodical (1)
Has Fulltext
- yes (9)
Is part of the Bibliography
- no (9)
Keywords
- Computerlinguistik (9) (remove)
Publicationstate
- Veröffentlichungsversion (4)
- Postprint (2)
- Zweitveröffentlichung (1)
Reviewstate
- (Verlags)-Lektorat (3)
- Peer-Review (2)
- Peer-Revied (1)
- Verlags-Lektorat (1)
Publisher
- AAAI Press (1)
- Europ. Akad. (1)
- Narr (1)
- Northern European Association for Language Technology (1)
- Schöningh (1)
- Springer (1)
- Verl.-Haus. Monsenstein und Vannerdat (1)
Though polarity classification has been extensively explored at document level, there has been little work investigating feature design at sentence level. Due to the small number of words within a sentence, polarity classification at sentence level differs substantially from document-level classification in that resulting bag-of-words feature vectors tend to be very sparse resulting in a lower classification accuracy.
In this paper, we show that performance can be improved by adding features specifically designed for sentence-level polarity classification. We consider both explicit polarity information and various linguistic features. A great proportion of the improvement that can be obtained by using polarity information can also be achieved by using a set of simple domain-independent linguistic features.
We present a morphological analyzer for Spanish called SMM. SMM is implemented in the grammar development framework Malaga, which is based on the formalism of Left-Associative Grammar. We briefly present the Malaga framework, describe the implementation decisions for some interesting morphological phenomena of Spanish, and report on the evaluation results from the analysis of corpora. SMM was originally only designed for analyzing word forms; in this article we outline two approaches for using SMM and the facilities provided by Malaga to also generate verbal paradigms. SMM can also be embedded into applications by making use of the Malagaprogramming interface; we briefly discuss some application scenarios.
In opinion mining, there has been only very little work investigating semi-supervised machine learning on document-level polarity classification. We show that semi-supervised learning performs significantly better than supervised learning when only few labelled data are available. Semi-supervised polarity classifiers rely on a predictive feature set. (Semi-)Manually built polarity lexicons are one option but they are expensive to obtain and do not necessarily work in an unknown domain. We show that extracting frequently occurring adjectives & adverbs of an unlabeled set of in-domain documents is an inexpensive alternative which works equally well throughout different domains.
This article introduces the topic of ‘‘Multilingual language resources and interoperability’’. We start with a taxonomy and parameters for classifying language resources. Later we provide examples and issues of interoperatability, and resource architectures to solve such issues. Finally we discuss aspects of linguistic formalisms and interoperability.
From Proof Texts to Logic. Discourse Representation Structures for Proof Texts in Mathematics
(2009)
We present an extension to Discourse Representation Theory that can be used to analyze mathematical texts written in the commonly used semi-formal language of mathematics (or at least a subset of it). Moreover, we describe an algorithm that can be used to check the resulting Proof Representation Structures for their logical validity and adequacy as a proof.
This paper presents EXMARaLDA, a system for the computer-assisted creation and analysis of spoken
language corpora. The first part contains some general observations about technological and methodological requirements for doing corpus-based pragmatics. The second part explains the systems architecture and gives an overview of its most important software components a transcription editor, a corpus management tool and a corpus query tool. The last part presents some corpora which have been or are currently being compiled with the help of EXMARaLDA.
Spoken language corpora— as used in conversation analytic research, language acquisition studies and dialectology— pose a number of challenges that are rarely addressed by corpus linguistic methodology and technology. This paper starts by giving an overview of the most important methodological issues distinguishing spoken language corpus workfrom the work with written data. It then shows what technological challenges these methodological issues entail and demonstrates how they are dealt with in the architecture and tools of the EXMARaLDA system.
In this paper we present an evaluation of rule-based morphological components for German for use in an interactive editing environment. The criteria for the evaluation are deduced from the intended use of these components, namely availability, performance, programming interfaces, and analysis quality. We evaluated systems developed and maintained since decades as well as new systems. However, we note serious general shortcomings when looking closer at recent implementations and come to the conclusion that the oldest system is the only one that satisfies our requirements.