OPUS 4 | NEALT Proceedings Series

NEALT Proceedings Series

Northern European Association for Language Technology

Convolution Kernels for Subjectivity Detection (2011)

In this paper, we explore different linguistic structures encoded as convolution kernels for the detection of subjective expressions. The advantage of convolution kernels is that complex structures can be directly provided to a classifier without deriving explicit features. The feature design for the detection of subjective expressions is fairly difficult and there currently exists no commonly accepted feature set. We consider various structures, such as constituency parse structures, dependency parse structures, and predicate-argument structures. In order to generalize from lexical information, we additionally augment these structures with clustering information and the task-specific knowledge of subjective words. The convolution kernels will be compared with a standard vector kernel.

Demonstrating the MUSTE language learning environment (2018)

Lange, Herbert ; Ljunglöf, Peter

We present a language learning application that relies on grammars to model the learning outcome. Based on this concept, we can provide a powerful framework for language learning exercises with an intuitive user interface and high reliability. Currently the application aims to augment existing language classes and to support students by improving learner attitude and the general learning outcome. Extensions beyond that scope are promising and likely to be added in the future.

DMPTY – A wizard for generating data management plans (2015)

Trippel, Thorsten ; Zinn, Claus

To optimize the sharing and reuse of existing data, many funding organizations now require researchers to specify a management plan for research data. In such a plan, researchers are supposed to describe the entire life cycle of the research data they are going to produce, from data creation to formatting, interpretation, documentation, short-term storage, long-term archiving and data re-use. To support researchers with this task, we built DMPTY, a wizard that guides researchers through the essential aspects of managing data, elicits information from them, and finally, generates a document that can be further edited and linked to the original research proposal.

Metadata formats for learner corpora: case study and discussion (2022)

Lange, Herbert

Metadata provides important information relevant both to finding and understanding corpus data. Meaningful linguistic data requires both reasonable annotations and documentation of these annotations. This documentation is part of the metadata of a dataset. While corpus documentation has often been provided in the form of accompanying publications, machine-readable metadata, both containing the bibliographic information and documenting the corpus data, has many advantages. Metadata standards allow for the development of common tools and interfaces. In this paper I want to add a new perspective from an archive’s point of view, look at the metadata provided for four learner corpora, and discuss the suitability of established standards for machine-readable metadata. I am aware that there is ongoing work towards metadata standards for learner corpora. However, I would like to keep the discussion going and add another point of view: increasing findability and reusability of learner corpora in an archiving context.

Predictive Features in Semi-Supervised Learning for Polarity Classification and the Role of Adjectives (2009)

Wiegand, Michael ; Klakow, Dietrich

In opinion mining, there has been very little work investigating semi-supervised machine learning for document-level polarity classification. We show that semi-supervised learning performs significantly better than supervised learning when only a few labelled data are available. Semi-supervised polarity classifiers rely on a predictive feature set. (Semi-)manually built polarity lexicons are one option, but they are expensive to obtain and do not necessarily work in an unknown domain. We show that extracting frequently occurring adjectives and adverbs from an unlabeled set of in-domain documents is an inexpensive alternative which works equally well across different domains.
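The feature-extraction idea in this abstract can be illustrated with a small sketch. It is not the authors' implementation: it assumes the unlabeled in-domain documents are already POS-tagged as (token, tag) pairs with Penn Treebank-style tags, and the function names `frequent_adjectives_adverbs` and `to_feature_vector` as well as the toy documents are hypothetical.

```python
from collections import Counter

# Assumption: Penn Treebank-style tags, where adjectives start with "JJ"
# and adverbs start with "RB". The abstract does not specify a tagset.
ADJ_ADV_PREFIXES = ("JJ", "RB")


def frequent_adjectives_adverbs(tagged_docs, top_k=2):
    """Return the top_k most frequent adjective/adverb tokens (lowercased)."""
    counts = Counter(
        token.lower()
        for doc in tagged_docs
        for token, tag in doc
        if tag.startswith(ADJ_ADV_PREFIXES)
    )
    return [word for word, _ in counts.most_common(top_k)]


def to_feature_vector(tagged_doc, vocabulary):
    """Binary vector marking which vocabulary words occur in one document."""
    present = {token.lower() for token, _ in tagged_doc}
    return [1 if word in present else 0 for word in vocabulary]


# Toy unlabeled, pre-tagged documents (made up for illustration).
docs = [
    [("The", "DT"), ("plot", "NN"), ("is", "VBZ"), ("great", "JJ")],
    [("A", "DT"), ("great", "JJ"), ("but", "CC"), ("slow", "JJ"), ("film", "NN")],
    [("Really", "RB"), ("great", "JJ"), ("but", "CC"), ("slow", "JJ")],
]

vocab = frequent_adjectives_adverbs(docs, top_k=2)
features = [to_feature_vector(doc, vocab) for doc in docs]
```

The resulting vectors could then be passed to any semi-supervised classifier; which classifier the paper uses is not stated in the abstract.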

Preface (2019)

Alfter, David ; Volodina, Elena ; Borin, Lars ; Pilán, Ildikó ; Lange, Herbert

Proceedings of the 8th Workshop on Natural Language Processing for Computer Assisted Language Learning (NLP4CALL 2019), September 30, Turku, Finland (2019)

Content
1. Predicting learner knowledge of individual words using machine learning. Drilon Avdiu, Vanessa Bui, Klára Ptacinová Klimcíková
2. Automatic Generation and Semantic Grading of Esperanto Sentences in a Teaching Context. Eckhard Bick
3. Toward automatic improvement of language produced by non-native language learners. Mathias Creutz, Eetu Sjöblom
4. Linguistic features and proficiency classification in L2 Spanish and L2 Portuguese. Iria del Río
5. Integrating large-scale web data and curated corpus data in a search engine supporting German literacy education. Sabrina Dittrich, Zarah Weiss, Hannes Schröter, Detmar Meurers
6. Formalism for a language agnostic language learning game and productive grid generation. Sylvain Hatier, Arnaud Bey, Mathieu Loiseau
7. Understanding Vocabulary Growth Through An Adaptive Language Learning System. Elma Kerz, Andreas Burgdorf, Daniel Wiechmann, Stefan Meeger, Yu Qiao, Christian Kohlschein, Tobias Meisen
8. Summarization Evaluation meets Short-Answer Grading. Margot Mieskes, Ulrike Padó
9. Experiments on Non-native Speech Assessment and its Consistency. Ziwei Zhou, Sowmya Vajjala, Seyed Vahid Mirnezami
10. The Impact of Spelling Correction and Task Context on Short Answer Assessment for Intelligent Tutoring Systems. Ramon Ziai, Florian Nuxoll, Kordula De Kuthy, Björn Rudzewitz, Detmar Meurers

Why is it so difficult to compare treebanks? TIGER and TüBa-D/Z revisited (2007)

Rehbein, Ines ; van Genabith, Josef

This paper is a contribution to the ongoing discussion on treebank annotation schemes and their impact on PCFG parsing results. We provide a thorough comparison of two German treebanks: the TIGER treebank and the TüBa-D/Z. We use simple statistics on sentence length and vocabulary size, and more refined methods such as perplexity and its correlation with PCFG parsing results, as well as a Principal Components Analysis. Finally, we present a qualitative evaluation of a set of 100 sentences from the TüBa-D/Z, manually annotated in the TIGER as well as in the TüBa-D/Z annotation scheme, and show that even the existence of a parallel subcorpus does not support a straightforward and easy comparison of both annotation schemes.
