NEALT Proceedings Series
Northern European Association for Language Technology
11
In this paper, we explore different linguistic structures encoded as convolution kernels for the detection of subjective expressions. The advantage of convolution kernels is that complex structures can be directly provided to a classifier without deriving explicit features. The feature design for the detection of subjective expressions is fairly difficult and there currently exists no commonly accepted feature set. We consider various structures, such as constituency parse structures, dependency parse structures, and predicate-argument structures. In order to generalize from lexical information, we additionally augment these structures with clustering information and the task-specific knowledge of subjective words. The convolution kernels will be compared with a standard vector kernel.
36
We present a language learning application that relies on grammars to model the learning outcome. Based on this concept, we can provide a powerful framework for language learning exercises with an intuitive user interface and high reliability. Currently, the application aims to augment existing language classes and support students by improving the learner attitude and the general learning outcome. Extensions beyond that scope are promising and likely to be added in the future.
28
To optimize the sharing and reuse of existing data, many funding organizations now require researchers to specify a management plan for research data. In such a plan, researchers are supposed to describe the entire life cycle of the research data they are going to produce, from data creation to formatting, interpretation, documentation, short-term storage, long-term archiving and data re-use. To support researchers with this task, we built DMPTY, a wizard that guides researchers through the essential aspects of managing data, elicits information from them, and finally, generates a document that can be further edited and linked to the original research proposal.
47
Metadata provides important information relevant both to finding and understanding corpus data. Meaningful linguistic data requires both reasonable annotations and documentation of these annotations. This documentation is part of the metadata of a dataset. While corpus documentation has often been provided in the form of accompanying publications, machine-readable metadata, both containing the bibliographic information and documenting the corpus data, has many advantages. Metadata standards allow for the development of common tools and interfaces. In this paper I want to add a new perspective from an archive’s point of view and look at the metadata provided for four learner corpora and discuss the suitability of established standards for machine-readable metadata. I am aware that there is ongoing work towards metadata standards for learner corpora. However, I would like to keep the discussion going and add another point of view: increasing findability and reusability of learner corpora in an archiving context.
4
In opinion mining, there has been very little work investigating semi-supervised machine learning on document-level polarity classification. We show that semi-supervised learning performs significantly better than supervised learning when only few labelled data are available. Semi-supervised polarity classifiers rely on a predictive feature set. (Semi-)Manually built polarity lexicons are one option but they are expensive to obtain and do not necessarily work in an unknown domain. We show that extracting frequently occurring adjectives and adverbs of an unlabeled set of in-domain documents is an inexpensive alternative which works equally well throughout different domains.
39
Content
1 Predicting learner knowledge of individual words using machine learning
Drilon Avdiu, Vanessa Bui, Klára Ptacinová Klimčíková
2 Automatic Generation and Semantic Grading of Esperanto Sentences in a Teaching Context
Eckhard Bick
3 Toward automatic improvement of language produced by non-native language learners
Mathias Creutz, Eetu Sjöblom
4 Linguistic features and proficiency classification in L2 Spanish and L2 Portuguese
Iria del Río
5 Integrating large-scale web data and curated corpus data in a search engine supporting German literacy education
Sabrina Dittrich, Zarah Weiss, Hannes Schröter, Detmar Meurers
6 Formalism for a language agnostic language learning game and productive grid generation
Sylvain Hatier, Arnaud Bey, Mathieu Loiseau
7 Understanding Vocabulary Growth Through An Adaptive Language Learning System
Elma Kerz, Andreas Burgdorf, Daniel Wiechmann, Stefan Meeger, Yu Qiao, Christian Kohlschein, Tobias Meisen
8 Summarization Evaluation meets Short-Answer Grading
Margot Mieskes, Ulrike Padó
9 Experiments on Non-native Speech Assessment and its Consistency
Ziwei Zhou, Sowmya Vajjala, Seyed Vahid Mirnezami
10 The Impact of Spelling Correction and Task Context on Short Answer Assessment for Intelligent Tutoring Systems
Ramon Ziai, Florian Nuxoll, Kordula De Kuthy, Björn Rudzewitz, Detmar Meurers
1
This paper is a contribution to the ongoing discussion on treebank annotation schemes and their impact on PCFG parsing results. We provide a thorough comparison of two German treebanks: the TIGER treebank and the TüBa-D/Z. We use simple statistics on sentence length and vocabulary size, and more refined methods such as perplexity and its correlation with PCFG parsing results, as well as a Principal Components Analysis. Finally we present a qualitative evaluation of a set of 100 sentences from the TüBa-D/Z, manually annotated in the TIGER as well as in the TüBa-D/Z annotation scheme, and show that even the existence of a parallel subcorpus does not support a straightforward and easy comparison of both annotation schemes.