OPUS 4 | Search

Proceedings of the 8th Workshop on Natural Language Processing for Computer Assisted Language Learning (NLP4CALL 2019), September 30, Turku, Finland (2019)

Content
1 Predicting learner knowledge of individual words using machine learning (Drilon Avdiu, Vanessa Bui, Klára Ptacinová Klimčíková)
2 Automatic Generation and Semantic Grading of Esperanto Sentences in a Teaching Context (Eckhard Bick)
3 Toward automatic improvement of language produced by non-native language learners (Mathias Creutz, Eetu Sjöblom)
4 Linguistic features and proficiency classification in L2 Spanish and L2 Portuguese (Iria del Río)
5 Integrating large-scale web data and curated corpus data in a search engine supporting German literacy education (Sabrina Dittrich, Zarah Weiss, Hannes Schröter, Detmar Meurers)
6 Formalism for a language agnostic language learning game and productive grid generation (Sylvain Hatier, Arnaud Bey, Mathieu Loiseau)
7 Understanding Vocabulary Growth Through An Adaptive Language Learning System (Elma Kerz, Andreas Burgdorf, Daniel Wiechmann, Stefan Meeger, Yu Qiao, Christian Kohlschein, Tobias Meisen)
8 Summarization Evaluation meets Short-Answer Grading (Margot Mieskes, Ulrike Padó)
9 Experiments on Non-native Speech Assessment and its Consistency (Ziwei Zhou, Sowmya Vajjala, Seyed Vahid Mirnezami)
10 The Impact of Spelling Correction and Task Context on Short Answer Assessment for Intelligent Tutoring Systems (Ramon Ziai, Florian Nuxoll, Kordula De Kuthy, Björn Rudzewitz, Detmar Meurers)

Proceedings of the 9th Workshop on Natural Language Processing for Computer Assisted Language Learning (NLP4CALL 2020) (2020)

Content
1 Substituto - A Synchronous Educational Language Game for Simultaneous Teaching and Crowdsourcing (Marianne Grace Araneta, Gülsen Eryigit, Alexander König, Ji-Ung Lee, Ana Luís, Verena Lyding, Lionel Nicolas, Christos Rodosthenous and Federico Sangati)
2 The Teacher-Student Chatroom Corpus (Andrew Caines, Helen Yannakoudakis, Helena Edmondson, Helen Allen, Pascual Pérez-Paredes, Bill Byrne and Paula Buttery)
3 Polygloss - A conversational agent for language practice (Etiene da Cruz Dalcol and Massimo Poesio)
4 Show, Don’t Tell: Visualising Finnish Word Formation in a Browser-Based Reading Assistant (Frankie Robertson)

MULLE: A grammar-based Latin language learning tool to supplement the classroom setting (2018)

Lange, Herbert ; Ljunglöf, Peter

MULLE is a tool for language learning that focuses on teaching Latin as a foreign language. It is designed for easy integration into the traditional classroom setting and syllabus, which distinguishes it from other language learning tools that provide a standalone learning experience. It uses grammar-based lessons and embraces methods of gamification to improve learner motivation. The main type of exercise provided by our application is translation practice, but it is also possible to shift the focus to vocabulary or morphology training.

Applying co-training to reference resolution (2002)

Müller, Mark-Christoph ; Rapp, Stefan ; Strube, Michael

In this paper, we investigate the practical applicability of Co-Training for the task of building a classifier for reference resolution. We are concerned with the question of whether Co-Training can significantly reduce the amount of manual labeling work and still produce a classifier with acceptable performance.

An API for discourse-level access to XML-encoded corpora (2002)

Müller, Mark-Christoph ; Strube, Michael

We describe a simple and efficient Java object model and application programming interface (API) for (possibly multi-modal) annotated natural language corpora. Corpora are represented as elements like Sentences, Turns, Utterances, Words, Gestures and Markables. The API allows linguists to access corpora in terms of these discourse-level elements, i.e. at a conceptual level they are familiar with, with the flexibility offered by a general purpose programming language. It is also a contribution to corpus standardization efforts because it is based on a straightforward and easily extensible data model which can serve as a target for conversion of different corpus formats.

Demonstrating the MUSTE language learning environment (2018)

Lange, Herbert ; Ljunglöf, Peter

We present a language learning application that relies on grammars to model the learning outcome. Based on this concept we can provide a powerful framework for language learning exercises with an intuitive user interface and high reliability. Currently the application aims to augment existing language classes and support students by improving learner attitude and the general learning outcome. Extensions beyond that scope are promising and likely to be added in the future.

Information extraction with the Darmstadt Knowledge Processing Software Repository (Extended Abstract) (2008)

Gurevych, Iryna ; Müller, Mark-Christoph

Current Natural Language Processing (NLP) systems feature high-complexity processing pipelines that require the use of components at different levels of linguistic and application-specific processing. These components often have to interface with external libraries, e.g. for machine learning and information retrieval, as well as with tools for human annotation and visualization. At the UKP Lab, we are working on the Darmstadt Knowledge Processing Software Repository (DKPro) (Gurevych et al., 2007a; Müller et al., 2008) to create a highly flexible, scalable and easy-to-use toolkit that allows rapid creation of complex NLP pipelines for semantic information processing on demand. The DKPro repository consists of several main parts created to serve the purposes of different NLP application areas.

Annotating anaphoric and bridging relations with MMAX (2001)

Müller, Mark-Christoph ; Strube, Michael

We present a tool for the annotation of anaphoric and bridging relations in a corpus of written texts. Based on differences as well as similarities between these phenomena, we define an annotation scheme. We then implement the scheme within an annotation tool and demonstrate its use.

Learning domain-specific grammars from a small number of examples (2020)

Lange, Herbert ; Ljunglöf, Peter

In this paper we investigate the problem of grammar inference from a different perspective. The common approach is to try to infer a grammar directly from example sentences, which either requires a large training set or suffers from poor accuracy. We instead view it as a problem of grammar restriction or sub-grammar extraction. We start from a large-scale resource grammar and a small number of examples, and find a sub-grammar that still covers all the examples. To do this we formulate the problem as a constraint satisfaction problem, and use an existing constraint solver to find the optimal grammar. We have conducted experiments with English, Finnish, German, Swedish and Spanish, which show that 10–20 examples are often sufficient to learn an interesting domain grammar. Possible applications include computer-assisted language learning, domain-specific dialogue systems, computer games, Q/A-systems, and others.
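
The sub-grammar extraction idea described in this abstract can be illustrated with a small sketch. It assumes that every example sentence comes with one or more alternative parses and that each parse is reduced to the set of grammar rules it uses; a sub-grammar covers an example if it contains all rules of at least one of its parses. The function name, the rule names and the exhaustive search are illustrative assumptions: the abstract states that an existing constraint solver is used, which this brute-force search merely stands in for.

```python
from itertools import combinations


def smallest_covering_subgrammar(examples):
    """Return a smallest rule set that covers every example.

    `examples` is a list with one entry per example sentence. Each entry is
    a list of alternative parses, and each parse is a frozenset of the rule
    names it uses (hypothetical names, not taken from any real grammar).
    """
    all_rules = sorted(
        set().union(*(parse for parses in examples for parse in parses))
    )
    # Try candidate rule sets in order of increasing size, so the first
    # candidate that covers all examples is one of minimal size.
    for size in range(len(all_rules) + 1):
        for candidate in combinations(all_rules, size):
            chosen = set(candidate)
            if all(any(parse <= chosen for parse in parses) for parses in examples):
                return chosen
    return None  # only reached if some example has no parse at all


# Two example sentences, each with two alternative parses.
examples = [
    [frozenset({"Pred", "Det", "Noun"}),
     frozenset({"Pred", "Det", "Noun", "Adv"})],
    [frozenset({"Pred", "Det", "Noun", "Adj"}),
     frozenset({"Pred", "Pron", "Verb", "Adv", "Conj"})],
]
result = smallest_covering_subgrammar(examples)
```

The exhaustive search is exponential in the number of rules and only serves to make the covering condition concrete; a realistic setup would hand the same condition to a constraint or integer programming solver.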

Putting control into language learning (2018)

Lange, Herbert ; Ljunglöf, Peter

Controlled Natural Languages (CNLs) have many applications including document authoring, automatic reasoning on texts and reliable machine translation, but their application is not limited to these areas. We explore a new application area of CNLs, the use of CNLs in computer-assisted language learning. In this paper we present a web application for language learning using CNLs as well as a detailed description of the properties of the family of CNLs it uses.

Multi-level annotation in MMAX (2003)

Müller, Mark-Christoph ; Strube, Michael

We present a light-weight tool for the annotation of linguistic data on multiple levels. It is based on the simplification of annotations to sets of markables having attributes and standing in certain relations to each other. We describe the main features of the tool, emphasizing its simplicity, customizability and versatility.
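
The notion of markables with attributes and relations can be made concrete with a minimal sketch. It is not the file format or API of MMAX; the class, field and helper names below are hypothetical and only illustrate how an annotation level might be reduced to spans over a token sequence that carry attributes and point to each other.

```python
from dataclasses import dataclass, field


@dataclass
class Markable:
    """A span of tokens on one annotation level (hypothetical model)."""
    ident: str
    level: str
    start: int  # index of the first token
    end: int    # index after the last token
    attributes: dict = field(default_factory=dict)
    relations: dict = field(default_factory=dict)  # relation name -> target ids


def text_of(markable, tokens):
    """Return the surface string covered by a markable."""
    return " ".join(tokens[markable.start:markable.end])


def related(markable, relation, index):
    """Resolve the targets of a named relation to Markable objects."""
    return [index[target] for target in markable.relations.get(relation, [])]


tokens = "Anna bought a car . She likes it .".split()
markables = [
    Markable("m1", "coref", 0, 1, {"type": "name"}),
    Markable("m2", "coref", 2, 4, {"type": "np"}),
    Markable("m3", "coref", 5, 6, {"type": "pronoun"}, {"antecedent": ["m1"]}),
    Markable("m4", "coref", 7, 8, {"type": "pronoun"}, {"antecedent": ["m2"]}),
]
index = {m.ident: m for m in markables}

pronoun = index["m4"]
antecedent = related(pronoun, "antecedent", index)[0]
```

Because the markables only reference token positions, several annotation levels could share the same token sequence without interfering with one another, which is the general idea behind stand-off annotation.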

A machine learning approach to pronoun resolution in spoken dialogue (2003)

Strube, Michael ; Müller, Mark-Christoph

We apply a decision tree based approach to pronoun resolution in spoken dialogue. Our system deals with pronouns with NP- and non-NP-antecedents. We present a set of features designed for pronoun resolution in spoken dialogue and determine the most promising features. We evaluate the system on twenty Switchboard dialogues and show that it compares well to Byron’s (2002) manually tuned system.

A flexible stand-off data model with query language for multi-level annotation (2005)

Müller, Mark-Christoph

We present an implemented XML data model and a new, simplified query language for multi-level annotated corpora. The new query language involves automatic conversion of queries into the underlying, more complicated MMAXQL query language. It supports queries for sequential and hierarchical, but also associative (e.g. coreferential) relations. The simplified query language has been designed with non-expert users in mind.

Beyond the stars: exploiting free-text user reviews to improve the accuracy of movie recommendations (2009)

Jakob, Niklas ; Weber, Stefan Hagen ; Müller, Mark-Christoph ; Gurevych, Iryna

In this paper we show that the extraction of opinions from free-text reviews can improve the accuracy of movie recommendations. We present three approaches to extract movie aspects as opinion targets and use them as features for collaborative filtering. Each of these approaches requires a different amount of manual interaction. We collected a data set of reviews with corresponding ordinal (star) ratings of several thousand movies to evaluate the different features. We employ a state-of-the-art collaborative filtering engine for the recommendations during our evaluation and compare the performance with and without the features representing user preferences mined from the free-text reviews provided by the users. The opinion mining based features perform significantly better than the baseline, which is based on star ratings and genre information only.

Semantic author name disambiguation with word embeddings (2017)

Müller, Mark-Christoph

We present a supervised machine learning system for author name disambiguation (AND) which tackles semantic similarity between publication titles by means of word embeddings. Word embeddings are integrated as external components, which keeps the model small and efficient, while allowing for easy extensibility and domain adaptation. Initial experiments show that word embeddings can improve the Recall and F score of the binary classification sub-task of AND. Results for the clustering sub-task are less clear, but also promising and overall show the feasibility of the approach.

Off-the-shelf semantic author name disambiguation for bibliographic data bases (2019)

Müller, Mark-Christoph ; Bannister, Adam ; Reitz, Florian

The demo presents a minimalist, off-the-shelf AND tool which provides a fundamental AND operation, the comparison of two publications with ambiguous authors, as an easily accessible HTTP interface. The tool implements this operation using standard AND functionality, but puts particular emphasis on advanced methods from natural language processing (NLP) for comparing publication title semantics.

On the contribution of word-level semantics to practical author name disambiguation (2018)

Müller, Mark-Christoph

We demonstrate the utility of word embedding-based semantic similarity methods for Author Name Disambiguation.

Automatic detection of nonreferential it in spoken multi-party dialog (2006)

Müller, Mark-Christoph

We present an implemented machine learning system for the automatic detection of nonreferential it in spoken dialog. The system builds on shallow features extracted from dialog transcripts. Our experiments indicate a level of performance that makes the system usable as a preprocessing filter for a coreference resolution system. We also report results of an annotation study dealing with the classification of it by naive subjects.

Knowledge sources for bridging resolution in multi-party dialog (2008)

Müller, Mark-Christoph ; Mieskes, Margot ; Strube, Michael

In this paper we investigate the coverage of the two knowledge sources WordNet and Wikipedia for the task of bridging resolution. We report on an annotation experiment which yielded pairs of bridging anaphors and their antecedents in spoken multi-party dialog. Manual inspection of the two knowledge sources showed that, with some interesting exceptions, Wikipedia is superior to WordNet when it comes to the coverage of information necessary to resolve the bridging anaphors in our data set. We further describe a simple procedure for the automatic extraction of the required knowledge from Wikipedia by means of an API, and discuss some of the implications of the procedure’s performance.

Resolving it, this, and that in unrestricted multi-party dialog (2007)

Müller, Mark-Christoph

We present an implemented system for the resolution of it, this, and that in transcribed multi-party dialog. The system handles NP-anaphoric as well as discourse-deictic anaphors, i.e. pronouns with VP antecedents. Selectional preferences for NP or VP antecedents are determined on the basis of corpus counts. Our results show that the system performs significantly better than a recency-based baseline.


512 search hits