OPUS 4 | Search


6 search hits


Learning domain-specific grammars from a small number of examples (2020)

In this paper, we investigate the problem of grammar inference from a different perspective. The common approach is to infer a grammar directly from example sentences, which either requires a large training set or yields poor accuracy. We instead view it as a problem of grammar restriction, or sub-grammar extraction. We start from a large-scale resource grammar and a small number of examples, and find a sub-grammar that still covers all the examples. To do this, we formulate the problem as a constraint satisfaction problem and use an existing constraint solver to find the optimal grammar. We have conducted experiments with English, Finnish, German, Swedish and Spanish, which show that 10–20 examples are often sufficient to learn an interesting domain grammar. Possible applications include computer-assisted language learning, domain-specific dialogue systems, computer games, Q/A systems, and others.

Preface (2020)

Alfter, David ; Volodina, Elena ; Pilán, Ildikó ; Lange, Herbert ; Borin, Lars

Proceedings of the 9th Workshop on Natural Language Processing for Computer Assisted Language Learning (NLP4CALL 2020) (2020)

Contents:
1. Substituto - A Synchronous Educational Language Game for Simultaneous Teaching and Crowdsourcing. Marianne Grace Araneta, Gülsen Eryigit, Alexander König, Ji-Ung Lee, Ana Luís, Verena Lyding, Lionel Nicolas, Christos Rodosthenous and Federico Sangati
2. The Teacher-Student Chatroom Corpus. Andrew Caines, Helen Yannakoudakis, Helena Edmondson, Helen Allen, Pascual Pérez-Paredes, Bill Byrne and Paula Buttery
3. Polygloss - A conversational agent for language practice. Etiene da Cruz Dalcol and Massimo Poesio
4. Show, Don’t Tell: Visualising Finnish Word Formation in a Browser-Based Reading Assistant. Frankie Robertson

pyMMAX2: Deep access to MMAX2 projects from Python (2020)

Müller, Mark-Christoph

pyMMAX2 is an API for processing MMAX2 stand-off annotation data in Python. It provides a lightweight basis for developing code that opens up the Java- and XML-based ecosystem of MMAX2 to more recent, Python-based NLP and data science methods. Although pyMMAX2 is pure Python and most of its functionality is implemented from scratch, the API reuses the complex implementation of the essential business logic for MMAX2 annotation schemes by interfacing with the original MMAX2 Java libraries. pyMMAX2 is available for download at http://github.com/nlpAThits/pyMMAX2.

Reconstructing manual information extraction with DB-to-document backprojection: Experiments in the life science domain (2020)

Müller, Mark-Christoph ; Ghosh, Sucheta ; Rey, Maja ; Wittig, Ulrike ; Müller, Wolfgang ; Strube, Michael

We introduce a novel scientific document processing task for making previously inaccessible information in printed paper documents available to automatic processing. We describe our data set of scanned documents and data records from the biological database SABIO-RK, provide a definition of the task, and report findings from preliminary experiments. Rigorous evaluation proved challenging due to the lack of gold-standard data and the difficulty of defining correctness. Qualitative inspection of the results, however, showed that the task is both feasible and useful.

The FAIR Index of CMC Corpora (2020)

Frey, Jennifer-Carmen ; König, Alexander ; Stemle, Egon ; Falaise, Achille ; Fišer, Darja ; Lüngen, Harald

In this article, we examine the current situation of data dissemination and provision for CMC corpora. In doing so, we aim to provide a guiding framework for future projects that will improve the transparency and replicability of research results as well as the reusability of the created resources. Based on the FAIR guiding principles for research data management, we evaluate the 20 European CMC corpora listed in the CLARIN CMC Resource family, identify successful strategies among the existing corpora, and establish best practices for future projects. We give an overview of existing approaches to data referencing, dissemination and provision in European CMC corpora, and discuss the methods, formats and strategies used. Furthermore, we discuss the need for community standards and offer recommendations for best practices when creating a new CMC corpus.
