OPUS 4 | Search

117 search hits

Projektvorstellung – Sprachanfragen. Empirisch gestützte Erforschung von Zweifelsfällen (2023)

Lang, Christian ; Tu, Ngoc Duyen Tanja ; Schneider, Roman ; Volodina, Anna

"The 'Sprachanfragen' (language inquiries) project (https://www.ids-mannheim.de/gra/projekte2/sprachanfragen/), launched in January 2022, is the first to pursue the goal of collecting and preparing language inquiry data and building a monitor corpus from them that is open to the research community. In addition, a search interface is being developed that makes the language inquiries available for systematic scholarly analysis. The poster gives an overview of the project, presents first results, and offers an outlook on considerations for designing a chatbot that answers language inquiries automatically." A contribution to the 9th conference of the association "Digital Humanities im deutschsprachigen Raum" - DHd 2023 Open Humanities Open Culture.

Shallow context analysis for German idiom detection (2021)

Amin, Miriam ; Fankhauser, Peter ; Kupietz, Marc ; Schneider, Roman

In order to differentiate between figurative and literal usage of verb-noun combinations for the shared task on the disambiguation of German Verbal Idioms issued for KONVENS 2021, we apply and extend an approach originally developed for detecting idioms in a dataset consisting of random ngram samples. The classification is done by implementing a rather shallow, statistics-based pipeline without intensive preprocessing and examinations on the morphosyntactic and semantic level. We describe the overall approach, the differences between the original dataset and the dataset of the KONVENS task, provide experimental classification results, and analyse the individual contributions of our feature sets.

RefCo and its checker: improving language documentation corpora’s reusability through a semi-automatic review process (2022)

Lange, Herbert ; Aznar, Jocelyn

The QUEST (QUality ESTablished) project aims at ensuring the reusability of audio-visual datasets (Wamprechtshammer et al., 2022) by devising quality criteria and curating processes. RefCo (Reference Corpora) is an initiative within QUEST in collaboration with DoReCo (Documentation Reference Corpus, Paschen et al. (2020)) focusing on language documentation projects. Previously, Aznar and Seifart (2020) introduced a set of quality criteria dedicated to documenting fieldwork corpora. Based on these criteria, we establish a semi-automatic review process for existing and work-in-progress corpora, in particular for language documentation. The goal is to improve the quality of a corpus by increasing its reusability. A central part of this process is a template for machine-readable corpus documentation and automatic data verification based on this documentation. In addition to the documentation and automatic verification, the process involves a human review and potentially results in a RefCo certification of the corpus. For each of these steps, we provide guidelines and manuals. We describe the evaluation process in detail, highlight the current limits for automatic evaluation and how the manual review is organized accordingly.

Metadata formats for learner corpora: case study and discussion (2022)

Lange, Herbert

Metadata provides important information relevant both to finding and understanding corpus data. Meaningful linguistic data requires both reasonable annotations and documentation of these annotations. This documentation is part of the metadata of a dataset. While corpus documentation has often been provided in the form of accompanying publications, machine-readable metadata, both containing the bibliographic information and documenting the corpus data, has many advantages. Metadata standards allow for the development of common tools and interfaces. In this paper I want to add a new perspective from an archive’s point of view and look at the metadata provided for four learner corpora and discuss the suitability of established standards for machine-readable metadata. I am aware that there is ongoing work towards metadata standards for learner corpora. However, I would like to keep the discussion going and add another point of view: increasing findability and reusability of learner corpora in an archiving context.

Improving extractive dialogue summarization by utilizing human feedback (2007)

Mieskes, Margot ; Müller, Christoph ; Strube, Michael

Automatic summarization systems usually are trained and evaluated in a particular domain with fixed data sets. When such a system is to be applied to slightly different input, labor- and cost-intensive annotations have to be created to retrain the system. We deal with this problem by providing users with a GUI which allows them to correct automatically produced imperfect summaries. The corrected summary in turn is added to the pool of training data. The performance of the system is expected to improve as it adapts to the new domain.

Preface (2020)

Alfter, David ; Volodina, Elena ; Pilán, Ildikó ; Lange, Herbert ; Borin, Lars

Preface (2019)

Alfter, David ; Volodina, Elena ; Borin, Lars ; Pilán, Ildikó ; Lange, Herbert

Identifying implicitly abusive remarks about identity groups using a linguistically informed approach (2022)

Wiegand, Michael ; Eder, Elisabeth ; Ruppenhofer, Josef

We address the task of distinguishing implicitly abusive sentences on identity groups (“Muslims contaminate our planet”) from other group-related negative polar sentences (“Muslims despise terrorism”). Implicitly abusive language consists of utterances whose abuse is not conveyed by abusive words (e.g. “bimbo” or “scum”). So far, the detection of such utterances could not be properly addressed since existing datasets displaying a high degree of implicit abuse are fairly biased. Following the recently proposed strategy to solve implicit abuse by separately addressing its different subtypes, we present a new focused and less biased dataset that consists of the subtype of atomic negative sentences about identity groups. For that task, we model components that each address one facet of such implicit abuse, i.e. depiction as perpetrators, aspectual classification and non-conformist views. The approach generalizes across different identity groups and languages.

Implementation of a Latin grammar in grammatical framework (2017)

Lange, Herbert

In this paper we present work in developing a computerized grammar for the Latin language. It demonstrates the principles and challenges in developing a grammar for a natural language in a modern grammar formalism. The grammar presented here provides a useful resource for natural language processing applications in different fields. It can be easily adopted for language learning and use in language technology for Cultural Heritage like translation applications or to support post-correction of document digitization.

Proceedings of the 8th Workshop on Natural Language Processing for Computer Assisted Language Learning (NLP4CALL 2019), September 30, Turku, Finland (2019)

Contents
1. Predicting learner knowledge of individual words using machine learning. Drilon Avdiu, Vanessa Bui, Klára Ptacinová Klimcíková
2. Automatic Generation and Semantic Grading of Esperanto Sentences in a Teaching Context. Eckhard Bick
3. Toward automatic improvement of language produced by non-native language learners. Mathias Creutz, Eetu Sjöblom
4. Linguistic features and proficiency classification in L2 Spanish and L2 Portuguese. Iria del Río
5. Integrating large-scale web data and curated corpus data in a search engine supporting German literacy education. Sabrina Dittrich, Zarah Weiss, Hannes Schröter, Detmar Meurers
6. Formalism for a language agnostic language learning game and productive grid generation. Sylvain Hatier, Arnaud Bey, Mathieu Loiseau
7. Understanding Vocabulary Growth Through An Adaptive Language Learning System. Elma Kerz, Andreas Burgdorf, Daniel Wiechmann, Stefan Meeger, Yu Qiao, Christian Kohlschein, Tobias Meisen
8. Summarization Evaluation meets Short-Answer Grading. Margot Mieskes, Ulrike Padó
9. Experiments on Non-native Speech Assessment and its Consistency. Ziwei Zhou, Sowmya Vajjala, Seyed Vahid Mirnezami
10. The Impact of Spelling Correction and Task Context on Short Answer Assessment for Intelligent Tutoring Systems. Ramon Ziai, Florian Nuxoll, Kordula De Kuthy, Björn Rudzewitz, Detmar Meurers
