Shallow context analysis for German idiom detection
- To differentiate between figurative and literal uses of verb-noun combinations in the shared task on the disambiguation of German verbal idioms at KONVENS 2021, we apply and extend an approach originally developed for detecting idioms in a dataset of random n-gram samples. Classification relies on a rather shallow, statistics-based pipeline without intensive preprocessing or analysis at the morphosyntactic and semantic levels. We describe the overall approach and the differences between the original dataset and the KONVENS task dataset, report experimental classification results, and analyse the individual contributions of our feature sets.
Author: | Miriam Amin, Peter Fankhauser, Marc Kupietz, Roman Schneider |
---|---|
URN: | urn:nbn:de:bsz:mh39-119560 |
DOI: | https://doi.org/10.5281/zenodo.5769519 |
Parent Title (English): | Proceedings of the shared task on the disambiguation of German verbal idioms at KONVENS 2021, Düsseldorf, Germany |
Publisher: | Zenodo |
Place of publication: | Geneva |
Document Type: | Conference Proceeding |
Language: | English |
Year of first Publication: | 2021 |
Date of Publication (online): | 2023/06/16 |
Publishing Institution: | Leibniz-Institut für Deutsche Sprache (IDS) |
Publication state: | Published version |
Review state: | Peer review |
Tag: | Natural Language Processing; idiom detection; multiword expressions; shared task |
GND Keyword: | Automatische Sprachanalyse; Computerlinguistik; Datensatz; Deutsch; Kontextanalyse; Phraseologie |
Page Number: | 9 |
DDC classes: | 400 Language / 400 Language, Linguistics |
Open Access?: | yes |
Leibniz-Classification: | Language, Linguistics |
Linguistics-Classification: | Computational linguistics |
Program areas: | G2: Sprachinformationssysteme |
Program areas: | S1: Korpuslinguistik |
Program areas: | S2: Forschungskoordination und -infrastrukturen |
Licence (English): | Creative Commons - Attribution 4.0 International |