
Data-driven identification of idioms in song lyrics

The automatic recognition of idioms poses a challenging problem for NLP applications. Whereas native speakers can intuitively handle multiword expressions whose compositional meanings are hard to trace back to individual word semantics, there is still ample scope for improvement regarding computational approaches. We assume that idiomatic constructions can be characterized by gradual intensities of semantic non-compositionality, formal fixedness, and unusual usage context, and introduce a number of measures for these characteristics, comprising count-based and predictive collocation measures together with measures of context (un)similarity. We evaluate our approach on a manually labelled gold standard, derived from a corpus of German pop lyrics. To this end, we apply a Random Forest classifier to analyze the individual contribution of features for automatically detecting idioms, and study the trade-off between recall and precision. Finally, we evaluate the classifier on an independent dataset of idioms extracted from a list of Wikipedia idioms, achieving state-of-the-art accuracy.
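As a rough illustration of what a count-based collocation measure can look like, the sketch below computes pointwise mutual information (PMI) for adjacent word pairs. PMI is an assumption chosen for illustration; the abstract does not name the specific measures used in the paper. The function name `pmi_scores` and the toy token sequence are likewise hypothetical.

```python
import math
from collections import Counter


def pmi_scores(tokens):
    """Return PMI (log base 2) for every adjacent word pair in `tokens`."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_bi = max(len(tokens) - 1, 1)  # number of adjacent pairs
    scores = {}
    for (w1, w2), count in bigrams.items():
        p_pair = count / n_bi
        p_w1 = unigrams[w1] / n_uni
        p_w2 = unigrams[w2] / n_uni
        # PMI compares the observed pair probability with the probability
        # expected if the two words occurred independently of each other.
        scores[(w1, w2)] = math.log2(p_pair / (p_w1 * p_w2))
    return scores


# Toy token sequence, invented for illustration only (not from the corpus).
tokens = "ich bin auf wolke sieben und du bist auf wolke sieben und ich bin hier".split()
scores = pmi_scores(tokens)
```

In a setup like the one the abstract describes, scores of this kind would presumably serve as one feature among several for the classifier, rather than as a detector on their own.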

Metadata
Author: Miriam Amin, Peter Fankhauser, Marc Kupietz, Roman Schneider
URN: urn:nbn:de:bsz:mh39-106825
DOI: https://doi.org/10.18653/v1/2021.mwe-1.3
ISBN: 978-1-954085-71-8
Parent Title (English): Proceedings of the 17th Workshop on Multiword Expressions (MWE 2021)
Publisher: Association for Computational Linguistics
Place of publication: Stroudsburg
Editor: Paul Cook, Jelena Mitrović, Carla Parra Escartín, Ashwini Vaidya, Petya Osenova, Shiva Taslimipoor, Carlos Ramisch
Document Type: Conference Proceeding
Language: English
Year of first Publication: 2021
Date of Publication (online): 2021/09/22
Publishing Institution: Leibniz-Institut für Deutsche Sprache (IDS)
Publication state: Published version
Review state: Peer review
Tag: multiword expressions; natural language processing
GND Keyword: Automatische Sprachanalyse; Automatische Spracherkennung; Deutsch; Komposition <Wortbildung>; Lyrics <Lyrik>; Phraseologie; Semantik
First Page: 13
Last Page: 22
DDC classes: 400 Language / 400 Language, Linguistics
Open Access?: yes
Leibniz-Classification: Language, Linguistics
Linguistics-Classification: Computational linguistics
Program areas: G2: Language information systems
Program areas: S1: Corpus linguistics
Program areas: S2: Research coordination and research infrastructures
Licence (English): Creative Commons - Attribution 4.0 International