Creating the lexicon of multi-word expressions for Slovene. Methodology and structure

  • This paper describes a method for automatically identifying sentences in the Gigafida corpus that contain multi-word expressions (MWEs) from a list of 5,242 phraseological units, which was compiled from several existing open-access lexical resources for Slovene. The method is based on a definition of MWEs that includes information on two levels of corpus annotation, syntax (dependency parsing) and morphology (POS tagging), together with some additional statistical parameters. The resulting lexicon contains 12,358 sentences with MWEs extracted from the corpus. The extracted sentences were analysed from a lexicographic point of view with the aim of establishing canonical forms of MWEs and the semantic relations between them in terms of variation, synonymy, and antonymy.

Metadata
Author:Polona Gantar, Simon Krek
URN:urn:nbn:de:bsz:mh39-112270
URL:https://euralex2022.ids-mannheim.de/wp-content/uploads/2022/07/Proceedings_11.07.2022.pdf
DOI:https://doi.org/10.14618/ids-pub-11227
ISBN:978-3-937241-87-6
Parent Title (English):Dictionaries and Society. Proceedings of the XX EURALEX International Congress, 12-16 July 2022, Mannheim, Germany
Publisher:Ids-Verlag
Place of publication:Mannheim
Editor:Annette Klosa-Kückelhaus, Stefan Engelberg, Christine Möhrs, Petra Storjohann
Document Type:Part of a Book
Language:English
Year of first Publication:2022
Date of Publication (online):2022/09/08
Publishing Institution:Leibniz-Institut für Deutsche Sprache (IDS)
Publication state:Published version
Review state:Peer-reviewed
Tag:Lower Sorbian; Sorbian institute; e-lexicography; historical lexicography; language portal; lexical information system; minority language; text corpus
GND Keyword:Multi-word unit; minority language; Sorbian; historical lexicography
First Page:549
Last Page:562
DDC classes:400 Language / 420 English
Open Access?:yes
Linguistics-Classification:Corpus linguistics
Linguistics-Classification:Lexicography
Conferences, Workshops:Dictionaries and Society. Proceedings of the XX EURALEX International Congress, 12-16 July 2022, Mannheim, Germany
Licence (German):Creative Commons - CC BY-SA - Attribution - ShareAlike 4.0 International