Extraction of collocations from the Gigafida 2.1 corpus of Slovene

This paper describes a method for extracting collocation data from text corpora that is based on a formal definition of syntactic structures. The method takes into account not only the POS-tagging level of annotation but also syntactic parsing (a syntactic treebank model), and it makes it possible to control the canonical form of extracted collocations using corpus statistics on forms with different properties. Specifically, we describe the results of extraction from the syntactically tagged Gigafida 2.1 corpus. Using the new method, 4,002,918 collocation candidates in 81 syntactic structures were extracted. We evaluate a sample of the extracted data in more detail, focusing on properties that affect the extraction of canonical forms: definiteness in adjectival collocations, grammatical number in noun collocations, comparison in adjectival and adverbial collocations, and letter case (uppercase and lowercase) in canonical forms. The conclusion highlights the potential of the methodology for the grammatical description of collocations and phrasal syntax, and the possibilities for improving the model during the compilation of a digital dictionary database for Slovene.

Metadata
Author:Simon Krek, Polona Gantar, Iztok Kosem
URN:urn:nbn:de:bsz:mh39-111828
URL:https://euralex2022.ids-mannheim.de/wp-content/uploads/2022/07/Proceedings_11.07.2022.pdf
DOI:https://doi.org/10.14618/ids-pub-11182
ISBN:978-3-937241-87-6
Parent Title (English):Dictionaries and Society. Proceedings of the XX EURALEX International Congress, 12-16 July 2022, Mannheim, Germany
Publisher:IDS-Verlag
Place of publication:Mannheim
Editor:Annette Klosa-Kückelhaus, Stefan Engelberg, Christine Möhrs, Petra Storjohann
Document Type:Part of a Book
Language:English
Year of first Publication:2022
Date of Publication (online):2022/08/18
Publishing Institution:Leibniz-Institut für Deutsche Sprache (IDS)
Publicationstate:Published version
Reviewstate:Peer-Review
Tag:Collocations; Gigafida 2.1 corpus; digital collocation database; discovering collocations in corpora
GND Keyword:Computational linguistics; Collocation; Corpus <Linguistics>; Slovene; Syntax
First Page:240
Last Page:252
DDC classes:400 Language / 420 English
Open Access?:yes
Linguistics-Classification:Computational linguistics
Linguistics-Classification:Corpus linguistics
Conferences, Workshops:Dictionaries and Society. Proceedings of the XX EURALEX International Congress, 12-16 July 2022, Mannheim, Germany
Licence (German):Creative Commons - CC BY-SA - Attribution - ShareAlike 4.0 International