TY  - CHAP
U1  - Part of a book
A1  - Krek, Simon
A1  - Gantar, Polona
A1  - Kosem, Iztok
ED  - Klosa-Kückelhaus, Annette
ED  - Engelberg, Stefan
ED  - Möhrs, Christine
ED  - Storjohann, Petra
T1  - Extraction of collocations from the Gigafida 2.1 corpus of Slovene
T2  - Dictionaries and Society. Proceedings of the XX EURALEX International Congress, 12-16 July 2022, Mannheim, Germany
N2  - This paper describes a method for extracting collocation data from text corpora based on a formal definition of syntactic structures, which takes into account not only the POS-tagging level of annotation but also syntactic parsing (syntactic treebank model) and introduces the possibility of controlling the canonical form of extracted collocations based on statistical data on forms with different properties in the corpus. Specifically, we describe the results of extraction from the syntactically tagged Gigafida 2.1 corpus. Using the new method, 4,002,918 collocation candidates in 81 syntactic structures were extracted. We evaluate the extracted data sample in more detail, mainly in relation to properties that affect the extraction of canonical forms: definiteness in adjectival collocations, grammatical number in noun collocations, comparison in adjectival and adverbial collocations, and letter case (uppercase and lowercase) in canonical forms. The conclusion highlights the potential of the methodology used for the grammatical description of collocation and phrasal syntax and the possibilities for improving the model in the process of compilation of a digital dictionary database for Slovene.
KW  - Corpus
KW  - Collocation
KW  - Computational linguistics
KW  - Syntax
KW  - Collocations
KW  - discovering collocations in corpora
KW  - digital collocation database
KW  - Gigafida 2.1 corpus
KW  - Slovene
Y1  - 2022
UN  - https://nbn-resolving.org/urn:nbn:de:bsz:mh39-111828
UR  - https://euralex2022.ids-mannheim.de/wp-content/uploads/2022/07/Proceedings_11.07.2022.pdf
SN  - 978-3-937241-87-6
SB  - 978-3-937241-87-6
U6  - https://doi.org/10.14618/ids-pub-11182
DO  - https://doi.org/10.14618/ids-pub-11182
SP  - 240
EP  - 252
PB  - IDS-Verlag
CY  - Mannheim
ER  -