Leakage explains the apparent superiority of Bayesian random effect models – a preregistered comment on Claessens, Kyritsis and Atkinson (2023)
- In a previous study, Claessens, Kyritsis, and Atkinson (CKA) demonstrated the importance of controlling for geographic proximity and cultural similarity in cross-national analyses. Based on a simulation study, CKA showed that methods commonly used to control for spatial and cultural non-independence do not sufficiently reduce false positives while maintaining the ability to detect true effects. CKA strongly advocate the use of Bayesian random effect models in such situations, arguing that, among the model types studied, they are the only ones that reduced false positives while maintaining high statistical power. In this comment, however, we argue that CKA overstate the apparent superiority of such models because of a form of methodological circularity known in statistics and machine learning as 'leakage': the same proximity matrix is used both to generate the simulated data and as an input to the Bayesian models, and only to those models. When this leakage is controlled for, we show that Bayesian models do not outperform most other methods.
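
The circularity described in the abstract can be illustrated with a small simulation. The following is a minimal sketch under stated assumptions, not the procedure used by CKA or in the comment: it replaces the Bayesian random effect models with a plain generalized least squares (GLS) fit, uses an exponential proximity kernel on random coordinates, and approximates the 5% two-sided critical value of the t distribution by 2.01. All function names and parameter values (`proximity_cov`, `rho`, `n`, `reps`) are hypothetical. Two unrelated, spatially autocorrelated variables are generated from one proximity matrix; the slope is then tested with ordinary least squares, with GLS given the generating matrix (the leaky case), and with GLS given an unrelated proximity matrix.

```python
import numpy as np

def proximity_cov(coords, rho=0.5):
    """Exponential proximity kernel; the tiny jitter keeps Cholesky stable."""
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    return np.exp(-d / rho) + 1e-8 * np.eye(len(coords))

def slope_is_significant(y, X, crit=2.01):
    """Least-squares fit; flag the slope (column 1) if |t| exceeds crit.

    crit=2.01 approximates the two-sided 5% t critical value for 48
    degrees of freedom (assumption tied to n=50 and two regressors).
    """
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = len(y) - X.shape[1]
    sigma2 = resid @ resid / dof
    cov_beta = sigma2 * np.linalg.inv(X.T @ X)
    return abs(beta[1] / np.sqrt(cov_beta[1, 1])) > crit

def false_positive_rates(n=50, reps=500, seed=0):
    rng = np.random.default_rng(seed)
    # Matrix used to generate the data.
    L_true = np.linalg.cholesky(proximity_cov(rng.uniform(size=(n, 2))))
    # Proximity matrix built from unrelated coordinates (no leakage).
    L_other = np.linalg.cholesky(proximity_cov(rng.uniform(size=(n, 2))))
    whiteners = {"gls_generating_matrix": L_true,
                 "gls_unrelated_matrix": L_other}
    hits = {"ols": 0, **{name: 0 for name in whiteners}}
    for _ in range(reps):
        # x and y are autocorrelated but independent of each other,
        # so every significant slope is a false positive.
        x = L_true @ rng.standard_normal(n)
        y = L_true @ rng.standard_normal(n)
        X = np.column_stack([np.ones(n), x])
        hits["ols"] += int(slope_is_significant(y, X))
        for name, L in whiteners.items():
            # GLS via whitening: premultiply both sides by L^{-1}.
            Xw, yw = np.linalg.solve(L, X), np.linalg.solve(L, y)
            hits[name] += int(slope_is_significant(yw, Xw))
    return {name: count / reps for name, count in hits.items()}

rates = false_positive_rates()
print(rates)
```

In this toy setting, the model that receives the generating matrix should hold its false positive rate near the nominal level by construction, which is the advantage the comment attributes to leakage. A fairer comparison would give every method the same information or withhold the generating matrix from all of them.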
Author: | Sascha Wolfer, Alexander Koplenig |
---|---|
URN: | urn:nbn:de:bsz:mh39-127850 |
DOI: | https://doi.org/10.31219/osf.io/ex267 |
Publisher: | OSF Preprints, Center for Open Science |
Place of publication: | Charlottesville, VA |
Document Type: | Preprint |
Language: | English |
Year of first Publication: | 2024 |
Date of Publication (online): | 2024/08/22 |
Publishing Institution: | Leibniz-Institut für Deutsche Sprache (IDS) |
Publication state: | Published version |
GND Keyword: | Bayesian model; data analysis; commentary; method; statistics |
Page Number: | 9 |
DDC classes: | 400 Language / 400 Language, Linguistics |
Open Access?: | yes |
Linguistics-Classification: | Computational linguistics |
Program areas: | Lexik |
Licence (English): | Creative Commons - Attribution 4.0 International |