
Languages with more speakers tend to be harder to (machine-)learn

Computational language models (LMs), most notably exemplified by the widespread success of OpenAI's ChatGPT chatbot, show impressive performance on a wide range of linguistic tasks, thus providing cognitive science and linguistics with a computational working model to empirically study different aspects of human language. Here, we use LMs to test the hypothesis that languages with more speakers tend to be easier to learn. In two experiments, we train several LMs, ranging from very simple n-gram models to state-of-the-art deep neural networks, on written cross-linguistic corpus data covering 1293 different languages and statistically estimate learning difficulty. Using a variety of quantitative methods and machine learning techniques to account for phylogenetic relatedness and geographical proximity of languages, we show that there is robust evidence for a relationship between learning difficulty and speaker population size. However, contrary to expectations derived from previous research, our results suggest that languages with more speakers tend to be harder to learn.

Metadata
Author: Alexander Koplenig, Sascha Wolfer
URN: urn:nbn:de:bsz:mh39-122100
DOI: https://doi.org/10.1038/s41598-023-45373-z
Parent Title (English): Scientific Reports
Publisher: Springer Nature
Place of publication: Berlin
Document Type: Article
Language: English
Year of first Publication: 2023
Date of Publication (online): 2023/10/30
Publishing Institution: Leibniz-Institut für Deutsche Sprache (IDS)
Publication state: Published version
Review state: Peer review
GND Keyword: Contrastive linguistics; Corpus (linguistics); Artificial intelligence; Machine learning; Quantitative method
Volume: 13
Page Number: 18
DDC classes: 400 Language / 400 Language, Linguistics
Open Access?: yes
Leibniz-Classification: Language, Linguistics
Linguistics-Classification: Quantitative linguistics
Program areas: L3: Lexik empirisch und digital
Licence (English): Creative Commons - Attribution 4.0 International