Quantifying the efficiency of written language
- Information theory can be used to assess how efficiently a message is transmitted on the basis of different symbolic systems. In this paper, I estimate the information-theoretic efficiency of written language for parallel text data in more than 1000 different languages, using both characters and words as information-encoding units. The main results show that (i) the median efficiency is ∼29% on the character level and ∼45% on the word level, (ii) efficiency on the two levels is strongly correlated and (iii) efficiency tends to be higher for languages with more speakers.
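To illustrate the general idea of efficiency as a ratio, the sketch below divides the Shannon entropy of a sequence of units (characters or words) by the maximum possible entropy, log2 of the number of distinct units. This is an assumption for illustration only: it is not the paper's estimator. It uses plug-in unigram frequencies and ignores dependencies between neighbouring units, so it will not reproduce the figures reported above. The function name `unigram_efficiency` is hypothetical.

```python
import math
from collections import Counter


def unigram_efficiency(units):
    """Hypothetical helper: plug-in unigram entropy divided by log2(#distinct units).

    Simplifying assumption: each unit is treated as independent of its context,
    so this is not an entropy-rate estimate.
    """
    counts = Counter(units)
    n = sum(counts.values())
    k = len(counts)
    if k < 2:
        # With a single distinct unit there is nothing to choose between;
        # returning 0.0 here is an arbitrary convention of this sketch.
        return 0.0
    # Shannon entropy in bits per unit, estimated from relative frequencies.
    h = -sum((c / n) * math.log2(c / n) for c in counts.values())
    # Normalise by the maximum entropy, reached when all k units are equiprobable.
    return h / math.log2(k)


text = "the cat sat on the mat"
print(f"character level: {unigram_efficiency(text):.3f}")
print(f"word level: {unigram_efficiency(text.split()):.3f}")
```

Passing a string treats each character as a unit, and passing `text.split()` treats whitespace-separated words as units. A value of 1.0 means that all units are used equally often.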
Author: | Alexander Koplenig |
---|---|
URN: | urn:nbn:de:bsz:mh39-104401 |
DOI: | https://doi.org/10.1515/lingvan-2019-0057 |
ISSN: | 2199-174X |
Parent Title (English): | Linguistics Vanguard |
Publisher: | De Gruyter |
Place of publication: | Berlin, Boston |
Editor: | Alexander Bergs, Abigail C. Cohn, Jeff Good |
Document Type: | Article |
Language: | English |
Year of first Publication: | 2021 |
Date of Publication (online): | 2021/05/25 |
Publication state: | Secondary publication |
Review state: | Peer-Review |
Tag: | community size; efficiency; information theory; permutation testing; typology |
GND Keyword: | Efficiency; Information theory; Written language; Language statistics; Linguistic sign; Word |
Volume: | 7 |
Issue: | s3 |
Page Number: | 11 |
Note: | With the permission of the rights holder, this publication is freely accessible under a (DFG-funded) Alliance licence or national licence (DFG: German Research Foundation). |
DDC classes: | 400 Language / 400 Language, Linguistics |
Open Access?: | yes |
Leibniz-Classification: | Language, Linguistics |
Linguistics-Classification: | Quantitative Linguistics |
Program areas: | L3: Lexis, empirical and digital |