Quantifying the efficiency of written language

Information theory can be used to assess how efficiently a message is transmitted on the basis of different symbolic systems. In this paper, I estimate the information-theoretic efficiency of written language for parallel text data in more than 1000 different languages, both on the level of characters and on the level of words as information encoding units. The main results show that (i) the median efficiency is ∼29% on the character level and ∼45% on the word level, (ii) efficiency on the two levels is strongly correlated and (iii) efficiency tends to be higher for languages with more speakers.
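
As a rough illustration of what an information-theoretic efficiency figure can look like, the sketch below computes a simple normalized entropy: the Shannon entropy of the observed unit frequencies divided by the maximum entropy attainable with the same inventory of units. This is an assumption chosen for illustration only; the abstract does not state the estimator used in the paper, which may differ substantially (for example by taking context into account). The function name `unigram_efficiency` is hypothetical.

```python
import math
from collections import Counter


def unigram_efficiency(units):
    """Normalized unigram entropy H / H_max of a sequence of units.

    `units` may be a string (characters as units) or a list of words.
    Illustrative measure only, not the estimator from the paper.
    """
    counts = Counter(units)
    total = sum(counts.values())
    if len(counts) < 2:
        # With fewer than two distinct units the maximum entropy is 0,
        # so the ratio is undefined; return 0.0 by convention here.
        return 0.0
    # Shannon entropy in bits of the empirical unit distribution.
    entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
    # Maximum entropy: all observed unit types equally likely.
    max_entropy = math.log2(len(counts))
    return entropy / max_entropy


text = "the cat sat on the mat and the dog sat on the log"
char_level = unigram_efficiency(text.replace(" ", ""))  # characters as units
word_level = unigram_efficiency(text.split())           # words as units
print(f"character level: {char_level:.2f}, word level: {word_level:.2f}")
```

Under this toy measure a value of 1 means every unit type is used equally often, and lower values mean the frequency distribution is more skewed. The numbers it yields are not comparable to the ∼29% and ∼45% medians reported in the abstract.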

Metadata
Author:Alexander Koplenig
URN:urn:nbn:de:bsz:mh39-104401
DOI:https://doi.org/10.1515/lingvan-2019-0057
ISSN:2199-174X
Parent Title (English):Linguistics Vanguard
Publisher:De Gruyter
Place of publication:Berlin, Boston
Editor:Alexander Bergs, Abigail C. Cohn, Jeff Good
Document Type:Article
Language:English
Year of first Publication:2021
Date of Publication (online):2021/05/25
Publicationstate:Secondary publication
Reviewstate:Peer-Review
Tag:community size; efficiency; information theory; permutation testing; typology
GND Keyword:Efficiency; Information theory; Written language; Language statistics; Linguistic sign; Word
Volume:7
Issue:s3
Page Number:11
Note:This publication is freely accessible with the permission of the rights holder under an Alliance licence or a national licence (funded by the DFG, German Research Foundation).
DDC classes:400 Language / 400 Language, Linguistics
Open Access?:yes
Leibniz-Classification:Language, Linguistics
Linguistics-Classification:Quantitative linguistics
Program areas:L3: Lexicon, empirical and digital
Licence (German):German copyright law (UrhG) applies