Testing the relationship between word length, frequency, and predictability based on the German Reference Corpus

In a recent article, Meylan and Griffiths (2021, henceforth M&G) focus their attention on the significant methodological challenges that can arise when using large-scale linguistic corpora. To this end, M&G revisit a well-known result of Piantadosi, Tily, and Gibson (2011, henceforth PT&G), who argue that average information content is a better predictor of word length than word frequency. We applaud M&G, who conducted a very important study that should be read by any researcher interested in working with large-scale corpora. The fact that M&G mostly failed to find clear evidence in favor of PT&G's main finding motivated us to test PT&G's idea on a subset of the largest archive of German language texts designed for linguistic research, the German Reference Corpus, which consists of ∼43 billion words. We find only very little support for the primary data point reported by PT&G.

Metadata
Author:Alexander Koplenig, Marc Kupietz, Sascha Wolfer
URN:urn:nbn:de:bsz:mh39-110893
DOI:https://doi.org/10.1111/cogs.13090
ISSN:1551-6709
Parent Title (English):Cognitive Science
Publisher:Wiley
Place of publication:Hoboken
Document Type:Article
Language:English
Year of first Publication:2022
Date of Publication (online):2022/06/15
Publishing Institution:Leibniz-Institut für Deutsche Sprache (IDS) [secondary publication]
Contributing Corporation:Cognitive Science Society
Publicationstate:Secondary publication
Publicationstate:Postprint
Reviewstate:Peer-Review
Tag:Deutsches Referenzkorpus (DeReKo); N-gram modeling; compression; corpus linguistics; information theory; large-scale corpora; uniform information density
GND Keyword:German; information content; information theory; corpus (linguistics); predictability; word frequency; word length
Volume:46
Issue:6
Page Number:10
DDC classes:400 Language / 400 Language, Linguistics
Open Access?:yes
Leibniz-Classification:Language, Linguistics
Linguistics-Classification:Corpus linguistics
Program areas:L3: Lexik empirisch und digital
Program areas:S1: Korpuslinguistik
Licence (German):Protected by copyright