
The Impact of Lacking Metadata for the Measurement of Cultural and Linguistic Change Using the Google Ngram Data Sets—Reconstructing the Composition of the German Corpus in Times of WWII

The Google Ngram Corpora seem to offer a unique opportunity to study linguistic and cultural change in quantitative terms. To avoid breaking copyright law, the data sets are not accompanied by any metadata about the texts that make up the corpora. This article analyzes some of the consequences of this strategy. I chose the example of measuring censorship in Nazi Germany, which received widespread attention and was published in a paper that accompanied the release of the Google Ngram data (Michel et al. (2010): Quantitative analysis of culture using millions of digitized books. Science, 331(6014): 176–82). I show that without proper metadata, it is unclear whether the results reflect any kind of censorship at all. Collectively, the findings imply that changes observed in this period can be linked directly to World War II only to a limited extent. Therefore, instead of speaking about general linguistic or cultural change, it seems preferable to explicitly restrict the results to linguistic or cultural change ‘as it is represented in the Google Ngram data’. On a more general level, the analysis demonstrates the importance of metadata, whose availability is not just a nice add-on but a powerful source of information for the digital humanities.

Download full text files

  • Koplenig_The_impact_of_lacking_metadate-2017.pdf

    (Print version, IDS internal)



Author: Alexander Koplenig
Parent Title (English): Digital Scholarship in the Humanities
Publisher: Oxford University Press (OUP)
Place of publication: Oxford
Document Type: Article
Year of first Publication: 2017
Date of Publication (online): 2015/09/02
GND Keyword: Data structure; Corpus (linguistics); Cultural change; Metadata; Language statistics; Language change
First Page: 169
Last Page: 188
Preprint is published under http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-31557

Advance Access published September 12, 2015
DDC classes: 400 Language
Open Access: No
Licence (German): The German Copyright Act (UrhG) applies