
‘Representativeness’, ‘Bad Data’, and legitimate expectations. What can an electronic historical corpus tell us that we didn’t actually know already (and how)?

The availability of electronic corpora of historical stages of languages has been welcomed as possibly attenuating the inherent problem of diachronic linguistics, i.e. that we only have access to what has chanced to come down to us, the problem memorably named by Labov (1992) as one of “Bad Data”. However, such corpora can only give us access to an increased amount of historical material, and this can essentially still only be a partial and possibly distorted picture of the actual language at a particular period of history. Corpora can be improved by taking a more representative sample of extant texts if these are available (as they are in significant numbers for periods after the invention of printing). But, as examples from the recently compiled GerManC corpus of seventeenth- and eighteenth-century German show, the evidence from such corpora can still fail to yield definitive answers to our questions about earlier stages of a language. The data still require expert interpretation, and it is important to be realistic about what can legitimately be expected from an electronic historical corpus.

Metadata
Author: Martin Durrell
URN: urn:nbn:de:bsz:mh39-125483
ISBN: 978-3-8233-6922-6
Parent Title (English): Historical corpora. Challenges and perspectives
Series (Serial Number): Korpuslinguistik und interdisziplinäre Perspektiven auf Sprache | Corpus Linguistics and Interdisciplinary Perspectives on Language | CLIP (5)
Publisher: Narr
Place of publication: Tübingen
Document Type: Part of a Book
Language: English
Year of first Publication: 2015
Date of Publication (online): 2024/03/05
Publishing Institution: Leibniz-Institut für Deutsche Sprache (IDS)
Publication state: Secondary publication
Review state: (Publisher's) editorial review
GND Keyword: Computational linguistics; Historical linguistics; Corpus <Linguistics>
First Page: 13
Last Page: 33
DDC classes: 400 Language / 400 Language, Linguistics
Open Access?: yes
BDSL-Classification: Grammar
Linguistics-Classification: Grammar research
Linguistics-Classification: Corpus linguistics
Licence (German): Protected by copyright