National library as corpus: introducing DeLiKo@DNB – a large synchronous German fiction corpus
- This paper introduces DeLiKo@DNB, a large, linguistically annotated, and freely accessible contemporary corpus of German fiction. The corpus currently comprises 2 billion words from over 26,000 books published between 2005 and the present, spanning pulp and genre fiction as well as literary award-winning works. We provide a detailed account of the corpus composition, metadata, and key features. Additionally, we outline our approach to ensuring lawful and productive access by deploying an instance of the open-source corpus analysis platform KorAP within the German National Library.
| Author: | Marc Kupietz, Peter Leinen, Nils Diewald, Philippe Genêt, Rebecca Wilm, Andreas Witt, Rameela Yaddehige |
|---|---|
| URN: | urn:nbn:de:bsz:mh39-130705 |
| DOI: | https://doi.org/10.5281/zenodo.14943116 |
| Parent Title (Multiple languages): | Book of Abstracts. DHd 2025: Under Construction. 11. Jahrestagung des Verbands Digital Humanities im deutschsprachigen Raum e.V.. Universität Bielefeld und HSBI, 3.–7. März 2025, Bielefeld, Deutschland |
| Publisher: | Zenodo |
| Place of publication: | Geneva |
| Editor: | Nils Reiter, Thomas Haider, Daniel Kababgi, Hendrik Buschmeier |
| Document Type: | Part of a Book |
| Language: | English |
| Year of first Publication: | 2025 |
| Date of Publication (online): | 2025/03/21 |
| Publishing Institution: | Leibniz-Institut für Deutsche Sprache (IDS) |
| Publication state: | Published version |
| Review state: | Peer-reviewed |
| Tag: | Annotieren; DeLiKo@DNB; Literatur; Sammlung; Text; Umwandlung; Virtuelle Forschungsumgebungen IPR; contemporary; corpus; corpus analysis; fiction; library as corpus; linguistic annotation; literature; metadata |
| GND Keyword: | Annotation; Deutsch; Deutsche Nationalbibliothek; Korpus <Linguistik>; Metadaten; Nationalbibliothek |
| First Page: | 482 |
| Last Page: | 485 |
| DDC classes: | 400 Language / 400 Language, Linguistics |
| Open Access?: | yes |
| Linguistics-Classification: | Corpus linguistics |
| Program areas: | Digital Linguistics |
| Licence (English): | Creative Commons - Attribution 4.0 International |


