Building and querying Wikipedia discussion corpora using KorAP
- We introduce the new German Wikipedia talk page corpus with 1.14 billion tokens and multiple linguistic annotation layers, available via the corpus analysis platform KorAP.
| Author: | Eliza Margaretha, Harald Lüngen, Nils Diewald, Marc Kupietz, Rameela Yaddehige |
|---|---|
| URN: | urn:nbn:de:bsz:mh39-134958 |
| URL: | https://www.cmc2025.uni-bayreuth.de/en/proceedings/index.html |
| Parent Title (English): | Impulses and Approaches to Computer-Mediated Communication. Proceedings of the 12th International Conference on Computer Mediated Communication and Social Media Corpora for the Humanities. CMC 2025. 4th-5th September 2025. University of Bayreuth, Germany |
| Publisher: | Universität Bayreuth |
| Place of publication: | Bayreuth |
| Editor: | Annamária Fábián, Igor Trost |
| Document Type: | Part of a Book |
| Language: | English |
| Year of first Publication: | 2025 |
| Date of Publication (online): | 2025/10/07 |
| Publishing Institution: | Leibniz-Institut für Deutsche Sprache (IDS) |
| Publication state: | Published version |
| Review state: | (Publisher's) editorial review |
| Tag: | KorAP; Wikipedia; corpus construction; talk pages; wikitext |
| GND Keyword: | Computerunterstützte Kommunikation; Korpus <Linguistik> |
| First Page: | 123 |
| Last Page: | 124 |
| DDC classes: | 400 Language / 400 Language, Linguistics |
| Open Access?: | yes |
| Linguistics-Classification: | Corpus linguistics |
| Program areas: | Digital Linguistics |
| Licence: | Creative Commons - CC BY - Attribution 4.0 International |


