Corpus REDEWIEDERGABE
- This article presents the corpus REDEWIEDERGABE, a German-language historical corpus with detailed annotations for speech, thought and writing representation (ST&WR). With approximately 490,000 tokens, it is the largest resource of its kind. It can be used to answer literary and linguistic research questions and serve as training material for machine learning. This paper describes the composition of the corpus and the annotation structure, discusses some methodological decisions and gives basic statistics about the forms of ST&WR found in this corpus.
Author: | Annelen Brunner, Stefan Engelberg, Fotis Jannidis, Ngoc Duyen Tanja Tu, Lukas Weimer |
---|---|
URN: | urn:nbn:de:bsz:mh39-98963 |
URL: | http://www.lrec-conf.org/proceedings/lrec2020/index.html#803 |
ISBN: | 979-10-95546-34-4 |
Parent Title (English): | Proceedings of the 12th International Conference on Language Resources and Evaluation (LREC), May 11-16, 2020, Palais du Pharo, Marseille, France |
Publisher: | European Language Resources Association |
Place of publication: | Paris |
Editor: | Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis |
Document Type: | Part of a Book |
Language: | English |
Year of first Publication: | 2020 |
Date of Publication (online): | 2020/06/15 |
Publication state: | Secondary publication |
Review state: | Peer review |
Tag: | annotation; corpus; machine learning; speech thought writing representation |
GND Keyword: | Annotation; Korpus <Linguistik>; Maschinelles Lernen; Methodik; Redeerwähnung |
First Page: | 803 |
Last Page: | 812 |
Note: | Funded by the Open Access Monograph Fund of the Leibniz Association (Open-Access-Monografienfonds der Leibniz-Gemeinschaft) |
DDC classes: | 400 Language / 400 Language, Linguistics / 400 Language |
Open Access?: | yes |
Leibniz-Classification: | Language, Linguistics |
Linguistics-Classification: | Computational linguistics |
Linguistics-Classification: | Corpus linguistics |
Program areas: | L2: Lexical Syntagmatics (Lexikalische Syntagmatik) |
Licence (English): | Creative Commons - Attribution-NonCommercial 4.0 International |