A corpus-based diachronic analysis of Slovene clitics
This paper presents a manually annotated corpus of historical Slovene and a study, based on this corpus, of how clitics have changed in the Slovene language over time. The corpus contains 1,000 sampled pages, comprising about 300,000 tokens from over 80 works, spanning the period from the end of the 16th century to the end of the 19th. Each word is manually annotated with its modern-day equivalent, lemma and part-of-speech tag. The paper discusses the composition, encoding and availability of the corpus, and then presents a study of word-tokenization mismatches between contemporary and historical Slovene, concentrating on the binding of clitics with their hosts and on the variability of clitic orthography in the corpus.
Author: | Tomaž Erjavec, Alenka Jelovšek |
---|---|
URN: | urn:nbn:de:bsz:mh39-127290 |
ISBN: | 978-3-8233-6760-4 |
Parent Title (English): | New Methods in Historical Corpora |
Series (Serial Number): | Korpuslinguistik und interdisziplinäre Perspektiven auf Sprache / Corpus Linguistics and Interdisciplinary Perspectives on Language / CLIP (3) |
Publisher: | Narr |
Place of publication: | Tübingen |
Editor: | Paul Bennett, Martin Durrell, Silke Scheible, Richard J. Whitt |
Document Type: | Part of a Book |
Language: | English |
Year of first Publication: | 2013 |
Date of Publication (online): | 2024/07/01 |
Publishing Institution: | Leibniz-Institut für Deutsche Sprache (IDS) |
Publication state: | Secondary publication (Zweitveröffentlichung) |
Review state: | Publisher's editorial review ((Verlags)-Lektorat) |
GND Keyword: | Historische Sprachwissenschaft; Korpus <Linguistik>; Slovenisch |
First Page: | 117 |
Last Page: | 126 |
DDC classes: | 400 Language / 400 Language, Linguistics |
Open Access?: | yes |
Linguistics-Classification: | Corpus linguistics |