A corpus-based diachronic analysis of Slovene clitics

This paper presents a manually annotated corpus of historical Slovene and a study, based on this corpus, of how clitics have changed in the Slovene language over time. The corpus contains 1,000 sampled pages, comprising about 300,000 tokens from over 80 works, spanning the period from the end of the 16th century to the end of the 19th. Each word is manually annotated with its modern-day equivalent, lemma and part-of-speech tag. The paper discusses the composition, encoding and availability of the corpus, and then presents a study of word-tokenization mismatches between contemporary and historical Slovene, concentrating on the binding of clitics with their host, and on the variability of clitic orthography in the corpus.

Metadata
Author: Tomaz Erjavec, Alenka Jelovsek
URN: urn:nbn:de:bsz:mh39-127290
ISBN: 978-3-8233-6760-4
Parent Title (English): New Methods in Historical Corpora
Series (Serial Number): Korpuslinguistik und interdisziplinäre Perspektiven auf Sprache | Corpus Linguistics and Interdisciplinary Perspectives on Language | CLIP (3)
Publisher: Narr
Place of publication: Tübingen
Editor: Paul Bennett, Martin Durrell, Silke Scheible, Richard J. Whitt
Document Type: Part of a Book
Language: English
Year of first Publication: 2013
Date of Publication (online): 2024/07/01
Publishing Institution: Leibniz-Institut für Deutsche Sprache (IDS)
Publication state: Secondary publication
Review state: (Publisher's) editorial review
GND Keyword: Historische Sprachwissenschaft; Korpus <Linguistik>; Slovenisch
First Page: 117
Last Page: 126
DDC classes: 400 Language / 400 Language, Linguistics
Open Access?: yes
Linguistics-Classification: Corpus linguistics
Licence (German): Protected by copyright