
Little strokes fell great oaks. Creating CoRoLa, the reference corpus of contemporary Romanian

The paper presents the long-standing tradition of Romanian corpus acquisition and processing, which reaches its peak with the reference corpus of contemporary Romanian language (CoRoLa). It describes the decisions behind the kinds of texts collected, as well as the processing and annotation steps, and highlights the structure and importance of the corpus metadata. The reader is also introduced to the three ways of exploring the rich linguistic data of the corpus. Beyond corpus queries, word embeddings extracted from CoRoLa are useful for various natural language processing applications and, when user-friendly interfaces make the data accessible, for linguists as well.

Metadata
Author:Dan Tufiș, Verginica Barbu Mititelu, Elena Irimia, Vasile Păiș, Radu Ion, Nils Diewald, Maria Mitrofan, Mihaela Onofrei
URN:urn:nbn:de:bsz:mh39-93851
URL:http://www.lingv.ro/index.php?option=com_content&view=article&id=342%3Arrl-arhiva-2019&catid=36%3Areviste-ilb&Itemid=95
ISSN:0035-3957
Parent Title (Multiple languages):Revue Roumaine de Linguistique. On design, creation and use of the Reference Corpus of Contemporary Romanian and its analysis tools. CoRoLa, KorAP, DRuKoLA and EuReCo
Publisher:Editura Academiei Române
Place of publication:Bucureşti
Document Type:Article
Language:English
Year of first Publication:2019
Date of Publication (online):2019/11/11
Publicationstate:Secondary publication
Reviewstate:Peer-Review
Tag:Romanian corpus; acquisition; annotation; metadata; query
GND Keyword:Annotation; Korpus <Linguistik>; Metadaten; Rumänisch
Volume:64
Issue:3
First Page:227
Last Page:240
Dewey Decimal Classification:400 Language / 400 Language, Linguistics
Open Access?:yes
Leibniz-Classification:Language, Linguistics
Linguistics-Classification:Corpus linguistics
Licence:German copyright law (UrhG) applies