Year of publication
- 2019 (1)
Document Type
- Article (1)
Language
- English (1)
Has Fulltext
- yes (1)
Is part of the Bibliography
- yes (1)
Keywords
- Annotation (1)
- Korpus <Linguistik> (1)
- Metadaten (1)
- Romanian corpus (1)
- Rumänisch (1)
- acquisition (1)
- annotation (1)
- metadata (1)
- query (1)
Reviewstate
- Peer-Review (1)
Publisher
- Editura Academiei Române (1)
Little strokes fell great oaks. Creating CoRoLa, the reference corpus of contemporary Romanian
(2019)
The paper presents the long-standing tradition of Romanian corpus acquisition and processing, which culminates in the reference corpus of contemporary Romanian language (CoRoLa). It describes the decisions behind the kinds of texts collected, as well as the processing and annotation steps, and highlights the structure and importance of metadata to the corpus. The reader is also introduced to the three ways of exploring the rich linguistic data of the corpus. Beyond corpus queries, word embeddings extracted from the corpus are useful for various natural language processing applications and for linguists, provided that user-friendly interfaces let them exploit the data.