Refine
Year of publication
- 2019 (2)
Document Type
- Article (2)
Language
- English (2)
Has Fulltext
- yes (2)
Is part of the Bibliography
- yes (2)
Keywords
- Korpus <Linguistik> (2)
- Rumänisch (2)
- metadata (2)
- Abfrage (1)
- Annotation (1)
- Benutzeroberfläche (1)
- CoRoLa (1)
- KorAP (1)
- Metadaten (1)
- Poliqarp (1)
Publicationstate
Reviewstate
- Peer-Review (2)
Publisher
Little strokes fell great oaks. Creating CoRoLa, the reference corpus of contemporary Romanian
(2019)
The paper presents the quite long-standing tradition of Romanian corpus acquisition and processing, which reaches its peak with the reference corpus of contemporary Romanian language (CoRoLa). The paper describes decisions behind the kinds of texts collected, as well as processing and annotation steps, highlighting the structure and importance of metadata to the corpus. The reader is also introduced to the three ways in which (s)he can plunge into the rich linguistic data of the corpus, waiting to be discovered. Besides querying the corpus, word embeddings extracted from it are useful to various natural language processing applications and for linguists, when user-friendly interfaces offer them the possibility to exploit the data.
The present paper examines a variety of ways in which the Corpus of Contemporary Romanian Language (CoRoLa) can be used. A multitude of examples intends to highlight a wide range of interrogation possibilities that CoRoLa opens for different types of users. The querying of CoRoLa displayed here is supported by the KorAP frontend, through the querying language Poliqarp. Interrogations address annotation layers, such as the lexical, morphological and, in the near future, the syntactical layer, as well as the metadata. Other issues discussed are how to build a virtual corpus, how to deal with errors, how to find expressions and how to identify expressions.