Korpuslinguistik
Refine
Year of publication
- 2019 (28) (remove)
Document Type
- Article (11)
- Conference Proceeding (9)
- Part of a Book (6)
- Book (2)
Has Fulltext
- yes (28)
Keywords
- Korpus <Linguistik> (25)
- corpus linguistics (9)
- Deutsch (7)
- corpus processing (6)
- Rumänisch (5)
- Sprachstatistik (4)
- corpus management (4)
- web corpora (4)
- Annotation (3)
- CoRoLa (3)
Publicationstate
- Veröffentlichungsversion (14)
- Zweitveröffentlichung (12)
- Erstveröffentlichung (1)
- Postprint (1)
Reviewstate
- Peer-Review (21)
- (Verlags)-Lektorat (7)
Publisher
The DRuKoLA project
(2019)
DRuKoLA, the accompanying project in the making of the Corpus of Romanian Language, is a cooperation between German and Romanian computer scientists, corpus linguists and linguists, aiming at linking reference corpora of European languages under one corpus analysis tool able to manage big data. KorAP, the analysis tool developed at the Leibniz Institute for the German Language (Mannheim), is being tailored for the Romanian language in a first attempt to reunite reference corpora under the EuReCo initiative, detailed in this paper. The paper describes the necessary steps of harmonization within KorAP and the corpus of Romanian language and discusses, as one important goal of this project, criteria and ways to build virtual comparable corpora to be used for contrastive linguistic analyses.
The present paper examines a variety of ways in which the Corpus of Contemporary Romanian Language (CoRoLa) can be used. A multitude of examples intends to highlight a wide range of interrogation possibilities that CoRoLa opens for different types of users. The querying of CoRoLa displayed here is supported by the KorAP frontend, through the querying language Poliqarp. Interrogations address annotation layers, such as the lexical, morphological and, in the near future, the syntactical layer, as well as the metadata. Other issues discussed are how to build a virtual corpus, how to deal with errors, how to find expressions and how to identify expressions.
The user interfaces for corpus analysis platforms must provide a high degree of accessibility for ordinary users and at the same time provide the possibility to answer complex research questions. In this paper, we present the design concepts behind the user interface of KorAP, a corpus analysis platform that has evolved into the main gateway to CoRoLa, the Reference Corpus of Contemporary Romanian Language. Based on established principles of user interface design, we show how KorAP addresses the challenge of providing a user-friendly interface for heterogeneous corpus data to a wide range of users with different research questions.
Little strokes fell great oaks. Creating CoRoLa, the reference corpus of contemporary Romanian
(2019)
The paper presents the quite long-standing tradition of Romanian corpus acquisition and processing, which reaches its peak with the reference corpus of contemporary Romanian language (CoRoLa). The paper describes decisions behind the kinds of texts collected, as well as processing and annotation steps, highlighting the structure and importance of metadata to the corpus. The reader is also introduced to the three ways in which (s)he can plunge into the rich linguistic data of the corpus, waiting to be discovered. Besides querying the corpus, word embeddings extracted from it are useful to various natural language processing applications and for linguists, when user-friendly interfaces offer them the possibility to exploit the data.
Introduction
(2019)
Persuasionsstrategien in deutschen rechtsorientierten Zeitungen. Eine korpuslinguistische Studie
(2019)
Corpus Linguistics has often proved fruitful to examine different types of discourses, also the one of refugees. Aim of the paper is to show how language usage patterns can be focused on with the help of techniques grounded in Corpus Linguistics, giving information about themes and topoi. After showing what type of words (keywords, collocations) and what type of phenomena will be considered (topoi, metaphors and frames) in the article, the focus will shift on the methodology and the adopted criteria. After presenting the primary corpus (articles from right-oriented newspapers) and the comparison corpus (articles from 'Die Zeit') the main results of the analysis are presented and reflected on.
Das Archiv für Gesprochenes Deutsch (AGD, Stift/Schmidt 2014) am Leibniz-Institut für Deutsche Sprache ist ein Forschungsdatenzentrum für Korpora des gesprochenen Deutsch. Gegründet als Deutsches Spracharchiv (DSAv) im Jahre 1932 hat es über Eigenprojekte, Kooperationen und Übernahmen von Daten aus abgeschlossenen Forschungsprojekten einen Bestand von bald 100 Variations-, Interview- und Gesprächskorpora aufgebaut, die u. a. dialektalen Sprachgebrauch, mündliche Kommunikationsformen oder die Sprachverwendung bestimmter Sprechertypen oder zu bestimmten Themen dokumentieren. Heute ist dieser Bestand fast vollständig digitalisiert und wird zu einem großen Teil der wissenschaftlichen Gemeinschaft über die Datenbank für Gesprochenes Deutsch (DGD) im Internet zur Nutzung in Forschung und Lehre angeboten.
Vorwort
(2019)