When comparing different tools in the field of natural language processing (NLP), the quality of their results usually has first priority. This is also true for tokenization. In the context of large and diverse corpora for linguistic research purposes, however, other criteria also play a role, not least sufficient speed to process the data in an acceptable amount of time. In this paper we evaluate several state-of-the-art tokenization tools for German, including our own, with regard to these criteria. We conclude that while not all tools are applicable in this setting, no compromises regarding quality need to be made.
KorAP, the new corpus analysis platform of the IDS, which will replace COSMAS II over the next two to three years, offers several features that are particularly useful for research on grammatical variation. A fundamental one is that KorAP supports the representation and querying of arbitrary annotation layers in any number, for example for constituency and dependency relations, and thereby facilitates the search for specific grammatical phenomena or makes it possible in the first place. In addition, KorAP supports the construction of virtual corpora based on metadata variables and thus facilitates contrastive studies. This article explains in detail the KorAP features relevant to research on grammatical variation and gives an insight into their foundations.
Making corpora accessible and usable for linguistic research is a huge challenge in view of (too) big data, legal issues and a rapidly evolving methodology. This affects not only the design of user-friendly graphical interfaces to corpus analysis tools, but also the availability of programming interfaces supporting access to the functionality of these tools from various analysis and development environments. RKorAPClient is a new research tool in the form of an R package that interacts with the Web API of the corpus analysis platform KorAP, which provides access to large annotated corpora, including the German reference corpus DeReKo with 45 billion tokens. In addition to optionally authenticated KorAP API access, RKorAPClient provides further processing and visualization features to simplify common corpus analysis tasks. This paper introduces the basic functionality of RKorAPClient and exemplifies various analysis tasks based on DeReKo, which are bundled within the R package and can serve as a basic framework for advanced analysis and visualization approaches.
The corpus analysis platform KorAP is being developed at the Leibniz-Institut für Deutsche Sprache (IDS) as the successor to COSMAS II and provides comprehensive access to part of DeReKo (Kupietz et al. 2010). Although some functionality is still missing, KorAP is already ready for productive use. In the following, we use the study of social media corpora as an example to present some of its new possibilities and special features.
This paper reports on the latest developments of the European Reference Corpus EuReCo and the German Reference Corpus in relation to three of the most important CMLC topics: interoperability, collaboration on corpus infrastructure building, and legal issues. Concerning interoperability, we present new ways to access DeReKo via KorAP on the API and on the plugin level. In addition, we report on advancements in the EuReCo and ICC initiatives with the provision of comparable corpora, and on recent problems with license acquisitions and our solution approaches using an indemnification clause and model licenses that include scientific exploitation.
Little strokes fell great oaks. Creating CoRoLa, the reference corpus of contemporary Romanian
(2019)
The paper presents the quite long-standing tradition of Romanian corpus acquisition and processing, which reaches its peak with the reference corpus of contemporary Romanian language (CoRoLa). The paper describes decisions behind the kinds of texts collected, as well as processing and annotation steps, highlighting the structure and importance of metadata to the corpus. The reader is also introduced to the three ways in which (s)he can plunge into the rich linguistic data of the corpus, waiting to be discovered. Besides querying the corpus, word embeddings extracted from it are useful to various natural language processing applications and for linguists, when user-friendly interfaces offer them the possibility to exploit the data.
The present paper examines a variety of ways in which the Corpus of Contemporary Romanian Language (CoRoLa) can be used. A multitude of examples is intended to highlight the wide range of query possibilities that CoRoLa opens for different types of users. The querying of CoRoLa displayed here is supported by the KorAP frontend, through the query language Poliqarp. Queries address annotation layers, such as the lexical, morphological and, in the near future, the syntactical layer, as well as the metadata. Other issues discussed are how to build a virtual corpus, how to deal with errors, how to find expressions and how to identify expressions.
The user interfaces for corpus analysis platforms must provide a high degree of accessibility for ordinary users and at the same time provide the possibility to answer complex research questions. In this paper, we present the design concepts behind the user interface of KorAP, a corpus analysis platform that has evolved into the main gateway to CoRoLa, the Reference Corpus of Contemporary Romanian Language. Based on established principles of user interface design, we show how KorAP addresses the challenge of providing a user-friendly interface for heterogeneous corpus data to a wide range of users with different research questions.