Refine
Year of publication
Document Type
- Part of a Book (18)
- Conference Proceeding (9)
- Article (4)
- Working Paper (1)
Keywords
- Korpus <Linguistik> (29)
- Korpusanalyseplattform (KorAP) (5)
- Computerlinguistik (4)
- Deutsch (4)
- Forschungsdaten (4)
- Kontrastive Linguistik (4)
- Rumänisch (4)
- Sprachdaten (4)
- Benutzeroberfläche (3)
- Deutsches Referenzkorpus (DeReKo) (3)
Publicationstate
- Veröffentlichungsversion (17)
- Zweitveröffentlichung (10)
- Postprint (1)
Reviewstate
- Peer-Review (15)
- (Verlags)-Lektorat (10)
- Review-Status-unbekannt (1)
Publisher
- de Gruyter (6)
- Editura Academiei Române (3)
- European Language Resources Association (ELRA) (3)
- IDS-Verlag (3)
- Leibniz-Institut für Deutsche Sprache (2)
- CECL Papers 1 (1)
- De Gruyter (1)
- European Language Resources Association (1)
- European language resources association (ELRA) (1)
- Gesellschaft für Sprachtechnologie und Computerlinguistik (1)
In the mid-1990s, the Faculty of Linguistics and Literary-Studies at Bielefeld University began to establish the field Text technology, both in research and education. Text technology is a new field of research on the border of Computational Linguistics and Computational Philology.
This paper focuses on Text technology in academic education. In 2002, Text Technology was introduced as a minor subject for B.A. Programs. It is organized in modules: Module 1 introduces the characteristics of electronic texts and documents, typography, typesetting systems and hypertext. Module 2 introduces one or two programming languages relevant to the field of humanities computing. Markup languages and the principles of information structuring are the main topics of Module 3. The formal fundamentals of computer-based text processing, as formal languages and their grammars, Logics et cetera are subjects of another module. The paper ends with a short description of other Bachelor- and Master-Programs at Bielefeld University which contain text technological themes.
The KorAP project (“Korpusanalyseplattform der nächste Generation”, “Corpus-analysis platform of the next generation”), carried out at the Institut fUr Deutsche Sprache (IDS) in Mannheim, Germany, has as its goal the development of a modem, state-of-the-art corpus-analysis platform, capable of handling very large corpora and opening the perspectives for innovative linguistic research. The platform will facilitate new linguistic findings by making it possible to manage and analyse extremely large amounts of primary data and annotations, while at the same time allowing an undistorted view of the primary un-annotated text, and thus fully satisfying expectations associated with a scientific tool. The project started in July 2011 and is funded till June 2014. The demo presentation in December will be the first version following a preliminary feature freeze, and will open the alpha testing phase of the project.
We present an approach to an aspect of managing complex access scenarios to large and heterogeneous corpora that involves handling user queries that, intentionally or due to the complexity of the queried resource, target texts or annotations outside of the given user’s permissions. We first outline the overall architecture of the corpus analysis platform KorAP, devoting some attention to the way in which it handles multiple query languages, by implementing ISO CQLF (Corpus Query Lingua Franca), which in turn constitutes a component crucial for the functionality discussed here. Next, we look at query rewriting as it is used by KorAP and zoom in on one kind of this procedure, namely the rewriting of queries that is forced by data access restrictions.
KoralQuery 0.3
(2015)
KoralQuery is a general corpus query protocol (i.e. independent of research tasks and corpus formats), serialized in JSON-LD [1]. KoralQuery focuses on simplicity of implementation rather than human readibility and writability. Support for a growing number of query languages is granted by the Koral serialization processor.
The task-oriented and format-driven development of corpus query systems has led to the creation of numerous corpus query languages (QLs) that vary strongly in expressiveness and syntax. This is a severe impediment for the interoperability of corpus analysis systems, which lack a common protocol. In this paper, we present KoralQuery, a JSON-LD based general corpus query protocol, aiming to be independent of particular QLs, tasks and corpus formats. In addition to describing the system of types and operations that Koral- Query is built on, we exemplify the representation of corpus queries in the serialized format and illustrate use cases in the KorAP project.
KorAP is a corpus search and analysis platform, developed at the Institute for the German Language (IDS). It supports very large corpora with multiple annotation layers, multiple query languages, and complex licensing scenarios. KorAP’s design aims to be scalable, flexible, and sustainable to serve the German Reference Corpus DEREKO for at least the next decade. To meet these requirements, we have adopted a highly modular microservice-based architecture. This paper outlines our approach: An architecture consisting of small components that are easy to extend, replace, and maintain. The components include a search backend, a user and corpus license management system, and a web-based user frontend. We also describe a general corpus query protocol used by all microservices for internal communications. KorAP is open source, licensed under BSD-2, and available on GitHub.
KorAP, die neue Korpusanalyseplattform des IDS, die COSMAS II im Laufe der kommenden 2–3 Jahre ablösen wird, bietet gerade zur Erforschung grammatischer Variation einige besondere Funktionalitäten. Grundlegend ist beispielsweise, dass KorAP die Repräsentation und Abfrage beliebiger und beliebig vieler Annotationsschichten, zum Beispiel zu Konstituenz- und Dependenzrelationen, unterstutzt und damit die Suche nach speziellen grammatischen Phänomenen erleichtert oder erst möglich macht. Darüber hinaus unterstutzt KorAP die Konstruktion virtueller Korpora anhand von Metadatenvariablen und erleichtert damit kontrastive Untersuchungen. Der vorliegende Artikel erläutert die für die grammatische Variationsforschung relevanten KorAP-Funktionalitäten im Einzelnen und gibt einen Einblick in ihre Grundlagen.