Modern linguistic research increasingly relies on digital research infrastructures and information systems. This development began around the turn of the millennium and has been accelerating ever since. The volume addresses national and European infrastructure networks and various language resources from German linguistics that can be found, accessed, and (re)used via digital infrastructures.
Contents:
1. Christoph Kuras, Thomas Eckart, Uwe Quasthoff and Dirk Goldhahn: Automation, management and improvement of text corpus production, p. 1
2. Thomas Krause, Ulf Leser, Anke Lüdeling and Stephan Druskat: Designing a re-usable and embeddable corpus search library, p. 6
3. Radoslav Rábara, Pavel Rychlý and Ondřej Herman: Distributed corpus search, p. 10
4. Adrien Barbaresi and Antonio Ruiz Tinoco: Using elasticsearch for linguistic analysis of tweets in time and space, p. 14
5. Marc Kupietz, Nils Diewald and Peter Fankhauser: How to Get the Computation Near the Data: Improving data accessibility to, and reusability of analysis functions in corpus query platforms, p. 20
6. Roman Schneider: Example-based querying for specialist corpora, p. 26
7. Paul Rayson: Increasing interoperability for embedding corpus annotation pipelines in Wmatrix and other corpus retrieval tools, p. 33
The present submission reports on a pilot project conducted at the Institute for the German Language (IDS), aiming at strengthening the connection between ISO TC37SC4 “Language Resource Management” and the CLARIN infrastructure. In terminology management, attempts have recently been made to use graph-theoretical analyses to get a better understanding of the structure of terminology resources. The project described here aims at applying some of these methods to potentially incomplete concept fields produced over years by numerous researchers serving as experts and editors of ISO standards. The main results of the project are twofold. On the one hand, they comprise concept networks dynamically generated from a relational database and browsable by the user. On the other, the project has yielded significant qualitative feedback that will be offered to ISO. We provide the institutional context of this endeavour, its theoretical background, and an overview of data preparation and tools used. Finally, we discuss the results and illustrate some of them.
The actual or anticipated impact of research projects can be documented in scientific publications and project reports. While project reports are available at varying levels of accessibility, they might rarely be used or shared outside of academia. Moreover, a connection between the outcomes of an actual research project and potential secondary use might not be made explicit in a project report. This paper outlines two methods for classifying and extracting the impact of publicly funded research projects. The first method is concerned with identifying impact categories and assigning these categories to research projects and, by extension, their reports with the help of subject matter experts, without considering the content of the research reports. This process resulted in a classification schema that we describe in this paper. With the second method, which is still work in progress, impact categories are extracted from the actual text data.
Since the mid-1990s, the Institute for the German Language (Institut für deutsche Sprache, IDS) in Mannheim has been investigating how the highly complex subject area of "grammar" can be conveyed in a way that is both scientifically sound and accessible, using digital language resources and hypertextual navigation structures. The online grammatical information systems of the IDS are aimed not only at researchers and the interested public in Germany, but equally at Germanists and learners of German around the world. This contribution describes the hopes and expectations associated with them. It then addresses practical applications and outlines the further development of the digital grammar offerings in terms of function and content.
How can we measure the impact – such as awareness of economic, ecological, and political matters – of information, such as scientific publications, user-generated content, and reports from the public administration, based on text data? This workshop brings together research from different theoretical paradigms and methodologies for the extraction of impact-relevant indicators from natural language text data and related meta-data. The papers in this workshop represent different types of expertise in different methods for analyzing text data, spanning the whole spectrum of qualitative, quantitative, and mixed-methods techniques, as well as domain expertise in the field of impact measurement. The program was built to create an interdisciplinary half-day workshop where we discuss possibilities, limitations, and synergistic effects of different approaches.
New exceptions for Text and Data Mining and their possible impact on the CLARIN infrastructure
(2018)
The proposed paper discusses new exceptions for Text and Data Mining that have recently been adopted in some EU Member States and will probably soon be adopted at the EU level as well. These exceptions are of great significance for language scientists, as they exempt those who compile corpora from the obligation to obtain authorisation from rightholders. However, corpora compiled on the basis of such exceptions cannot be freely shared, which in the long run may have serious consequences for Open Science and the functioning of research infrastructures such as CLARIN ERIC.
This abstract discusses the possibility of adopting a CLARIN Data Protection Code of Conduct pursuant to Art. 40 of the General Data Protection Regulation. Such a code of conduct would have important benefits for the entire language research community. The final section of this abstract proposes a roadmap to the CLARIN Data Protection Code of Conduct, listing the various stages of its drafting and approval procedures.
Deutsche Geschichte-Digital: Ergebnisse der TEI-Konvertierung und Integration in Pilotprojekten [German History Digital: Results of the TEI conversion and integration in pilot projects]
(2018)
The rail diagram presented here, following Wittenburg (2009), is the means chosen as an extension instrument in the attempt to describe computer technology, linguistic research, and networking at the Institut für Deutsche Sprache in their rapidly growing complexity. Among other things, it unites three perspectives: that of the scientist developing technology, that of the developing user, and that of the user of information technology in linguistic research. It extends them by a dimension that is new for language comparison, one that harmonises the language-specific parameters of analysis instruments with one another.
The importance of research data management in science policy discourse and in everyday academic work is steadily increasing. National and international research infrastructures, consortia, disciplinary data centres, and institutional competence centres approach the challenges from different perspectives. This contribution presents the Data Center for the Humanities at the University of Cologne as an example of a university data centre with a subject specialisation in the humanities.
This paper discusses current trends in DeReKo, the German Reference Corpus, concerning legal issues around the recent German copyright reform, which has positive implications for corpus building and corpus linguistics in general, and recent corpus extensions in the genres of popular magazines, journals, historical texts, and web-based football reports. In addition, DeReKo is finally accessible via the new corpus research platform KorAP, which offers registered users several new features in comparison with its predecessor COSMAS II.
The European digital research infrastructure CLARIN (Common Language Resources and Technology Infrastructure) is building a Knowledge Sharing Infrastructure (KSI) to ensure that existing knowledge and expertise are easily available both for the CLARIN community and for the humanities research communities for which CLARIN is being developed. Within the Knowledge Sharing Infrastructure, so-called Knowledge Centres comprise one or more physical institutions with particular expertise in certain areas and are committed to providing their expertise in the form of reliable knowledge-sharing services. In this paper, we present the ninth K Centre – the CLARIN Knowledge Centre for Linguistic Diversity and Language Documentation (CKLD) – and the expertise and services provided by the member institutions at the Universities of London (ELAR/SWLI), Cologne (DCH/IfDH/IfL) and Hamburg (HZSK/INEL). The centre offers information on current best practices, available resources and tools, and gives advice on technological and methodological matters for researchers working within relevant fields.
This volume deals with the state and development of research infrastructures for German linguistics and some neighbouring fields. A central aspect is the need to organise cooperation in science, both in the institutional sense and with regard to scientific practice. This takes place in consortia as cooperation structures, in which linguistics and language technology are brought together. As a central research resource, corpora and their indexing through special, linguistically motivated information systems are of particular importance. At the data level, annotation and modelling standards create the prerequisite for the sustainable usability of such resources.