OPUS 4 | Search

71 search hits

WorldViews: Access to international textbooks for digital humanities researchers (2017)

Hennicke, Steffen ; Stahn, Lena-Luise ; De Luca, Ernesto William ; Schwedes, Kerstin ; Witt, Andreas

This paper introduces the field of international textbook research and discusses how the WorldViews project is working towards enhanced access to textbook resources for digital humanities research.

Wohin damit? Storing and reusing my language data: Minute Madness der Datenzentren (2023)

Presented at the workshop "Wohin damit? Storing and reusing my language data" on 22 June 2023 in Mannheim. The presentation was given in the context of the work of the association Nationale Forschungsdateninfrastruktur (NFDI) e.V.

Verknüpfung heterogener texttechnologischer Ressourcen (2005)

Goecke, Daniela ; Metzing, Dieter ; Witt, Andreas

This workshop contribution addresses the linking of heterogeneous linguistic resources. A significant subset of the resources used in current linguistic research and applications consists, on the one hand, of XML-annotated text documents and, on the other, of external resources such as grammars, lexicons, or ontologies. An architecture is presented that allows heterogeneous resources to be integrated; the integration methods are independent of the particular application and thus enable different kinds of linking. An exemplary application of the methodology is the analysis of anaphoric relations.

Unification of XML Documents with Concurrent Markup (2004)

Witt, Andreas ; Lüngen, Harald ; Sasaki, Felix ; Goecke, Daniela

Towards validation of concurrent markup (2006)

Schonefeld, Oliver ; Witt, Andreas

XCONCUR allows the annotation of multiple concurrent hierarchies, but lacks cross-layer validation. This paper explores the requirements for a constraint-based approach to such a validation process.

Towards declarative descriptions of transformations: An approach based on topic maps (2002)

Lenz, Eva ; Witt, Andreas ; Storrer, Angelika

Toward a CLARIN Data Protection Code of Conduct (2018)

Kamocki, Pawel ; Ketzan, Erik ; Wildgans, Julia ; Witt, Andreas

This abstract discusses the possibility of adopting a CLARIN Data Protection Code of Conduct pursuant to Art. 40 of the General Data Protection Regulation. Such a code of conduct would have important benefits for the entire language research community. The final section of this abstract proposes a roadmap to the CLARIN Data Protection Code of Conduct, listing the various stages of its drafting and approval procedures.

The New IDS Corpus Analysis Platform: Challenges and Prospects (2012)

Bański, Piotr ; Fischer, Peter M. ; Frick, Elena ; Ketzan, Erik ; Kupietz, Marc ; Schnober, Carsten ; Schonefeld, Oliver ; Witt, Andreas

The present article describes the first stage of the KorAP project, launched recently at the Institut für Deutsche Sprache (IDS) in Mannheim, Germany. The aim of this project is to develop an innovative corpus analysis platform to tackle the increasing demands of modern linguistic research. The platform will facilitate new linguistic findings by making it possible to manage and analyse primary data and annotations in the petabyte range, while at the same time allowing an undistorted view of the primary linguistic data, and thus fully satisfying the demands of a scientific tool. An additional important aim of the project is to make corpus data as openly accessible as possible in light of unavoidable legal restrictions, for instance through support for distributed virtual corpora, user-defined annotations and adaptable user interfaces, as well as interfaces and sandboxes for user-supplied analysis applications. We discuss our motivation for undertaking this endeavour and the challenges that it faces. Next, we outline our software implementation plan and describe development to date.

The Meta-data-Database of a Next Generation Sustainability Web-Platform for Language Resources (2008)

Rehm, Georg ; Schonefeld, Oliver ; Witt, Andreas ; Lehmberg, Timm ; Chiarcos, Christian ; Béchara, Hannan ; Eishold, Florian ; Evang, Kilian ; Leshtanska, Magdalena ; Savkov, Alexandar ; Stark, Matthias

Our goal is to provide a web-based platform for the long-term preservation and distribution of a heterogeneous collection of linguistic resources. We discuss the corpus preprocessing and normalisation phase that results in sets of multi-rooted trees. At the same time we transform the original metadata records, just like the corpora annotated using different annotation approaches and exhibiting different levels of granularity, into the all-encompassing and highly flexible format eTEI for which we present editing and parsing tools. We also discuss the architecture of the sustainability platform. Its primary components are an XML database that contains corpus and metadata files and an SQL database that contains user accounts and access control lists. A staging area, whose structure, contents, and consistency can be checked using tools, is used to make sure that new resources about to be imported into the platform have the correct structure.

The German Reference Corpus: New developments building on almost 50 years of experience (2010)

Kupietz, Marc ; Schonefeld, Oliver ; Witt, Andreas

This paper describes the sustainability efforts of the Institut für Deutsche Sprache (IDS) in Mannheim with respect to DEREKO (Deutsches Referenzkorpus), the Archive of General Reference Corpora of Contemporary Written German. With a focus on re-usability and sustainability, we discuss its history and our future plans. We describe legal challenges related to the creation of a large and sustainable resource, sketch out the pipeline used to convert raw texts to the final corpus format, and outline migration plans to TEI P5. Because the current version of the corpus management and query system is being pushed to its limits, we discuss the requirements for a new version that will be able to handle current and future DEREKO releases. Furthermore, we outline the institute’s plans in the field of digital preservation.
