OPUS 4 | Search

Federated content search for Lexical Resources (LexFCS): Specification (2023)

Körner, Erik ; Eckart, Thomas ; Herold, Axel ; Wiegand, Frank ; Michaelis, Frank ; Bremm, Matthias ; Cotgrove, Louis ; Trippel, Thorsten ; Rau, Felix

The landscape of digital lexical resources is often characterized by dedicated local portals and proprietary interfaces as primary access points for scholars and the interested public. In addition, legal and technical restrictions are potential issues that can make it difficult to efficiently query and use these valuable resources. As part of the research data consortium Text+, solutions for the storage and provision of digital language resources are being developed and provided in the context of the unified cross-domain German research data infrastructure NFDI. The specific topic of accessing lexical resources in a diverse and heterogeneous landscape with a variety of participating institutions and established technical solutions is met with the development of the federated search and query framework LexFCS. The LexFCS extends the established CLARIN Federated Content Search, which already allows accessing spatially distributed text corpora using a common specification of technical interfaces, data formats, and query languages. This paper describes the current state of development of the LexFCS, gives an insight into its technical details, and provides an outlook on its future development.

Conversation-analytic transcription of Arabic-German talk-in-interaction (2019)

Farag, Rahaf

The paper deals with the computer-aided transcription of Arabic-German data for interaction-based studies. First, it sheds light on some major methodological challenges posed by conversation-analytic approaches: with current corpus technology, the reciprocity, linearity, and simultaneity of linguistic activities cannot be reconstructed in an analytically adequate way when Arabic characters are used in multilingual and bidirectional transcripts. The difficulty of transcribing Arabic encounters is compounded by the fact that Spoken Arabic, its varieties, and its phenomena have not been sufficiently standardised (for conversation-analytic purposes). The second part of the paper is therefore dedicated to preliminary, self-developed solutions, namely a systematic method for transcribing Spoken Arabic.

User's Guide for the ZAS Database of Clause-Embedding Predicates (2017)

Stiebels, Barbara ; McFadden, Thomas ; Schwabe, Kerstin ; Solstad, Torgrim ; Kellner, Elisa ; Sommer, Livia ; Stoltmann, Katarzyna

Annotationsrichtlinien des Projekts "Redewiedergabe. Eine literatur- und sprachwissenschaftliche Korpusanalyse" (2020)

Brunner, Annelen ; Weimer, Lukas ; Engelberg, Stefan ; Jannidis, Fotis ; Tu, Ngoc Duyen Tanja

Language Resources and Research under the General Data Protection Regulation (2018)

Kamocki, Paweł ; Ketzan, Erik ; Wildgans, Julia

The General Data Protection Regulation (hereinafter: GDPR), EU Regulation 2016/679 of 27 April 2016, will become applicable on 25 May 2018 and repeal the Personal Data Directive of 24 October 1995. Unlike a directive, which requires transposition into national laws (while leaving the choice of “forms and methods” to the Member States), a regulation is binding and directly applicable in all Member States. This means that when the GDPR becomes applicable, all the EU countries will have the same rules regarding the protection of personal data — at least in principle, since some details (including in the area of research — see below) are expressly left to the discretion of the Member States. The GDPR is a particularly ambitious piece of legislation (consisting of 99 articles and 173 recitals) whose intended territorial scope extends beyond the borders of the European Union. Its main concepts and principles are essentially similar to those of the Personal Data Directive, but enriched with interpretation developed through the case law of the CJEU and the opinions of the Article 29 Data Protection Working Party (hereinafter: WP29). This White Paper will discuss the main principles of data protection and their impact on language resources, as well as special rules regarding research under the GDPR and the standardisation mechanisms recognized by the Regulation.

Guidelines for Building Language Corpora Under German Law. Guidelines by the DFG Review Board on Linguistics (2017)

Ketzan, Erik ; Wildgans, Julia ; Weitzmann, John

The possibilities for re-use and archiving of spoken and written corpora are affected by personality rights (in some legal traditions also called the right of publicity), copyright law, and data protection / privacy laws. These recommendations cover legal aspects that should be considered when creating corpora, to ensure the greatest possible archivability and re-usability in compliance with current law. The information compiled here is intended as a guideline for researchers who plan to create corpora or who are involved in the evaluation of such measures. It is not exhaustive and should not be considered legal advice. Researchers should consult institutional legal departments and management before making legally relevant decisions. Further legal expertise should be sought, if possible, as early as the project planning phase.

Guideline: Syntactic annotation and segmentation in the SegCor Project (2018)

Westpfahl, Swantje ; Proske, Nadine ; Hobich, Melanie ; Borlinghaus, Anton ; Strub, Hanna

Guideline: syntaktische Segmentierung in FOLKER (2019)

Westpfahl, Swantje ; Schmidt, Thomas ; Borlinghaus, Anton ; Strub, Hanna

Metadaten im Programmbereich „Mündliche Korpora“ des IDS (2017)

Dickgießer, Sylvia

STTS 2.0. Guidelines für die Annotation von POS-Tags für Transkripte gesprochener Sprache in Anlehnung an das Stuttgart Tübingen Tagset (STTS) (2017)

Westpfahl, Swantje ; Schmidt, Thomas ; Jonietz, Jasmin ; Borlinghaus, Anton

The guidelines are an extension of the STTS (Schiller et al. 1999) for annotating transcripts of spoken language. The tagset is based on the annotation of the FOLK corpus of the IDS Mannheim (Schmidt 2014) and extends the STTS to cover phenomena typical of spoken language and peculiarities of their transcription. It was developed as part of the dissertation project "POS für(s) FOLK – Entwicklung eines automatisierten Part-of-Speech-Tagging von spontansprachlichen Daten" (Westpfahl 2017, in preparation).

Einführung in die Benutzung der Ressourcen DGD und FOLK für gesprächsanalytische Zwecke. Handreichung: "sprich" als Reformulierungsindikator (2016)

Kaiser, Julia ; Schmidt, Thomas

This guide presents the Datenbank für Gesprochenes Deutsch (DGD, the Database for Spoken German) and, in particular, the Forschungs- und Lehrkorpus Gesprochenes Deutsch (FOLK, the Research and Teaching Corpus of Spoken German) as instruments for conversation-analytic work. After a short introductory overview, it uses the example of "sprich" as a discourse marker or reformulation indicator to present and illustrate, step by step, the resources and tools for systematic corpus- and database-driven searches and analyses.

Einführung in die Benutzung der Ressourcen DGD und FOLK für gesprächsanalytische Zwecke. Handreichung: Einfache Recherche-Anfragen als Übungsbeispiele (2016)

Kaiser, Julia ; Schmidt, Thomas

This guide presents the Datenbank für Gesprochenes Deutsch (DGD, the Database for Spoken German) and, in particular, the Forschungs- und Lehrkorpus Gesprochenes Deutsch (FOLK, the Research and Teaching Corpus of Spoken German) as instruments for conversation-analytic work. After a short introductory overview, it uses four different examples to present and illustrate, step by step, the resources and tools for systematic corpus- and database-driven searches and analyses.

Einführung in die Benutzung der Ressourcen DGD und FOLK für gesprächsanalytische Zwecke. Handreichung: Metapragmatische Modalisierungen. (2016)

Kaiser, Julia ; Schmidt, Thomas

This guide presents the Datenbank für Gesprochenes Deutsch (DGD, the Database for Spoken German) and, in particular, the Forschungs- und Lehrkorpus Gesprochenes Deutsch (FOLK, the Research and Teaching Corpus of Spoken German) as instruments for conversation-analytic work. After a short introductory overview, it uses the example of metapragmatic modalizations with the adverbs "sozusagen" and "gewissermaßen" and with the formula "in Anführungszeichen/-strichen" to present and illustrate, step by step, the resources and tools for systematic corpus- and database-driven searches and analyses.

A brief tutorial on using collocations for uncovering and contrasting meaning potentials of lexical items (2009)

Perkuhn, Rainer ; Keibel, Holger

This introductory tutorial describes a strictly corpus-driven approach for uncovering indications of aspects of the use of lexical items. These aspects include '(lexical) meaning' in a very broad sense and involve different dimensions; they are established in and emerge from the respective discourses. Using data-driven mathematical-statistical methods with minimal (linguistic) premises, a word's usage spectrum is summarized as a collocation profile. Self-organizing methods are applied to visualize the complex similarity structure spanned by these profiles. These visualizations point to the typical aspects of a word's use, and to the common and distinctive aspects of any two words.

14 search hits