This paper describes the TEI-based ISO standard 24624:2016 “Transcription of spoken language” and other formats used within CLARIN for spoken language resources. It assesses the current state of support for the standard and the interoperability between these formats and with relevant tools and services. The main idea behind the paper is that a digital infrastructure providing language resources and services to researchers should also allow the combined use of resources and/or services from different contexts. This requires syntactic and semantic interoperability. We propose a solution based on the ISO/TEI format and describe the necessary steps for this format to work as an exchange format with basic semantic interoperability for spoken language resources across the CLARIN infrastructure and beyond.
As part of the ZuMult project, we are currently modelling a backend architecture that should provide query access to corpora from the Archive of Spoken German (AGD) at the Leibniz Institute for the German Language (IDS). We are exploring how to reuse existing search engine frameworks that provide full-text indices and allow corpora to be queried with one of the corpus query languages (QLs) established and actively used in the corpus research community. For this purpose, we tested MTAS, an open source Lucene-based search engine for querying text with multilevel annotations. We applied MTAS to three oral corpora stored in the TEI-based ISO standard for transcriptions of spoken language (ISO 24624:2016). These corpora differ from the corpus data that MTAS was developed for, because they include interactions with two or more speakers and are enriched, inter alia, with timeline-based annotations. In this contribution, we report our test results and address issues that arise when search frameworks originally developed for querying written corpora are transferred to the field of spoken language.
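To illustrate why timeline-based, multi-speaker data differs from written text, the following minimal Python sketch reads a small transcript and finds stretches where two speakers talk at the same time, something a purely linear token index does not represent. The sample document is a simplified, hypothetical fragment loosely modelled on TEI elements (`timeline`, `when`, `u`); it is not a conformant ISO 24624:2016 instance, and the function names are illustrative assumptions, not part of MTAS or ZuMult.

```python
import xml.etree.ElementTree as ET

# Clark-notation name under which ElementTree exposes the xml:id attribute.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical, heavily simplified sample (not a valid ISO/TEI document).
SAMPLE = """<text>
  <timeline unit="s">
    <when xml:id="T0" interval="0.0"/>
    <when xml:id="T1" interval="1.2"/>
    <when xml:id="T2" interval="2.0"/>
    <when xml:id="T3" interval="3.1"/>
  </timeline>
  <body>
    <u who="#SPK0" start="#T0" end="#T2">ja das stimmt</u>
    <u who="#SPK1" start="#T1" end="#T3">genau so ist es</u>
  </body>
</text>"""


def load_utterances(xml_text):
    """Resolve each utterance's start/end references to time offsets."""
    root = ET.fromstring(xml_text)
    offsets = {w.get(XML_ID): float(w.get("interval")) for w in root.iter("when")}
    utterances = []
    for u in root.iter("u"):
        utterances.append({
            "who": u.get("who").lstrip("#"),
            "start": offsets[u.get("start").lstrip("#")],
            "end": offsets[u.get("end").lstrip("#")],
            "tokens": (u.text or "").split(),
        })
    return utterances


def overlaps(utterances):
    """Return (speaker_a, speaker_b, from, to) for simultaneous speech."""
    found = []
    for i, a in enumerate(utterances):
        for b in utterances[i + 1:]:
            different_speakers = a["who"] != b["who"]
            intersect = a["start"] < b["end"] and b["start"] < a["end"]
            if different_speakers and intersect:
                found.append((a["who"], b["who"],
                              max(a["start"], b["start"]),
                              min(a["end"], b["end"])))
    return found


if __name__ == "__main__":
    print(overlaps(load_utterances(SAMPLE)))
```

In this sketch the tokens of both speakers share part of the timeline, so "what comes next" depends on whether one follows the speaker or the clock; a query engine built for written corpora has to pick one ordering or index both.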
We present web services implementing a workflow for transcripts of spoken language following TEI guidelines, in particular ISO 24624:2016 "Language resource management - Transcription of spoken language". The web services are available at our website and will be available via the CLARIN infrastructure, including the Virtual Language Observatory and WebLicht.
This paper discusses computational linguistic methods for the semi-automatic analysis of modality interdependencies (the combination of complex resources such as speaking, writing, and visualizing; MID) in professional cross-situational interaction settings. The overall purpose of the approach is to develop models, methods, and a framework for the description and analysis of MID forms and functions. The paper describes work in progress: the development of an annotation framework that allows users to annotate different data and file formats at various levels, to relate annotation levels and entries independently of the given file format, and to visualize patterns.
Response to Wolfgang Schneider's article "Annotate in Transkriptionen aus DV-technischer Sicht" (Annotations in transcriptions from a data-processing perspective)
(2002)
Although computers are widely used in practice for transcribing natural conversation, the rapid advance of computer technology has meant that different systems often stand side by side, seemingly unconnected, without their commonalities and differences being the subject of a comprehensive theoretical treatment. This article aims to contribute to such a theoretical treatment of conversation transcription on the computer, on the one hand by setting out some fundamental considerations on the topic, and on the other by describing some general aspects of the design and implementation of the EXMARaLDA system, which is being developed at the SFB "Mehrsprachigkeit" at the University of Hamburg.
This paper describes EXMARaLDA, a system for computer transcription of spoken discourse developed and used by the SFB "Mehrsprachigkeit" at the University of Hamburg. EXMARaLDA consists of several DTDs for XML coding of transcription data and some input and output tools for these formats. Apart from being a transcription system in its own right, EXMARaLDA also plays the role of a mediator between older existing data formats at the SFB and between these formats and a planned database of multilingual spoken discourse.
EXMARaLDA is a system for computer transcription of spoken discourse that is being developed at the SFB "Mehrsprachigkeit" as a basis of a multilingual discourse database into which the transcriptions in use at the SFB will be integrated at a later point in time. The present paper describes the theoretical background of the development – a formal model of discourse transcription based on the annotation graph formalism (Bird/Liberman (2001)) – and its practical realisation in the form of an XML-based data format and several tools for input, output and manipulation of the data.
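The idea of a transcription model built on the annotation graph formalism can be sketched roughly as follows: an ordered set of nodes, each with an optional time offset, and labelled arcs that span two nodes and belong to a tier. This Python sketch is an illustration under those assumptions only; the class names, tier names and validation rule are hypothetical and do not reproduce EXMARaLDA's actual data format or tools.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class Node:
    """A point on the shared timeline; the time offset may be unknown."""
    id: str
    time: Optional[float] = None


@dataclass(frozen=True)
class Arc:
    """A labelled annotation spanning two nodes within one tier."""
    start: str
    end: str
    tier: str
    label: str


@dataclass
class Transcription:
    nodes: List[Node] = field(default_factory=list)  # kept in timeline order
    arcs: List[Arc] = field(default_factory=list)

    def _position(self, node_id: str) -> int:
        for index, node in enumerate(self.nodes):
            if node.id == node_id:
                return index
        raise KeyError(f"unknown node: {node_id}")

    def add_node(self, node_id: str, time: Optional[float] = None) -> None:
        self.nodes.append(Node(node_id, time))

    def annotate(self, start: str, end: str, tier: str, label: str) -> None:
        # An arc must run forward along the timeline.
        if self._position(start) >= self._position(end):
            raise ValueError("arc must end after it starts")
        self.arcs.append(Arc(start, end, tier, label))

    def tier(self, name: str) -> List[Arc]:
        """All arcs of one tier, ordered by their start node."""
        selected = [arc for arc in self.arcs if arc.tier == name]
        return sorted(selected, key=lambda arc: self._position(arc.start))


if __name__ == "__main__":
    t = Transcription()
    t.add_node("T0", 0.0)
    t.add_node("T1")          # no absolute time known for this point
    t.add_node("T2", 1.5)
    t.annotate("T1", "T2", "SPK0 [v]", "Morgen")
    t.annotate("T0", "T1", "SPK0 [v]", "guten")
    t.annotate("T0", "T2", "SPK0 [en]", "good morning")
    print([arc.label for arc in t.tier("SPK0 [v]")])
```

Because the verbal tier and the translation tier refer to the same nodes, annotations on different levels stay aligned without either tier having to know about the other.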