410 Linguistik
Refine
Year of publication
Document Type
- Conference Proceeding (7)
- Book (3)
- Part of a Book (3)
- Contribution to a Periodical (3)
- Article (1)
- Working Paper (1)
Has Fulltext
- yes (18)
Keywords
- Transkription (10)
- Computerlinguistik (9)
- Gesprochene Sprache (9)
- Korpus <Linguistik> (6)
- Datenformat (3)
- Gesprächsanalyse (3)
- Annotation (2)
- Forschungsdaten (2)
- Langzeitarchivierung (2)
- Linguistik (2)
Publicationstate
Publisher
- ELRA (2)
- Universität Hamburg - Sonderforschungsbereich 538 (2)
- Cambridge Scholars Publ. (1)
- ELDA (1)
- Gardez!-Verl. (1)
- Hamburg (1)
- University of Hawaii Press (1)
- University of Pennsylvania - Institute for Research in Cognitive Science (1)
- de Gruyter (1)
- Österreichische Gesellschaft für Artificial Intelligence (1)
This paper presents EXMARaLDA, a system for the computer-assisted creation and analysis of spoken
language corpora. The first part contains some general observations about technological and methodological requirements for doing corpus-based pragmatics. The second part explains the systems architecture and gives an overview of its most important software components a transcription editor, a corpus management tool and a corpus query tool. The last part presents some corpora which have been or are currently being compiled with the help of EXMARaLDA.
Mit dem cGAT-Handbuch stellt das FOLK-Projekt eine Richtlinie für das computergestützte Transkribieren nach GAT 2 zur Verfügung. Das Handbuch wurde anhand der Transkriptionspraxis in FOLK entwickelt und enthält eine Vielzahl von authentischen Beispielen, die mit dem zugehörigen Audio auch über die Datenbank für Gesprochenes Deutsch (DGD) abgerufen werden können.
This paper presents the results of a joint effort of a group of multimodality researchers and tool developers to improve the interoperability between several tools used for the annotation and analysis of multimodality. Each of the tools has specific strengths so that a variety of differ-ent tools, working on the same data, can be desirable for project work. However this usually re-quires tedious conversion between formats. We propose a common exchange format for multi-modal annotation, based on the annotation graph (AG) formalism, which is supported by import and export routines in the respective tools. In the current version of this format the common de-nominator information can be reliably exchanged between the tools, and additional information can be stored in a standardized way.
This paper describes a new research initiative addressing the issue of sustainability of linguistic resources. The initiative is a cooperation between three collaborative research centres in Germany – the SFB 441 “Linguistic Data Structures” in Tübingen, the SFB 538 “Multilingualism” in Hamburg, and the SFB 632 “Information Structure” in Potsdam/Berlin. The aim of the project is to develop methods for sustainable archiving of the diverse bodies of linguistic data used at the three sites. In the first half of the paper, the data handling solutions developed so far at the three centres are briefly introduced. This is followed by an assessment of their commonalities and differences and of what these entail for the work of the new joint initiative. The second part then sketches seven areas of open questions with respect to sustainable data handling and gives a more detailed account of two of them – integration of linguistic terminologies and development of best practice guidelines.
Rescuing Legacy Data
(2008)
This paper discusses issues that arise in the transformation of electronic language data from outdated to modern, sustainable formats. We first describe the problem and then present four different cases in which corpora of spoken language were converted from legacy formats to an XML-based representation. For each of the four cases, we describe the conversion workflow and discuss the difficulties that we had to overcome. Based on this experience, we formulate some more general observations about transforming legacy data and conclude with a set of best practice recommendations for a more sustainable handling of language corpora.
This paper describes EXMARaLDA, a system for computer transcription of spoken discourse developed and used by the SFB "Mehrsprachigkeit" at the university of Hamburg. EXMARaLDA consists of several DTDs for XML coding of transcription data and some input and output tools for these formats. Apart from being a transcription system in its own right, EXMARaLDA also plays the role of a mediator between older existing data formats at the SFB and between these formats and a planned database of multilingual spoken discourse.
EXMARaLDA is a system for computer transcription of spoken discourse that is being developed at the SFB ‚Mehrsprachigkeit’ as a basis of a multilingual discourse database into which the transcriptions in use at the SFB will be integrated at a later point in time. The present paper describes the theoretical background of the development – a formal model of discourse transcription based on the annotation graph formalism (Bird/Liberman (2001)) – and its practical realisation in the form of an XML-based data format and several tools for input, output and manipulation of the data.
This paper describes EXMARaLDA, an XML-based framework for the construction, dissemination and analysis of corpora of spoken language transcriptions. Departing from a prototypical example of a “partitur” (musical score) transcription, the EXMARaLDA “single timeline, multiple tiers” data model and format is presented alongside with the EXMARaLDA Partitur-Editor, a tool for inputting and visualizing such data. This is followed by a discussion of the interaction of EXMARaLDA with other frameworks and tools that work with similar data models. Finally, this paper presents an extension of the “single timeline, multiple tiers” data model and describes its application within the EXMARaLDA system.