This paper describes EXMARaLDA, a system for computer transcription of spoken discourse developed and used by the SFB "Mehrsprachigkeit" at the University of Hamburg. EXMARaLDA consists of several DTDs for XML coding of transcription data and some input and output tools for these formats. Apart from being a transcription system in its own right, EXMARaLDA also acts as a mediator among the older data formats already in use at the SFB, and between these formats and a planned database of multilingual spoken discourse.
EXMARaLDA is a system for computer transcription of spoken discourse that is being developed at the SFB "Mehrsprachigkeit" as the basis for a multilingual discourse database into which the transcriptions in use at the SFB will be integrated at a later point in time. The present paper describes the theoretical background of the development, a formal model of discourse transcription based on the annotation graph formalism (Bird/Liberman (2001)), and its practical realisation in the form of an XML-based data format and several tools for input, output and manipulation of the data.
Although computers are widely used in practice for transcribing natural conversation, the rapid development of computer technology has meant that different systems often stand side by side, seemingly unconnected, without their commonalities and differences being the subject of a comprehensive theoretical treatment. The present paper aims to contribute to such a theoretical treatment of computer-based discourse transcription, on the one hand by setting out some fundamental considerations on the topic, and on the other by describing some general aspects of the design and implementation of the EXMARaLDA system, which is being developed at the SFB "Mehrsprachigkeit" at the University of Hamburg.
Response to Wolfgang Schneider's article "Annotate in Transkriptionen aus DV-technischer Sicht"
(2002)
We define collaborative commentary as the involvement of a research community in the interpretive annotation of electronic records. The goal of this process is the evaluation of competing theoretical claims. The process requires commentators to link their comments and related evidentiary materials to specific segments of either transcripts or electronic media. Here, we examine current work in the construction of technical methods for facilitating collaborative commentary through browser technology. To illustrate the relevance of this approach, we examine seven spoken language database projects that have reached a level of web-based publication that makes them good candidates as targets of collaborative commentary technology. For each database, we show how collaborative commentary can advance the relevant research agendas.
This paper describes EXMARaLDA, an XML-based framework for the construction, dissemination and analysis of corpora of spoken language transcriptions. Starting from a prototypical example of a "partitur" (musical score) transcription, the EXMARaLDA "single timeline, multiple tiers" data model and format is presented alongside the EXMARaLDA Partitur-Editor, a tool for inputting and visualizing such data. This is followed by a discussion of the interaction of EXMARaLDA with other frameworks and tools that work with similar data models. Finally, this paper presents an extension of the "single timeline, multiple tiers" data model and describes its application within the EXMARaLDA system.
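The "single timeline, multiple tiers" idea named in the abstract above can be illustrated with a minimal Python sketch. This is not EXMARaLDA's actual format or API: the class names (`Event`, `Tier`, `Transcription`), the `events_between` helper and the sample data are all hypothetical, and the sketch only assumes that every tier anchors its events to points on one shared, ordered timeline.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Event:
    """A stretch of transcribed or annotated material (hypothetical structure)."""
    start: str  # id of the timeline point where the event begins
    end: str    # id of the timeline point where the event ends
    text: str


@dataclass
class Tier:
    """One line of the score: a speaker plus a category such as 'verbal'."""
    speaker: str
    category: str
    events: List[Event] = field(default_factory=list)


@dataclass
class Transcription:
    """All tiers refer to the same ordered list of timeline point ids."""
    timeline: List[str]
    tiers: List[Tier] = field(default_factory=list)

    def events_between(self, start: str, end: str) -> List[Tuple[str, str]]:
        """Return (speaker, text) for every event overlapping [start, end)."""
        pos = {point: i for i, point in enumerate(self.timeline)}
        lo, hi = pos[start], pos[end]
        return [
            (tier.speaker, ev.text)
            for tier in self.tiers
            for ev in tier.events
            if pos[ev.start] < hi and pos[ev.end] > lo
        ]


# Invented sample data: two speakers whose turns overlap between T1 and T2.
score = Transcription(
    timeline=["T0", "T1", "T2", "T3"],
    tiers=[
        Tier("A", "verbal", [Event("T0", "T2", "hello there")]),
        Tier("B", "verbal", [Event("T1", "T3", "hi")]),
    ],
)
overlap = score.events_between("T1", "T2")
```

Because both tiers share one timeline, simultaneous speech is found by comparing positions of timeline points rather than by inspecting the text layout.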
This paper attempts a new look at computer assisted transcription as it is commonly practised within the fields of discourse analysis and language acquisition studies. The first part proposes a bridge between discourse analytical methodology and text technological methods with the concept of modelling as its central idea. The second part demonstrates the EXMARaLDA system, a set of formats and tools for computer assisted transcription that builds on the ideas developed in the first part and implements them in a way that can lead to significant improvement in current research practice.
This paper addresses questions that arise in connection with the archiving and public provision of conversation-analytic data (audio or video recordings and their transcriptions). It first gives an overview of the research perspectives that an improved practice of data archiving would offer for conversation research, and then names some of the main problems that can stand in the way of creating such archives in current practice. Existing approaches that can help to overcome these problems are then presented.
This paper gives an overview of EXMARaLDA, a system consisting of a data model, data formats and software tools for the computer-assisted creation and analysis of corpora of spoken language. The focus is on the use of the various software tools (a Partitur-Editor for creating transcriptions, a Corpus-Manager for creating and managing corpora, and a search tool for querying such corpora) for conversation-analytic purposes.
Time-based data models and the Text Encoding Initiative’s guidelines for transcription of speech
(2005)
This paper is about the database "Mehrsprachigkeit" and the EXMARaLDA system, which are being developed at the SFB 538 "Mehrsprachigkeit" at the University of Hamburg. Since their conceptual and technical details have already been described in detail elsewhere (e.g. Schmidt 2004), the focus here is, on the one hand, on aspects that, in keeping with the theme of the workshop, concern more general questions of handling computer-processable, heterogeneous bodies of linguistic data. On the other hand, an attempt is made to derive from the practical experience of what is now four years of project work some insights that could be of interest for further work in this area beyond the specific project context.
This paper describes a new research initiative addressing the issue of sustainability of linguistic resources. The initiative is a cooperation between three collaborative research centres in Germany – the SFB 441 “Linguistic Data Structures” in Tübingen, the SFB 538 “Multilingualism” in Hamburg, and the SFB 632 “Information Structure” in Potsdam/Berlin. The aim of the project is to develop methods for sustainable archiving of the diverse bodies of linguistic data used at the three sites. In the first half of the paper, the data handling solutions developed so far at the three centres are briefly introduced. This is followed by an assessment of their commonalities and differences and of what these entail for the work of the new joint initiative. The second part then sketches seven areas of open questions with respect to sustainable data handling and gives a more detailed account of two of them – integration of linguistic terminologies and development of best practice guidelines.
This paper presents ongoing work on a multilingual (English, French, German) lexical resource of soccer language. The first part describes how lexicographic descriptions based on frame-semantic principles are derived from a partially aligned multilingual corpus of soccer match reports. The remainder of the paper then discusses how different types of ontological knowledge are linked to this resource in order to provide an access structure to the resulting dictionary. It is argued that linking lexical resources and ontologies in this way gives dictionary users novel ways of navigating a domain vocabulary.
This paper describes a new research initiative addressing the issue of sustainability of linguistic resources. This initiative is a cooperation between three linguistic collaborative research centres in Germany, which comprise more than 40 individual research projects altogether. These projects are involved in creating manifold language resources, especially corpora, tailored to their particular needs. The aim of the project described here is to ensure effective and sustainable access to these data by third-party researchers beyond the termination of these projects. This goal involves a number of measures, such as the definition of a common data format to completely capture the heterogeneous information encoded in the individual corpora, the development of user-friendly and sustainably usable tools for processing (e.g. querying) the data, and the specification of common inventories of metadata and terminology. Moreover, the project aims at formulating general rules of best practice for creating, accessing, and archiving linguistic resources.
This paper presents the Kicktionary, a multilingual (English, German, French) electronic lexical resource of the language of football. It explains how a corpus of football match reports was analysed according to the FrameNet and WordNet approaches and how the result of this analysis is presented to a dictionary user via a website.