Refine
Year of publication
Document Type
- Contribution to a Periodical (7)
- Article (5)
- Conference Proceeding (4)
- Other (1)
- Preprint (1)
Keywords
- gesprochene Sprache (18) (remove)
Publicationstate
Reviewstate
- Peer-Review (2)
Die Darstellung von und Arbeit mit Transkripten spielt in vielen forschungs- und anwendungsbezogenen Arbeiten mit Daten gesprochener Sprache eine wichtige Rolle. Der im ZuMult-Projekt entwickelte Prototyp ZuViel (Zugang zu Visualisierung von Transkripten) knüpft an etablierte Verfahren zur Transkriptdarstellung an und erweitert diese durch neue Möglichkeiten des interaktiven Arbeitens mit Transkripten im digitalen Medium. Der Beitrag führt in diese neuen Möglichkeiten ein und erklärt, wie sie in didaktischen DaF/DaZ-Kontexten aber auch hinsichtlich forschungsbezogener Perspektiven angewendet werden können
Time-based data models and the Text Encoding Initiative’s guidelines for transcription of speech
(2005)
The Database for Spoken German (Datenbank für Gesprochenes Deutsch, DGD2, http://dgd.ids-mannheim.de) is the central platform for publishing and disseminating spoken language corpora from the Archive of Spoken German (Archiv für Gesprochenes Deutsch, AGD, http://agd.ids-mannheim.de) at the Institute for the German Language in Mannheim. The corpora contained in the DGD2 come from a variety of sources, some of them in-house projects, some of them external projects. Most of the corpora were originally intended either for research into the (dialectal) variation of German or for studies in conversation analysis and related fields. The AGD has taken over the task of permanently archiving these resources and making them available for reuse to the research community. To date, the DGD2 offers access to 19 different corpora, totalling around 9000 speech events, 2500 hours of audio recordings or 8 million transcribed words. This paper gives an overview of the data made available via the DGD2, of the technical basis for its implementation, and of the most important functionalities it offers. The paper concludes with information about the users of the database and future plans for its development.
We present some recent and planned future developments in EXMARaLDA, a system for creating, managing, analysing and publishing spoken language corpora. The new functionality concerns the areas of transcription and annotation, corpus management, query mechanisms, interoperability and corpus deployment. Future work is planned in the areas of automatic annotation, standardisation and workflow management.
We give an overview of the content and the technical background of a number of corpora which were developed in various projects of the Research Centre on Multilingualism (SFB 538) between 1999 and 2011 and which are now made available to the scientific community via the Hamburg Centre for Language Corpora.
This contribution addresses the workshop topic of “standardising policies within eHumanities infrastructures”. It relates 10 years of experience with language resource standards, gained in the development of EXMARaLDA, a system for the construction and exploitation of spoken language corpora. Section 2 gives an overview of the EXMARaLDA system focussing on its relationship with existing and evolving standards for language resources. Section 3 presents the HIAT system as an example of an established community practice. Section 4 then addresses several issues that where encountered when trying to bring together HIAT, EXMARaLDA and the wider standard world.