410 Linguistik
Refine
Year of publication
Document Type
- Conference Proceeding (5)
- Book (4)
- Article (3)
- Contribution to a Periodical (2)
- Part of a Book (1)
- Review (1)
- Working Paper (1)
Has Fulltext
- yes (17)
Keywords
- Computerlinguistik (17) (remove)
This paper presents EXMARaLDA, a system for the computer-assisted creation and analysis of spoken
language corpora. The first part contains some general observations about technological and methodological requirements for doing corpus-based pragmatics. The second part explains the systems architecture and gives an overview of its most important software components a transcription editor, a corpus management tool and a corpus query tool. The last part presents some corpora which have been or are currently being compiled with the help of EXMARaLDA.
The TEI has served for many years as a mature annotation format for corpora of different types, including linguistically annotated data. Although it is based on the consensus of a large community, it does not have the legal status of a standard. During the last decade, efforts have been undertaken to develop definitive de jure standards for linguistic data that not only act as a normative basis for the exchange of language corpora but also address recent advancements in technology, such as web-based standards, and the use of large and multiply annotated corpora.
In this article we will provide an overview of the process of international standardization and discuss some of the international standards currently being developed under the auspices of ISO/TC 37, a technical committee called “Terminology and other Language and Content Resources”. After that the relationship between the TEI Guidelines and these specifications, according to their formal model, notation format, and annotation model, will be discussed. The conclusion of the paper provides recommendations for dealing with language corpora.
This paper describes EXMARaLDA, a system for computer transcription of spoken discourse developed and used by the SFB "Mehrsprachigkeit" at the university of Hamburg. EXMARaLDA consists of several DTDs for XML coding of transcription data and some input and output tools for these formats. Apart from being a transcription system in its own right, EXMARaLDA also plays the role of a mediator between older existing data formats at the SFB and between these formats and a planned database of multilingual spoken discourse.
EXMARaLDA is a system for computer transcription of spoken discourse that is being developed at the SFB ‚Mehrsprachigkeit’ as a basis of a multilingual discourse database into which the transcriptions in use at the SFB will be integrated at a later point in time. The present paper describes the theoretical background of the development – a formal model of discourse transcription based on the annotation graph formalism (Bird/Liberman (2001)) – and its practical realisation in the form of an XML-based data format and several tools for input, output and manipulation of the data.
This paper describes EXMARaLDA, an XML-based framework for the construction, dissemination and analysis of corpora of spoken language transcriptions. Departing from a prototypical example of a “partitur” (musical score) transcription, the EXMARaLDA “single timeline, multiple tiers” data model and format is presented alongside with the EXMARaLDA Partitur-Editor, a tool for inputting and visualizing such data. This is followed by a discussion of the interaction of EXMARaLDA with other frameworks and tools that work with similar data models. Finally, this paper presents an extension of the “single timeline, multiple tiers” data model and describes its application within the EXMARaLDA system.
This paper attempts a new look at computer assisted transcription as it is commonly practised within the fields of discourse analysis and language acquisition studies. The first part proposes a bridge between discourse analytical methodology and text technological methods with the concept of modelling as its central idea. The second part demonstrates the EXMARaLDA system, a set of formats and tools for computer assisted transcription that builds on the ideas developed in the first part and implements them in a way that can lead to significant improvement in current research practice.