This paper reports on the efforts of twelve national teams in building the International Comparable Corpus (ICC; https://korpus.cz/icc), which will contain highly comparable datasets of spoken, written and electronic registers. The languages currently covered are Czech, Finnish, French, German, Irish, Italian, Norwegian, Polish, Slovak, Swedish and, more recently, Chinese, as well as English, which serves as the pivot language. The goal of the project is to provide much-needed data for contrastive corpus-based linguistics. The ICC is committed to re-using existing multilingual resources as much as possible, and its design is modelled, with various adjustments, on the International Corpus of English (ICE). As such, the ICC will contain approximately the same balance of 40 percent written language and 60 percent spoken language, distributed across 27 different text types and contexts. A number of issues encountered by the project teams are discussed, ranging from copyright and data sustainability to technical advances in data distribution.
Jubel über Bum Bum Becker
(1985)
In recent decades, the investigation of spoken language has become increasingly important in linguistic research. However, the spoken word is a fleeting phenomenon that is difficult to analyse and requires an elaborate process of examination and appraisal. The Institute for the German Language (Institut für Deutsche Sprache) holds the largest collection of recordings of spoken German, the German Speech Archive (Deutsches Spracharchiv [DSAv]). Until now, the inadequate processing and accessibility of this valuable material has been regarded as the archive's major shortcoming. A solution to this problem is at hand now that the systematic modernization of the DSAv has begun, in particular the digitization of its material. In recent years, we have been able to systematically exploit the unique opportunities offered by a new and easier form of access to spoken language: the recorded sound signal, represented digitally on a computer, is linked to the corresponding texts and documentary data. By integrating the existing data about the corpora and the written versions of the texts into an information and full-text database, and by linking these data with the acoustic signal itself, we can now construct a data pool that allows better documentation of the material and provides rapid internal and external access to the sound recordings. Processed in this way, the material of the German Speech Archive can now be regarded as having been saved for posterity. As a result, entirely new areas of inquiry and entirely new research perspectives have been opened up. This is true both for the work of the Institute itself and for linguistic research in German as a whole.