We describe a general two-stage procedure for re-using a custom corpus for spoken language system development. The first stage transforms character-based markup into XML; the second uses DSSSL stylesheets to enhance the XML markup with multiple lexical tag trees. The procedure was used to generate a fully tagged corpus; alternatively, with greater economy of computing resources, it can be employed as a parametrised ‘tagging on demand’ filter. The implementation will shortly be released as a public resource, together with the corpus (German spoken dialogue, about 500k word form tokens) and the lexicon (about 75k word form types).
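The two stages can be illustrated with a minimal sketch. It is not the authors' implementation: it swaps the DSSSL stylesheets for Python's standard `xml.etree.ElementTree`, and the pause marker `<P>`, the element names (`turn`, `w`, `pause`, `lex`), the layer names and the toy lexicon are all hypothetical, chosen only to show the shape of a parametrised 'tagging on demand' filter.

```python
import xml.etree.ElementTree as ET

# Hypothetical character-based convention: "<P>" marks a pause.
PAUSE = "<P>"

# Hypothetical toy lexicon: lower-cased word form -> lexical layers.
LEXICON = {
    "guten": {"lemma": "gut", "pos": "ADJ"},
    "tag": {"lemma": "Tag", "pos": "N"},
}


def to_xml(turn_id, line):
    """Stage 1: convert one character-marked transcription line to XML."""
    turn = ET.Element("turn", id=turn_id)
    for token in line.split():
        if token == PAUSE:
            ET.SubElement(turn, "pause")
        else:
            ET.SubElement(turn, "w").text = token
    return turn


def tag_on_demand(turn, lexicon, layers):
    """Stage 2: attach only the requested lexical layers to each <w>.

    Each word receives a small <lex> subtree, so several layers can
    coexist; unknown word forms are left untagged.
    """
    # Copy the matches into a list first, since the tree is modified below.
    for w in list(turn.iter("w")):
        entry = lexicon.get((w.text or "").lower())
        if entry is None:
            continue
        lex = ET.SubElement(w, "lex")
        for layer in layers:
            if layer in entry:
                ET.SubElement(lex, layer).text = entry[layer]
    return turn


if __name__ == "__main__":
    turn = to_xml("t1", "guten <P> Tag")
    tag_on_demand(turn, LEXICON, layers=["pos"])
    print(ET.tostring(turn, encoding="unicode"))
```

Passing a different `layers` list (for example `["lemma", "pos"]`) yields a richer annotation of the same stage-1 output, which is the sense in which tagging on demand avoids storing a fully tagged corpus.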