Enhancing speech corpus resources with multiple lexical tag layers

We describe a general two-stage procedure for re-using a custom corpus for spoken language system development: first, a transformation from character-based markup to XML; second, DSSSL stylesheet-driven enhancement of the XML markup with multiple lexical tag trees. The procedure was used to generate a fully tagged corpus; alternatively, with greater economy of computing resources, it can be employed as a parametrised ‘tagging on demand’ filter. The implementation will shortly be released as a public resource together with the corpus (German spoken dialogue, about 500k word form tokens) and lexicon (about 75k word form types).

Metadata
Author:Andreas Witt, Harald Lüngen, Dafydd Gibbon
URN:urn:nbn:de:bsz:mh39-45517
URL:http://lrec-conf.org/proceedings/lrec2000/
Parent Title (English):Proceedings of the 2nd International Conference on Language Resources and Evaluation (LREC-2000). Athens, Greece
Publisher:European Language Resources Association (ELRA)
Place of publication:Paris
Document Type:Conference Proceeding
Language:English
Year of first Publication:2000
Date of Publication (online):2016/01/11
Publication state:Published version
Review state:(Publisher's) editorial review
Tag:DSSSL; Morphology; Speech Corpora; Speech Lexica; Text Technology; XML
Pagenumber:5
Dewey Decimal Classification:400 Language / 410 Linguistics
Linguistics-Classification:Corpus linguistics
Open Access?:Yes
Licence (German):German copyright law (UrhG) applies