Refine
Document Type
- Conference Proceeding (2)
- Part of a Book (1)
Has Fulltext
- yes (3)
Is part of the Bibliography
- no (3)
Keywords
- Text Technology (3) (remove)
Publicationstate
Reviewstate
Publisher
In the mid-1990s, the Faculty of Linguistics and Literary-Studies at Bielefeld University began to establish the field Text technology, both in research and education. Text technology is a new field of research on the border of Computational Linguistics and Computational Philology.
This paper focuses on Text technology in academic education. In 2002, Text Technology was introduced as a minor subject for B.A. Programs. It is organized in modules: Module 1 introduces the characteristics of electronic texts and documents, typography, typesetting systems and hypertext. Module 2 introduces one or two programming languages relevant to the field of humanities computing. Markup languages and the principles of information structuring are the main topics of Module 3. The formal fundamentals of computer-based text processing, as formal languages and their grammars, Logics et cetera are subjects of another module. The paper ends with a short description of other Bachelor- and Master-Programs at Bielefeld University which contain text technological themes.
We describe a general two-stage procedure for re-using a custom corpus for spoken language system development involving a transformation from character-based markup to XML, and DSSSL stylesheet-driven XML markup enhancement with multiple lexical tag trees. The procedure was used to generate a fully tagged corpus; alternatively with greater economy of computing resources, it can be employed as a parametrised ‘tagging on demand’ filter. The implementation will shortly be released as a public resource together with the corpus (German spoken dialogue, about 500k word form tokens) and lexicon (about 75k word form types).