Corpus linguistics
Plea for a modern corpus-based German lexicography
German lexicography has an eminent research tradition. Grimm’s dictionary, its most impressive achievement, soon became the model for many similar enterprises. It is, however, largely outdated by now (most entries are based on 19th-century work), and more generally there is a growing gap in German lexicographical research between what is needed and possible, on the one hand, and what is actually achieved, on the other. Several reasons for this unsatisfactory situation are discussed; the most important is probably that the practice of all larger enterprises in this field is still dominated by 19th-century methods. The new edition of Grimm’s dictionary, begun in the 1950s, will probably never be completed if it continues as at present. The only way to overcome this situation and to approach the standards reached in other countries would be a comprehensive corpus-based lexical enterprise with highly flexible, task-specific software tools.
This contribution refers to a lexicological project at the Institut für deutsche Sprache (Mannheim) and offers insight into the project's premises and goals as well as the working methods of the project staff. Particular attention is given to aspects of corpus and computer use in the individual stages of the work.
We describe a general two-stage procedure for re-using a custom corpus in spoken language system development: first, a transformation from character-based markup to XML; second, DSSSL stylesheet-driven enhancement of the XML markup with multiple lexical tag trees. The procedure was used to generate a fully tagged corpus; alternatively, with greater economy of computing resources, it can be employed as a parametrised ‘tagging on demand’ filter. The implementation will shortly be released as a public resource together with the corpus (German spoken dialogue, about 500k word form tokens) and lexicon (about 75k word form types).
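The two-stage idea in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: it uses Python's standard `xml.etree.ElementTree` in place of DSSSL stylesheets, attaches lexical information as attributes rather than as separate tag trees, and assumes a made-up character-based convention (`SPEAKER: words`, with `<P>` marking a pause). The names `TOY_LEXICON`, `to_xml`, `enrich`, and the `layers` parameter are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Toy lexicon (hypothetical entries; the real lexicon has about 75k word form types).
TOY_LEXICON = {
    "guten": {"pos": "ADJ", "lemma": "gut"},
    "tag": {"pos": "N", "lemma": "Tag"},
}


def to_xml(lines):
    """Stage 1: turn 'SPEAKER: words' lines into <dialogue>/<turn>/<w> elements.

    The input convention is an assumption made for this sketch only.
    """
    root = ET.Element("dialogue")
    for line in lines:
        speaker, _, text = line.partition(":")
        turn = ET.SubElement(root, "turn", speaker=speaker.strip())
        for token in text.split():
            if token == "<P>":
                # Assumed pause marker becomes an empty element.
                ET.SubElement(turn, "pause")
            else:
                word = ET.SubElement(turn, "w")
                word.text = token
    return root


def enrich(root, lexicon, layers=("pos", "lemma")):
    """Stage 2: add only the requested lexical layers to each <w> element.

    Restricting 'layers' mimics a parametrised 'tagging on demand' filter.
    """
    for word in root.iter("w"):
        entry = lexicon.get(word.text.lower())
        if entry is None:
            continue  # unknown word forms stay untagged
        for layer in layers:
            if layer in entry:
                word.set(layer, entry[layer])
    return root


if __name__ == "__main__":
    tree = enrich(to_xml(["A: guten Tag <P> hallo"]), TOY_LEXICON, layers=("pos",))
    print(ET.tostring(tree, encoding="unicode"))
```

Calling `enrich` with all layers corresponds to producing a fully tagged corpus once; calling it with a subset of layers at query time trades storage for a small amount of computation per request.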