Refine
Document Type
- Part of a Book (2)
- Conference Proceeding (2)
Has Fulltext
- yes (4)
Keywords
- Morphology (4) (remove)
Publicationstate
Reviewstate
- (Verlags)-Lektorat (2)
- Peer-Review (1)
Publisher
This paper describes general requirements for evaluating and documenting NLP tools with a focus on morphological analysers and the design of a Gold Standard. It is argued that any evaluation must be measurable and documentation thereof must be made accessible for any user of the tool. The documentation must be of a kind that it enables the user to compare different tools offering the same service, hence the descriptions must contain measurable values. A Gold Standard presents a vital part of any measurable evaluation process, therefore, the corpus-based design of a Gold Standard, its creation and problems that occur are reported upon here. Our project concentrates on SMOR, a morphological analyser for German that is to be offered as a web-service. We not only utilize this analyser for designing the Gold Standard, but also evaluate the tool itself at the same time. Note that the project is ongoing, therefore, we cannot present final results.
We describe a general two-stage procedure for re-using a custom corpus for spoken language system development involving a transformation from character-based markup to XML, and DSSSL stylesheet-driven XML markup enhancement with multiple lexical tag trees. The procedure was used to generate a fully tagged corpus; alternatively with greater economy of computing resources, it can be employed as a parametrised ‘tagging on demand’ filter. The implementation will shortly be released as a public resource together with the corpus (German spoken dialogue, about 500k word form tokens) and lexicon (about 75k word form types).
In diesem Beitrag geht es einerseits um eine Definition dessen, was korpusgestützte Lexikographie ist, und andererseits um eine Bestandsaufnahme der gegenwärtigen Praxis korpusgestützter Lexikographie. Dabei wird ein Schwerpunkt gelegt auf allgemeinsprachige Wörterbücher der Gegenwartssprache, deren Inhalt die Beschreibung von Bedeutung und Verwendung von Lexemen ist. Außerdem liegt die Einschätzung zugrunde, dass die Auswertung elektronischer Korpora die Wörterbucharbeit weitgehend positiv beeinflusst und verändert, vorausgesetzt, dass zugrunde gelegte Korpus wurde für das geplante Wörterbuch so gut wie möglich in Umfang und Zusammensetzung eingerichtet.
Optimality theory (henceforth OT) models natural language competence in terms of interactions of universal constraints, notably markedness and faithfulness constraints. This article illustrates some of the major advances in the understanding of word-formation phenomena originating from this theory, including the prosodic organization of morphologically complex words, neutralization patterns in derivational affixes, allomorphy, and infixation.