Year of publication
- 2010 (1)
Language
- English (1)
Keywords
- Daten (1)
- Dzongkha (1)
- Korpus <Linguistik> (1)
- Sprachverarbeitung (1)
- Text-to-Speech (1)
Reviewstate
- Peer-Review (1)
This paper describes the application of probabilistic part-of-speech (POS) taggers to the Dzongkha language. A tag set of 66 tags, based on the Penn Treebank tag set, was designed, and a training corpus of 40,247 tokens was used to train the models. Two statistical taggers were compared, each using a lexicon extracted from the training corpus together with a lexicon derived from an available word list. The best result was 93.1% accuracy in a 10-fold cross-validation on the training corpus. The better-performing tagger was then used to annotate a 570,247-token corpus.
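The evaluation procedure mentioned above, 10-fold cross-validation of a tagger on an annotated corpus, can be sketched as follows. This is only an illustration, not the paper's method: the abstract does not name the two taggers, so a simple most-frequent-tag (unigram) baseline stands in for them, and the function names `train_unigram`, `accuracy`, and `cross_validate` are hypothetical. The sketch also assumes the corpus is a list of sentences, each a list of `(word, tag)` pairs, and that folds are formed by sentence rather than by token.

```python
from collections import Counter, defaultdict


def train_unigram(sentences):
    """Learn a word -> most frequent tag lexicon plus a fallback tag.

    Stand-in baseline (assumption): not one of the taggers used in the paper.
    """
    counts = defaultdict(Counter)
    all_tags = Counter()
    for sent in sentences:
        for word, tag in sent:
            counts[word][tag] += 1
            all_tags[tag] += 1
    lexicon = {w: c.most_common(1)[0][0] for w, c in counts.items()}
    default_tag = all_tags.most_common(1)[0][0]  # used for unknown words
    return lexicon, default_tag


def accuracy(model, sentences):
    """Fraction of tokens whose predicted tag equals the gold tag."""
    lexicon, default_tag = model
    correct = total = 0
    for sent in sentences:
        for word, gold in sent:
            correct += lexicon.get(word, default_tag) == gold
            total += 1
    return correct / total if total else 0.0


def cross_validate(sentences, k=10):
    """Mean token accuracy over k folds, each held out once for testing."""
    if k < 2 or len(sentences) < k:
        raise ValueError("need k >= 2 and at least k sentences")
    # Assign sentences to folds round-robin (an assumption about the split).
    folds = [sentences[i::k] for i in range(k)]
    scores = []
    for i, held_out in enumerate(folds):
        train = [s for j, fold in enumerate(folds) if j != i for s in fold]
        scores.append(accuracy(train_unigram(train), held_out))
    return sum(scores) / len(scores)
```

Calling `cross_validate(corpus, k=10)` on a tagged corpus returns the mean held-out accuracy, the kind of figure the abstract reports. Unknown words fall back to the most frequent tag in the training folds, which is a deliberately crude choice here.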