Part-of-Speech tagging of Northern Sotho: Disambiguating polysemous function words
- A major obstacle to part-of-speech (POS) tagging of Northern Sotho (Bantu, S 32) is the ambiguity of function words. Many of them are highly polysemous and very frequent in texts, and their local context is not always distinctive. With certain taggers, this issue leads to comparatively poor results (between 88% and 92% accuracy), especially when sizeable tagsets (over 100 tags) are used. We use the RF-tagger (Schmid and Laws, 2008), which is designed specifically for annotation with fine-grained tagsets (e.g. ones that include agreement information), and we restructure the 141 tags of the tagset proposed by Taljard et al. (2008) to fit the RF-tagger. This leads to over 94% accuracy. In addition, error analysis shows which types of phenomena cause trouble in the POS tagging of Northern Sotho.
Author: | Gertrud Faaß, Ulrich Heid, Elsabe Taljard, Danie Prinsloo |
---|---|
URN: | urn:nbn:de:bsz:mh39-118813 |
URL: | https://aclanthology.org/volumes/W09-07/ |
ISBN: | 1-932432-25-6 |
Parent Title (English): | Proceedings of the First Workshop on Language Technologies for African Languages |
Publisher: | Association for Computational Linguistics |
Place of publication: | Stroudsburg |
Editor: | Guy De Pauw, Gilles-Maurice de Schryver, Lori Levin |
Document Type: | Conference Proceeding |
Language: | English |
Year of first Publication: | 2009 |
Date of Publication (online): | 2023/06/01 |
Publishing Institution: | Leibniz-Institut für Deutsche Sprache (IDS) |
Publication state: | Published version |
Review state: | Peer-reviewed |
GND Keyword: | Bantusprachen; Funktionswort; Methodologie; Nordsotho; Polysemie |
First Page: | 38 |
Last Page: | 45 |
DDC classes: | 400 Language / 400 Language, Linguistics |
Open Access?: | yes |
Linguistics-Classification: | Computational linguistics |
Licence: | Creative Commons Attribution 4.0 International |