
Designing a noun guesser for part of speech tagging in Northern Sotho

  • In this article, we describe one element of a suite of computational tools for assigning word-class tags to word forms in unrestricted Northern Sotho texts, as a preparation for part-of-speech (POS) tagging. POS tagging is a step towards a linguistic analysis of the texts, which in turn allows for advanced data extraction. The component described here identifies and classifies noun forms. Several types of linguistic knowledge are used to recognize nouns that are not contained in the system's noun lexicon: the relationship between singular and plural noun prefixes, knowledge about noun derivation, and data about the co-occurrence of the candidate with concords, pronouns and adjectives in its local context. Our implementation is a symbolic, voting-based process in which all tests together determine whether a candidate is a noun. Accuracy on unseen test data is around 92%.
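As a rough illustration of the voting idea in the abstract, the following minimal Python sketch combines two of the tests it mentions: a singular/plural prefix check and a local-context check. It is not the authors' implementation. The prefix pairs, the function names, the window size, the threshold and the toy tokens are all assumptions made for this example, and the derivation test is left out.

```python
# Illustrative singular/plural prefix pairs (a simplified assumption,
# not the rule set used in the article).
PREFIX_PAIRS = [("mo", "ba"), ("mo", "me"), ("le", "ma"), ("se", "di")]


def prefix_pair_vote(candidate, corpus_forms):
    """Return 1 if swapping the number prefix yields a form attested in the corpus."""
    for sg, pl in PREFIX_PAIRS:
        if candidate.startswith(sg) and pl + candidate[len(sg):] in corpus_forms:
            return 1
        if candidate.startswith(pl) and sg + candidate[len(pl):] in corpus_forms:
            return 1
    return 0


def context_vote(candidate, tokens, noun_signals, window=2):
    """Return 1 if a noun-signalling word (concord, pronoun, adjective)
    occurs within `window` tokens of any occurrence of the candidate."""
    for i, tok in enumerate(tokens):
        if tok != candidate:
            continue
        neighbours = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
        if any(n in noun_signals for n in neighbours):
            return 1
    return 0


def guess_noun(candidate, tokens, noun_signals, threshold=2):
    """Accept the candidate as a noun if enough tests vote for it.
    The threshold is hypothetical; the article's weighting is not given here."""
    votes = prefix_pair_vote(candidate, set(tokens))
    votes += context_vote(candidate, tokens, noun_signals)
    return votes >= threshold


# Toy input for demonstration only.
tokens = "batho ba bolela le motho".split()
noun_signals = {"ba"}  # assumed list of noun-signalling words
print(guess_noun("batho", tokens, noun_signals))
print(guess_noun("bolela", tokens, noun_signals))
```

Requiring agreement between independent tests means that a word which merely happens to start with a noun prefix, or merely stands next to a concord, is not accepted on that evidence alone.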

Metadata
Author:Ulrich Heid, Danie J. Prinsloo, Gertrud Faaß, Elsabé Taljard
URN:urn:nbn:de:bsz:mh39-119242
DOI:https://doi.org/10.1080/02572117.2009.10587313
ISSN:2305-1159
Parent Title (English):South African Journal of African Languages
Publisher:Taylor & Francis
Place of publication:London
Document Type:Article
Language:English
Year of first Publication:2009
Date of Publication (online):2023/06/09
Publishing Institution:Leibniz-Institut für Deutsche Sprache (IDS) [secondary publication]
Publicationstate:Secondary publication
Publicationstate:Postprint
Reviewstate:Peer-Review
Tag:data extraction; noun guesser; part of speech; part of speech tagging
GND Keyword:Computational linguistics; Data analysis; Pedi language; Noun; Part of speech
Volume:29
Issue:1
First Page:1
Last Page:19
Note:
This is an Accepted Manuscript of an article published by Taylor & Francis in South African Journal of African Languages online on 24 Oct 2012, available at: https://doi.org/10.1080/02572117.2009.10587313.
DDC classes:400 Language / 400 Language, Linguistics
Open Access?:yes
Linguistics-Classification:Computational linguistics
Licence (German):Protected by copyright