TY  - JOUR
U1  - Scholarly article
A1  - Heid, Ulrich
A1  - Prinsloo, Danie J.
A1  - Faaß, Gertrud
A1  - Taljard, Elsabé
T1  - Designing a noun guesser for part of speech tagging in Northern Sotho
JF  - South African Journal of African Languages
N2  - In this article, we describe an element of a suite of computational tools for assigning word-class tags (as a preparation for part of speech (POS) tagging) to word forms in unrestricted Northern Sotho texts. POS tagging is a step towards a linguistic analysis of the texts, which in turn allows for advanced data extraction. The tool component described here identifies (and classifies) noun forms. Several types of linguistic knowledge are used to recognize nouns that are not contained in the noun lexicon of the system. These include the relationship between singular and plural noun prefixes, knowledge about noun derivation, and data about the co-occurrence of the candidate with concords, pronouns and adjectives in a local context. Our implementation is a symbolic, voting-based process: together, all tests determine whether a candidate is a noun; accuracy on unseen test data is around 92%.
KW  - Pedi language
KW  - Computational linguistics
KW  - Word class
KW  - Noun
KW  - Data analysis
KW  - noun guesser
KW  - part of speech
KW  - part of speech tagging
KW  - data extraction
Y1  - 2009
UN  - https://nbn-resolving.org/urn:nbn:de:bsz:mh39-119242
SN  - 2305-1159
SS  - 2305-1159
U6  - https://doi.org/10.1080/02572117.2009.10587313
DO  - https://doi.org/10.1080/02572117.2009.10587313
N1  - This is an Accepted Manuscript of an article published by Taylor & Francis in South African Journal of African Languages online on 24 Oct 2012, available at: https://doi.org/10.1080/02572117.2009.10587313.
VL  - 29
IS  - 1
SP  - 1
EP  - 19
PB  - Taylor & Francis
CY  - London
ER  -