Towards a part-of-speech ontology: encoding morphemic units of two South African Bantu languages
(2012)
This article describes the design of an electronic knowledge base: a morpho-syntactic database, structured as an ontology of linguistic categories, that contains linguistic units of two related languages of the South African Bantu group, Northern Sotho and Zulu. These languages differ significantly in their surface orthographies but are very similar on the lexical and sub-lexical levels. Our goal is therefore to describe the morphemes of both languages in a single common database in order to outline and interpret their commonalities and differences in more detail. Moreover, the relational database developed here defines the underlying morphemic units (morphs) for both languages. It will be shown that the electronic part-of-speech ontology goes hand in hand with part-of-speech tagsets that label morphemic units. This database is designed as part of a forthcoming system providing lexicographic and linguistic knowledge on the official South African Bantu languages.
The paper reports the results of the curation project ChatCorpus2CLARIN. The goal of the project was to develop a workflow and resources for the integration of an existing chat corpus into the CLARIN-D research infrastructure for language resources and tools in the Humanities and the Social Sciences (http://clarin-d.de). The paper presents an overview of the resources and practices developed in the project, describes the added value of the resource after its integration and discusses, as an outlook, to what extent these practices can be considered best practices which may be useful for the annotation and representation of other CMC and social media corpora.