This paper presents an extension to the Stuttgart-Tübingen TagSet, the standard part-of-speech tag set for German, for the annotation of spoken language. The additional tags cover hesitations, backchannel signals, interruptions, onomatopoeia and uninterpretable material. They make it possible to capture phenomena specific to spoken language while preserving interoperability with existing corpora of written language.
This paper presents the first release of the KiezDeutsch Korpus (KiDKo), a new language resource with multiparty spoken dialogues of Kiezdeutsch, a newly emerging language variety spoken by adolescents from multi-ethnic urban areas in Germany. The first release of the corpus includes the transcriptions of the data as well as a normalisation layer and part-of-speech annotations. In the paper, we describe the main features of the new resource and then focus on automatic POS tagging of informal spoken language. Our tagger achieves an accuracy of nearly 97% on KiDKo. While we did not succeed in further improving the tagger using ensemble tagging, we present our approach to using the tagger ensembles for identifying error patterns in the automatically tagged data.
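The idea of using a tagger ensemble to find error patterns can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each tagger's output is available as a list of tags aligned with the tokens, and the function names, the toy sentence and both tagger outputs are hypothetical. Tokens on which the taggers disagree are flagged for inspection, and the conflicting tag combinations are counted to show recurring patterns.

```python
from collections import Counter


def flag_disagreements(tokens, predictions):
    """Return (index, token, tags) for every token the taggers disagree on.

    `predictions` is a list of tag sequences, one per tagger, each aligned
    with `tokens` (an assumption of this sketch).
    """
    flagged = []
    for i, token in enumerate(tokens):
        tags = tuple(pred[i] for pred in predictions)
        if len(set(tags)) > 1:  # at least two taggers chose different tags
            flagged.append((i, token, tags))
    return flagged


def disagreement_patterns(flagged):
    """Count how often each combination of conflicting tags occurs."""
    return Counter(tags for _, _, tags in flagged)


# Toy data, invented for illustration only (not taken from KiDKo).
tokens = ["ich", "geh", "äh", "Kino"]
tagger_a = ["PPER", "VVFIN", "ITJ", "NN"]
tagger_b = ["PPER", "VVFIN", "XY", "NN"]

flagged = flag_disagreements(tokens, [tagger_a, tagger_b])
patterns = disagreement_patterns(flagged)
```

Frequent disagreement patterns point to tag pairs that are worth checking manually. Agreement between the taggers does not guarantee that a tag is correct.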
Annotating Spoken Language
(2014)