This paper presents the first release of the KiezDeutsch Korpus (KiDKo), a new language resource with multiparty spoken dialogues of Kiezdeutsch, a newly emerging language variety spoken by adolescents from multi-ethnic urban areas in Germany. The first release of the corpus includes the transcriptions of the data as well as a normalisation layer and part-of-speech annotations. In the paper, we describe the main features of the new resource and then focus on automatic POS tagging of informal spoken language. Our tagger achieves an accuracy of nearly 97% on KiDKo. While we did not succeed in further improving the tagger using ensemble tagging, we present our approach to using the tagger ensembles for identifying error patterns in the automatically tagged data.
The annotation of parts of speech (POS) in linguistically annotated corpora is a fundamental annotation layer which provides the basis for further syntactic analyses, and many NLP tools rely on POS information as input. However, most POS annotation schemes have been developed with written (newspaper) text in mind and thus do not carry over well to text from other domains and genres. Recent discussions have concentrated on the shortcomings of present POS annotation schemes with regard to their applicability to data from domains other than newspaper text.
Recent work on error detection has shown that the quality of manually annotated corpora can be substantially improved by applying consistency checks to the data and automatically identifying incorrectly labelled instances. These methods, however, cannot be used for automatically annotated corpora, where errors are systematic and cannot easily be identified by looking at the variance in the data. This paper targets the detection of POS errors in automatically annotated corpora, so-called silver standards, showing that by combining different measures sensitive to annotation quality we can identify a large part of the errors and obtain a substantial increase in accuracy.
This paper analyses paramedic emergency interaction as multimodal multiactivity. Based on a corpus of video recordings of emergency drills performed by professional paramedics during advanced training, the focus is on paramedics’ participation in multiple joint projects which become simultaneously relevant. Simultaneity and fast succession of multiactivity characterise not only work at the team level, but also the work profile of the individual paramedic. Participants have to coordinate their own participation in more than one joint project intrapersonally. In the data studied, three patterns of allocating multimodal resources stood out as routine ways of coordinating participation in two simultaneous projects intrapersonally:
1. Talk and hearing vs. manual action monitored by gaze,
2. Talk and hearing vs. gazing (and pointing),
3. Manual action vs. gaze (and talk and hearing).
This contribution discusses central methodological questions in building spoken-language corpora, using the Mannheim FOLK corpus as an example, partly with regard to spoken-language corpora in general and partly in comparison with the Leipzig GeWiss corpus. The design of FOLK is not centred on any particular thematic or institutional domain; rather, the aim is to provide a balanced corpus of authentic conversations among diverse speakers in everyday life, institutions, and the media for a wide range of research questions and contexts of use. The article presents in detail the approach to corpus acquisition, the design of the metadata, the project workflow, the transcription conventions, and the orthographic normalisation of the transcripts, and describes corpus structure and stratification as well as the integration of FOLK into the Datenbank für Gesprochenes Deutsch 2.0 (Database for Spoken German 2.0) of the IDS.
Lexikonstatistik 2.0
(2014)
In the mid-20th century there were various attempts to automate the classification of languages using word lists drawn from the basic vocabulary of the languages concerned. These methods have been, and still are, generally viewed critically in historical linguistics, since the results obtained frequently proved to be flawed.
In recent years we have seen a resurgence of lexicostatistical and glottochronological approaches. Their prospects of success are considerably better today than half a century ago, because large amounts of comparative linguistic data are now available in electronic form, and computational linguistics and bioinformatics provide powerful tools for evaluating these data statistically.
This article presents a case study that illustrates the potential of lexicostatistical methods in the 21st century.
This chapter focuses on the way in which co-present parties in meetings manage language choice and treat it as raising problems of participation - in the sense that participants can orient to the fact that a given language choice may increase or diminish participation for some or all co-present group members. Choosing one language rather than another is approached here as a members' problem (in an ethnomethodological sense), and as a decision the participants make themselves, in situ and within their courses of action, displaying the way in which they orient to its local consequences, and how they justify and legitimize it. In order to explore this link between language choice and participation systematically, in this chapter we focus on a particular and recurrent phenomenon, the announcement of a language change. Within the conversation analysis framework, we analyse these announcements by taking into account the sequential position in which they occur, their format, the way in which they are addressed to a sub-group or to the group as a whole, and the specific action they accomplish. We will also look at how the group receives the announcement, its effects on the participation framework, as well as the categorizations that ensue from it. This chapter therefore highlights the mutual configuration between language choice and participation framework. Our analyses are based on several video- and audio-recorded corpora of international work meetings. These video data call for reflection not only on the linguistic dimension of participation frameworks and language switches, but more broadly on their multimodal organization. This chapter shows that multimodal details are crucial if we aim to understand the relation between multilingualism and participation as occasioned, contingent and emergent dynamics.
Accurate opinion mining requires the exact identification of the source and target of an opinion. To evaluate diverse tools, the research community relies on the existence of a gold standard corpus covering this need. Since such a corpus is currently not available for German, the Interest Group on German Sentiment Analysis decided to create such a resource and make it available to the research community in the context of a shared task. In this paper, we describe the selection of textual sources, development of annotation guidelines, and first evaluation results in the creation of a gold standard corpus for the German language.
This contribution presents the Forschungs- und Lehrkorpus Gesprochenes Deutsch (FOLK, Research and Teaching Corpus of Spoken German) and the Datenbank für Gesprochenes Deutsch (DGD, Database for Spoken German) as instruments for conversation-analytic work. After a general introduction to FOLK and DGD in the second section, the third section briefly outlines the methodological relations between corpus linguistics and conversation research, and the challenges that arise when these two approaches to authentic language material meet. The fourth section then illustrates, taking the formula ich sag mal as an example, how a corpus- and database-driven analysis can contribute to the study of conversational phenomena.