In this paper, we present a GOLD standard of part-of-speech tagged transcripts of spoken German. The GOLD standard data consists of four annotation layers – transcription (modified orthography), normalization (standard orthography), lemmatization and POS tags – all of which have undergone careful manual quality control. It comes with guidelines for the manual POS annotation of transcripts of German spoken data and an extended version of the STTS (Stuttgart Tübingen Tagset) which accounts for phenomena typically found in spontaneous spoken German. The GOLD standard was developed on the basis of the Research and Teaching Corpus of Spoken German, FOLK, and is, to our knowledge, the first such dataset based on a wide variety of spontaneous and authentic interaction types. It can be used as a basis for further development of language technology and corpus linguistic applications for German spoken language.
This contribution presents the background, design and results of a study of users of three oral corpus platforms in Germany. Roughly 5,000 registered users of the Database for Spoken German (DGD), the GeWiss corpus and the corpora of the Hamburg Centre for Language Corpora (HZSK) were asked to participate in a user survey. This quantitative approach was complemented by qualitative interviews with selected users. We briefly introduce the corpus resources involved in the study in section 2. Section 3 describes the methods employed in the user studies. Section 4 summarizes results of the studies focusing on selected key topics. Section 5 attempts a generalization of these results to larger contexts.
Sense relations
(2016)
In their analysis of methods that participants use to manage the realization of practical courses of action, Kendrick and Drew (2016/this issue) focus on cases of assistance, where the need to be addressed is Self’s, and Other lends a helping hand. In our commentary, we point to other forms of cooperative engagement that are ubiquitously recruited in interaction. Imperative requests characteristically expect compliance on the grounds of Other’s already established commitment to a wider and shared course of actions. Established commitments can also provide the engine behind recruitment sequences that proceed nonverbally. And forms of cooperative engagement that are well glossed as assistance can nevertheless be demonstrably oriented to established commitments. In sum, we find commitment to shared courses of action to be an important element in the design and progression of certain recruitment sequences, where the involvement of Other is best defined as contribution. The commentary highlights the importance of interdependent orientations in the organization of cooperation. Data are in German, Italian, and Polish.
Constructing a Corpus
(2016)
This contribution focuses on the filling of the prefield (Vorfeld) of the German sentence in cases where the prefield consists either of one or more constituents together with a non-finite part of the verbal complex, or of the non-finite part of the verbal complex alone. For these forms of prefield filling, variants and their information-structural properties are examined. The contribution also investigates whether, contrary to the frequently stated rule that the prefield of a German sentence can be filled by only one constituent, unambiguous and acceptable examples can be found in the Wikipedia corpora which indicate that prefield filling with more than one constituent can indeed occur in German.
Researchers in Natural Language Processing rely on the availability of data and software, ideally under open licenses, but little is done to actively encourage it. In fact, the current copyright framework grants authors exclusive rights to copy their works, make them available to the public and make derivative works (such as annotated language corpora). Moreover, in the EU, databases are protected against unauthorized extraction and re-utilization of their contents. Therefore, proper public licensing plays a crucial role in providing access to research data. A public license is a license that grants certain rights not to one particular user, but to the general public (everybody). Our article presents a tool that we developed to assist the user in the licensing process. As software and data should be licensed under different licenses, the tool is composed of two separate parts: Data and Software. The underlying logic as well as elements of the graphic interface are presented below.
In order to develop its full potential, global communication needs linguistic support systems such as Machine Translation (MT). In the past decade, free online MT tools have become available to the general public, and the quality of their output is increasing. However, the use of such tools may entail various legal implications, especially as far as processing of personal data is concerned. This is even more evident if we take into account that their business model is largely based on providing translation in exchange for data, which can subsequently be used to improve the translation model, but also for commercial purposes. The purpose of this paper is to examine how free online MT tools fit in the European data protection framework, harmonised by the EU Data Protection Directive. The perspectives of both the user and the MT service provider are taken into account.
There have been several previous attempts to annotate utterances of verbal feedback in English with communicative functions. Here, we suggest an annotation scheme for verbal and non-verbal feedback utterances in French including the categories base, attitude, previous and visual. The data comprises conversations, maptasks and negotiations from which we extracted ca. 13,000 candidate feedback utterances and gestures. Twelve students were recruited for the annotation campaign of ca. 9,500 instances. Each instance was annotated by between 2 and 7 raters. The evaluation of the annotation agreement resulted in an average best-pair kappa of 0.6. While the base category with the values acknowledgement, evaluation, answer, elicit and other achieves good agreement, this is not the case for the other main categories. The data sets, which also include automatic extractions of lexical, positional and acoustic features, are freely available and will further be used for machine learning classification experiments to analyse the form-function relationship of feedback.