Refine
Year of publication
- 2016 (169) (remove)
Document Type
- Article (59)
- Conference Proceeding (44)
- Part of a Book (43)
- Book (15)
- Working Paper (5)
- Doctoral Thesis (3)
Keywords
- Deutsch (64)
- Korpus <Linguistik> (31)
- Gesprochene Sprache (20)
- Konversationsanalyse (11)
- Wörterbuch (11)
- Computerunterstützte Lexikographie (10)
- Französisch (8)
- German (7)
- Computerlinguistik (6)
- Linguistik (6)
Publicationstate
- Veröffentlichungsversion (169) (remove)
Reviewstate
Publisher
The paper reports the results of the curation project ChatCorpus2CLARIN. The goal of the project was to develop a workflow and resources for the integration of an existing chat corpus into the CLARIN-D research infrastructure for language resources and tools in the Humanities and the Social Sciences (http://clarin-d.de). The paper presents an overview of the resources and practices developed in the project, describes the added value of the resource after its integration and discusses, as an outlook, to what extent these practices can be considered best practices which may be useful for the annotation and representation of other CMC and social media corpora.
There have been several attempts to annotate communicative functions to utterances of verbal feedback in English previously. Here, we suggest an annotation scheme for verbal and non-verbal feedback utterances in French including the categories base, attitude, previous and visual. The data comprises conversations, maptasks and negotiations from which we extracted ca. 13,000 candidate feedback utterances and gestures. 12 students were recruited for the annotation campaign of ca. 9,500 instances. Each instance was annotated by between 2 and 7 raters. The evaluation of the annotation agreement resulted in an average best-pair kappa of 0.6. While the base category with the values acknowledgement, evaluation, answer, elicit and other achieves good agreement, this is not the case for the other main categories. The data sets, which also include automatic extractions of lexical, positional and acoustic features, are freely available and will further be used for machine learning classification experiments to analyse the form-function relationship of feedback.
The present paper reports the first results of the compilation and annotation of a blog corpus for German. The main aim of the project is the representation of the blog discourse structure and relations between its elements (blog posts, comments) and participants (bloggers, commentators). The data included in the corpus were manually collected from the scientific blog portal SciLogs. The feature catalogue for the corpus annotation includes three types of information which is directly or indirectly provided in the blog or can be construed by means of statistical analysis or computational tools. At this point, only directly available information (e.g. title of the blog post, name of the blogger etc.) has been annotated. We believe, our blog corpus can be of interest for the general study of blog structure or related research questions as well as for the development of NLP methods and techniques (e.g. for authorship detection).
This study investigates high vowel laxing in the Louisiana French of the Lafourche Basin. Unlike Canadian French, in which the high vowels /i, y, u/ are traditionally described as undergoing laxing (to [I, Y, U]) in word-final syllables closed by any consonant other than a voiced fricative (see Poliquin 2006), Oukada (1977) states that in the Louisiana French of Lafourche Parish, any coda consonant will trigger high vowel laxing of /i/; he excludes both /y/ and /u/ from his discussion of high vowel laxing. The current study analyzes tokens of /i, y, u/ from pre-recorded interviews with three older male speakers from Terrebonne Parish. We measured the first and second formants and duration for high vowel tokens produced in four phonetic environments, crossing syllable type (open vs. closed) by consonant type (voiced fricative vs. any consonant other than a voiced fricative). Results of the acoustic analysis show optional laxing for /i/ and /y/ and corroborate the finding that high vowels undergo laxing in word-final closed syllables, regardless of consonant type. Data for /u/ show that the results vary widely by speaker, with the dominant pattern (shown by two out of three speakers) that of lowering and backing in the vowel space of closed syllable tokens. Duration data prove inconclusive, likely due to the effects of stress. The formant data published here constitute the first acoustic description of high vowels for any variety of Louisiana French and lay the groundwork for future study on these endangered varieties.
This thesis consists of the following three papers that all have been published in international peer-reviewed journals:
Chapter 3: Koplenig, Alexander (2015c). The Impact of Lacking Metadata for the Measurement of Cultural and Linguistic Change Using the Google Ngram Data Sets—Reconstructing the Composition of the German Corpus in Times of WWII. Published in: Digital Scholarship in the Humanities. Oxford: Oxford University Press. [doi:10.1093/llc/fqv037]
Chapter 4: Koplenig, Alexander (2015b). Why the quantitative analysis of dia-chronic corpora that does not consider the temporal aspect of time-series can lead to wrong conclusions. Published in: Digital Scholarship in the Humanities. Oxford: Oxford University Press. [doi:10.1093/llc/fqv030]
Chapter 5: Koplenig, Alexander (2015a). Using the parameters of the Zipf–Mandelbrot law to measure diachronic lexical, syntactical and stylistic changes – a large-scale corpus analysis. Published in: Corpus Linguistics and Linguistic Theory. Berlin/Boston: de Gruyter. [doi:10.1515/cllt-2014-0049]
Chapter 1 introduces the topic by describing and discussing several basic concepts relevant to the statistical analysis of corpus linguistic data. Chapter 2 presents a method to analyze diachronic corpus data and a summary of the three publications. Chapters 3 to 5 each represent one of the three publications. All papers are printed in this thesis with the permission of the publishers.
Annotating Discourse Relations in Spoken Language: A Comparison of the PDTB and CCR Frameworks
(2016)
In discourse relation annotation, there is currently a variety of different frameworks being used, and most of them have been developed and employed mostly on written data. This raises a number of questions regarding interoperability of discourse relation annotation schemes, as well as regarding differences in discourse annotation for written vs. spoken domains. In this paper, we describe ouron annotating two spoken domains from the SPICE Ireland corpus (telephone conversations and broadcast interviews) according todifferent discourse annotation schemes, PDTB 3.0 and CCR. We show that annotations in the two schemes can largely be mappedone another, and discuss differences in operationalisations of discourse relation schemes which present a challenge to automatic mapping. We also observe systematic differences in the prevalence of implicit discourse relations in spoken data compared to written texts,find that there are also differences in the types of causal relations between the domains. Finally, we find that PDTB 3.0 addresses many shortcomings of PDTB 2.0 wrt. the annotation of spoken discourse, and suggest further extensions. The new corpus has roughly theof the CoNLL 2015 Shared Task test set, and we hence hope that it will be a valuable resource for the evaluation of automatic discourse relation labellers.
Im Verlauf der Geschehnisse in der arabischen Welt seit 2011 gewann der Begriff Arabischer Frühling an Bedeutung und avancierte zum Leitausdruck des Diskurses. Der Beitrag geht den Fragen nach, wie der Begriff Arabischer Frühling in der deutschsprachigen Öffentlichkeit sprachlich realisiert, mit welchen sprachlichen Mitteln er konstruiert und mit welchen Ereignissen – zuweilen auch Katastrophen – er identifiziert wurde bzw. wird. Dabei wird auf die symbolische Funktion des Frühlings sowohl aus historischer Perspektive der Vormärzzeit als auch aus heutiger Sicht eingegangen. Im Blickfeld der Untersuchung stehen darüber hinaus die Jahreszeitenbezeichnungen Winter, Herbst und Sommer und ihr symbolisches Verhältnis zu den arabischen Revolutionen.
In this paper, we describe preliminary results from an ongoing experiment wherein we classify two large unstructured text corpora—a web corpus and a newspaper corpus—by topic domain (or subject area). Our primary goal is to develop a method that allows for the reliable annotation of large crawled web corpora with meta data required by many corpus linguists. We are especially interested in designing an annotation scheme whose categories are both intuitively interpretable by linguists and firmly rooted in the distribution of lexical material in the documents. Since we use data from a web corpus and a more traditional corpus, we also contribute to the important field of corpus comparison and corpus evaluation. Technically, we use (unsupervised) topic modeling to automatically induce topic distributions over gold standard corpora that were manually annotated for 13 coarse-grained topic domains. In a second step, we apply supervised machine learning to learn the manually annotated topic domains using the previously induced topics as features. We achieve around 70% accuracy in 10-fold cross validations. An analysis of the errors clearly indicates, however, that a revised classification scheme and larger gold standard corpora will likely lead to a substantial increase in accuracy.
Smiling individuals are usually perceived more favorably than non-smiling ones—they are judged as happier, more attractive, competent, and friendly. These seemingly clear and obvious consequences of smiling are assumed to be culturally universal, however most of the psychological research is carried out in WEIRD societies (Western, Educated, Industrialized, Rich, and Democratic) and the influence of culture on social perception of nonverbal behavior is still understudied. Here we show that a smiling individual may be judged as less intelligent than the same non-smiling individual in cultures low on the GLOBE’s uncertainty avoidance dimension. Furthermore, we show that corruption at the societal level may undermine the prosocial perception of smiling—in societies with high corruption indicators, trust toward smiling individuals is reduced. This research fosters understanding of the cultural framework surrounding nonverbal communication processes and reveals that in some cultures smiling may lead to negative attributions.
Bericht über die 19. Arbeitstagung zur Gesprächsforschung vom 16. bis 18. März 2016 in Mannheim
(2016)
Bericht über die 19. Arbeitstagung zur Gesprächsforschung vom 16. bis 18. März 2016 in Mannheim
(2016)
Bild-Makros, auch unter dem Begriff Memes bekannt, sind populäre Internetphänomene, die im Zuge der umfassenden Multimodalisierung der Medienkommunikation als Unterhaltungsangebote auf Facebook verbreitet und kommentiert werden. Dieser Beitrag betrachtet diese aus einer Kombination von Bild und Text bestehenden multimodalen Kommunikate aus einer gattungs- und gesprächsanalytischen Perspektive, da Bild- Makros sowohl in ihrer formalen und semantischen Gestaltung als auch in der interaktiven Rezeption in Form von Kommentaren und Antworten verfestigte Muster aufzuweisen scheinen. In dieser medial vermittelten Interaktion haben sich sowohl auf der strukturellen Ebene der Interaktionssequenzen als auch innerhalb einzelner, auf sequenzexterner und sequenzinterner Ebene analysierten Interaktionseinheiten verschiedene kommunikative Muster herausgebildet. Darin nehmen soziale Prozesse wie face-work und Identitätskonstruktion Einfluss auf die interaktive Aushandlung des Kommunikats.