Refine
Year of publication
- 2016 (123) (remove)
Document Type
- Part of a Book (61)
- Article (33)
- Conference Proceeding (14)
- Book (13)
- Working Paper (2)
Keywords
- Deutsch (42)
- Korpus <Linguistik> (16)
- Diskursanalyse (11)
- Gesprochene Sprache (9)
- Wörterbuch (9)
- Computerunterstützte Lexikographie (8)
- Konversationsanalyse (8)
- Interaktion (7)
- Online-Wörterbuch (6)
- Soziolinguistik (6)
Publicationstate
- Veröffentlichungsversion (93)
- Zweitveröffentlichung (24)
- Postprint (6)
Reviewstate
- (Verlags)-Lektorat (123) (remove)
Publisher
"Kaum [...] da, wird' ich gedisst!" Funktionale Aspekte des Banter-Prinzips auf dem Online-Prüfstand
(2016)
The article is to be considered as an attempt to enrich the theoretical approach of the Banter-Principle (Leech 1983) with an online point of view. Examples from Teamspeak- conversations and comments on the social network site Facebook reveal different user practices regarding the identifiability of the Banter-Principle: Nonverbal elements or emoticons in order to make sure that Banter is understood correctly in written language on the one hand; coping with assigned roles depending on dynamic group internal hierarchies in oral communication on the other hand. Nevertheless one question remains. Why should one disguise a cordial message rudely? My analysis shows two functions of Online Banter. Firstly, maximize the entertainment value of a conversation and secondly, establish an accepted online-identity.
The present paper reports the first results of the compilation and annotation of a blog corpus for German. The main aim of the project is the representation of the blog discourse structure and relations between its elements (blog posts, comments) and participants (bloggers, commentators). The data included in the corpus were manually collected from the scientific blog portal SciLogs. The feature catalogue for the corpus annotation includes three types of information which is directly or indirectly provided in the blog or can be construed by means of statistical analysis or computational tools. At this point, only directly available information (e.g. title of the blog post, name of the blogger etc.) has been annotated. We believe, our blog corpus can be of interest for the general study of blog structure or related research questions as well as for the development of NLP methods and techniques (e.g. for authorship detection).
The English language has taken advantage of the Digital Revolution to establish itself as the global language; however, only 28.6 %of Internet users speak English as their native language. Machine Trans-lation (MT) is a powerful technology that can bridge this gap. In devel-opment since the mid-20th century, MT has become available to every Internet user in the last decade, due to free online MT services. This paper aims to discuss the implications that these tools may have for the privacy of their users and how they are addressed by EU data protec-tion law. It examines the data-flows in respect of the initial processing (both from the perspective of the user and the MT service provider) and potential further processing that may be undertaken by the MT service provider.
Annotating Discourse Relations in Spoken Language: A Comparison of the PDTB and CCR Frameworks
(2016)
In discourse relation annotation, there is currently a variety of different frameworks being used, and most of them have been developed and employed mostly on written data. This raises a number of questions regarding interoperability of discourse relation annotation schemes, as well as regarding differences in discourse annotation for written vs. spoken domains. In this paper, we describe ouron annotating two spoken domains from the SPICE Ireland corpus (telephone conversations and broadcast interviews) according todifferent discourse annotation schemes, PDTB 3.0 and CCR. We show that annotations in the two schemes can largely be mappedone another, and discuss differences in operationalisations of discourse relation schemes which present a challenge to automatic mapping. We also observe systematic differences in the prevalence of implicit discourse relations in spoken data compared to written texts,find that there are also differences in the types of causal relations between the domains. Finally, we find that PDTB 3.0 addresses many shortcomings of PDTB 2.0 wrt. the annotation of spoken discourse, and suggest further extensions. The new corpus has roughly theof the CoNLL 2015 Shared Task test set, and we hence hope that it will be a valuable resource for the evaluation of automatic discourse relation labellers.
Im Verlauf der Geschehnisse in der arabischen Welt seit 2011 gewann der Begriff Arabischer Frühling an Bedeutung und avancierte zum Leitausdruck des Diskurses. Der Beitrag geht den Fragen nach, wie der Begriff Arabischer Frühling in der deutschsprachigen Öffentlichkeit sprachlich realisiert, mit welchen sprachlichen Mitteln er konstruiert und mit welchen Ereignissen – zuweilen auch Katastrophen – er identifiziert wurde bzw. wird. Dabei wird auf die symbolische Funktion des Frühlings sowohl aus historischer Perspektive der Vormärzzeit als auch aus heutiger Sicht eingegangen. Im Blickfeld der Untersuchung stehen darüber hinaus die Jahreszeitenbezeichnungen Winter, Herbst und Sommer und ihr symbolisches Verhältnis zu den arabischen Revolutionen.
In this paper, we describe preliminary results from an ongoing experiment wherein we classify two large unstructured text corpora—a web corpus and a newspaper corpus—by topic domain (or subject area). Our primary goal is to develop a method that allows for the reliable annotation of large crawled web corpora with meta data required by many corpus linguists. We are especially interested in designing an annotation scheme whose categories are both intuitively interpretable by linguists and firmly rooted in the distribution of lexical material in the documents. Since we use data from a web corpus and a more traditional corpus, we also contribute to the important field of corpus comparison and corpus evaluation. Technically, we use (unsupervised) topic modeling to automatically induce topic distributions over gold standard corpora that were manually annotated for 13 coarse-grained topic domains. In a second step, we apply supervised machine learning to learn the manually annotated topic domains using the previously induced topics as features. We achieve around 70% accuracy in 10-fold cross validations. An analysis of the errors clearly indicates, however, that a revised classification scheme and larger gold standard corpora will likely lead to a substantial increase in accuracy.