Transdisciplinary research is research not only on, but also for and, most of all, with practitioners. In the research framework of transdisciplinarity, scholars and practitioners collaborate throughout research projects with the aim of mutual learning. This paper shows the value transdisciplinarity can add to media linguistics. It does so by investigating the digital literacy shift in journalism: the change, in the last two decades, from the predominance of a writing mode that we have termed focused writing to a mode we have called writing-by-the-way. Large corpora of writing process data have been generated and analyzed with the multimethod approach of progression analysis in order to combine analytical depth with breadth. On the object level of doing writing in journalism, results show that the general trend towards writing-by-the-way opens up new niches for focused writing. On a meta level of doing research, findings explain under what conditions transdisciplinarity allows for deeper insights into the media-linguistic object of investigation.
This contribution aims to describe privacy, publicness and anonymity as essential analytic dimensions for media linguistic research. The dimensions are not inherent in and predetermined by the technical features and forms of communication provided by mobile devices, but are used by the participants as an orientation grid for shaping their online and offline practices in and with mobile media. Considering both mobile device use in the public realm and the dissemination of increasingly private content in social media (which is said to lead to ‘blurred boundaries’ between the private and the public), the paper provides a brief overview of the main developments in mobile media research: studies adopting various approaches – e.g. sociological-ethnographic, linguistic and media studies – illustrate how publicness, privacy and anonymity are actively shaped and brought about by mobile media users in face-to-face and remote social encounters. As this shows that publicness, privacy and anonymity are still relevant concepts for users, future media linguistics studies should focus on the dynamic multimodal practices by which they are contextualized and accomplished.
Narratives 2.0. A Multi-dimensional approach to semi-public storytelling in WhatsApp voice messages
(2019)
Based on a corpus of voice message narratives in German WhatsApp group chats, the present study contributes to research on social media storytelling by focusing on stories of personal experience that are embedded in a communication platform favouring continuous dialogic exchange, narrated to well-defined non-anonymous publics, and multimodal (composed of visual and audible posting types). To capture the characteristics of this type of social media storytelling, the paper argues that Ochs and Capps’ (2001) dimensional model, originally developed for conversational narratives (including the dimensions of tellability, tellership, embeddedness, linearity and moral stance), should be expanded by the dimensions of publicness, multimodality and sequencing. The prototype of storytelling in WhatsApp group chats is based on recent personal experiences; it is related by a single teller as an initial, sequentially non-embedded and linearly organised “big package” story (in a single voice message, sometimes introduced by a text message containing an abstract); other group members routinely document their evaluative stances in rather conventionalised text message responses in the semi-public group space.
Intergroup conflict im Sprachgebrauch rechtspopulistischer Gruppierungen am Beispiel von "Pegida" [Intergroup conflict in the language use of right-wing populist groups: the example of "Pegida"]
(2019)
Populism divides societies: this is a view that is frequently heard and read. The most obvious form of division appears to be the group-related split between those who support populist movements and parties and those who, more or less resolutely, do not. The rifts in society, however, are not evident in this group conflict alone. It is only one line in a network of actual, or merely perceived and rhetorically constructed, fractures that populist groups highlight, or possibly even create in the first place, and that find their way into public discourse.
Muskelversagen? Großartig! - Framing von Fachbegriffen aufgrund unterschiedlichen Weltwissens [Muscle failure? Great! - Framing of technical terms on the basis of differing world knowledge]
(2019)
Text corpora come in many different shapes and sizes and carry heterogeneous annotations, depending on their purpose and design. The true benefit of corpora is rooted in their annotation and the method by which this data is encoded is an important factor in their interoperability. We have accumulated a large collection of multilingual and parallel corpora and encoded it in a unified format which is compatible with a broad range of NLP tools and corpus linguistic applications. In this paper, we present our corpus collection and describe a data model and the extensions to the popular CoNLL-U format that enable us to encode it.
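To make the encoding concrete, the following is a minimal sketch of reading standard CoNLL-U, assuming only the ten tab-separated columns and the "# key = value" comment lines of the base format. The extensions described in the paper are not reproduced here; the names parse_conllu, Sentence and SAMPLE are illustrative and not taken from the authors' tooling.

```python
from dataclasses import dataclass

# The ten standard CoNLL-U columns, in order.
FIELDS = ("id", "form", "lemma", "upos", "xpos",
          "feats", "head", "deprel", "deps", "misc")


@dataclass
class Sentence:
    comments: dict   # sentence-level metadata from "# key = value" lines
    tokens: list     # one dict per token line, keyed by FIELDS


def parse_conllu(text):
    """Parse CoNLL-U text into a list of Sentence objects (illustrative)."""
    sentences = []
    comments, tokens = {}, []
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current sentence.
            if tokens:
                sentences.append(Sentence(comments, tokens))
            comments, tokens = {}, []
        elif line.startswith("#"):
            # Comment lines carry metadata such as sent_id or text.
            key, _, value = line[1:].partition("=")
            comments[key.strip()] = value.strip()
        else:
            columns = line.split("\t")
            if len(columns) != len(FIELDS):
                raise ValueError(f"expected 10 columns, got {len(columns)}")
            tokens.append(dict(zip(FIELDS, columns)))
    if tokens:  # final sentence without a trailing blank line
        sentences.append(Sentence(comments, tokens))
    return sentences


# Hypothetical three-token sample, made up for this sketch.
SAMPLE = (
    "# sent_id = demo-1\n"
    "# text = Corpora grow .\n"
    "1\tCorpora\tcorpus\tNOUN\t_\tNumber=Plur\t2\tnsubj\t_\t_\n"
    "2\tgrow\tgrow\tVERB\t_\t_\t0\troot\t_\t_\n"
    "3\t.\t.\tPUNCT\t_\t_\t2\tpunct\t_\t_\n"
)

parsed = parse_conllu(SAMPLE)
```

Keeping metadata in comment lines and annotations in fixed columns is what lets generic tools read such files; a format extension would typically add further comment keys or columns on top of this layout.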
Common Crawl is a very large, heterogeneous multilingual corpus composed of documents crawled from the internet, surpassing 20TB of data and distributed as a set of more than 50 thousand plain text files, each of which contains many documents written in a wide variety of languages. Even though each document has a metadata block associated with it, this data lacks any information about the language in which the document is written, making it extremely difficult to use Common Crawl for monolingual applications. We propose a general, highly parallel, multithreaded pipeline to clean and classify Common Crawl by language; we specifically design it so that it runs efficiently on medium to low resource infrastructures where I/O speeds are the main constraint. We develop the pipeline so that it can be easily reapplied to any kind of heterogeneous corpus and so that it can be parameterised to a wide range of infrastructures. We also distribute a 6.3TB version of Common Crawl, filtered, classified by language, shuffled at line level in order to avoid copyright issues, and ready to be used for NLP applications.
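The shape of such a pipeline can be sketched as follows. This is a toy illustration under stated assumptions, not the authors' implementation: guess_language is a hypothetical stopword-based stand-in for a real language identifier, the shards are in-memory lists rather than files on disk, and the worker count and seed are arbitrary. Threads are used because the abstract names I/O as the main constraint.

```python
import random
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Toy stopword lists; a stand-in for a real language identifier.
STOPWORDS = {
    "en": {"the", "and", "is", "of"},
    "de": {"der", "und", "ist", "die"},
    "fr": {"le", "et", "est", "la"},
}


def guess_language(line):
    """Hypothetical classifier: pick the language with most stopword hits."""
    words = set(line.lower().split())
    scores = {lang: len(words & stops) for lang, stops in STOPWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


def classify_shard(lines):
    """Clean one shard and tag each surviving line with a language."""
    tagged = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # cleaning step: drop empty lines
        lang = guess_language(line)
        if lang is not None:  # drop lines with no recognisable language
            tagged.append((lang, line))
    return tagged


def run_pipeline(shards, workers=4, seed=0):
    """Classify shards in parallel, then shuffle each language at line level."""
    by_language = defaultdict(list)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for tagged in pool.map(classify_shard, shards):
            for lang, line in tagged:
                by_language[lang].append(line)
    rng = random.Random(seed)
    for lines in by_language.values():
        rng.shuffle(lines)  # line-level shuffle breaks up original documents
    return dict(by_language)


# Hypothetical input: two small shards of mixed-language lines.
shards = [
    ["The corpus is large", "", "Der Hund und die Katze"],
    ["Le chat est noir", "12345", "The end of the story"],
]
result = run_pipeline(shards)
```

Each shard is processed independently, so the number of workers can be tuned to the available hardware; only the final grouping and shuffling step touches all languages at once.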
Nearly all of the very large corpora of English are “static”, which allows a wide range of one-time, pre-processed data, such as collocates. The challenge comes with large “dynamic” corpora, which are updated regularly, and where preprocessing is much more difficult. This paper provides an overview of the NOW corpus (News on the Web), which is currently 8.2 billion words in size, and which grows by about 170 million words each month. We discuss the architecture of NOW, and provide many examples that show how data from NOW can (uniquely) be extracted to look at a wide range of ongoing changes in English.
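As a rough illustration of what this growth rate implies, the figures quoted above can be projected forward. The sketch assumes constant linear growth of 170 million words per month from a base of 8.2 billion words, which is a simplification; projected_size is an illustrative name.

```python
BASE_WORDS = 8_200_000_000      # corpus size stated in the abstract
MONTHLY_GROWTH = 170_000_000    # approximate monthly growth stated in the abstract


def projected_size(months):
    """Projected corpus size after `months`, assuming constant linear growth."""
    return BASE_WORDS + months * MONTHLY_GROWTH


after_one_year = projected_size(12)
```

Under this assumption the corpus gains about 2.04 billion words in a year, which is why one-time preprocessing of the kind used for static corpora is hard to keep current.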
As the Web ought to be considered as a series of sources rather than as a source in itself, a problem facing corpus construction resides in meta-information and categorization. In addition, we need focused data to shed light on particular subfields of the digital public sphere. Blogs are relevant to that end, especially if the resulting web texts can be extracted along with metadata and made available in coherent and clearly describable collections.