Refine
Year of publication: 2016 (9)
Document Type: Part of a Book (4), Conference Proceeding (4), Doctoral Thesis (1)
Has Fulltext: yes (9)
Is part of the Bibliography: yes (9)
Keywords: Korpus <Linguistik> (9)
This article discusses the concept of linguistic practices using the example of planning in collaborative professional writing. Drawing on a case study from large corpora of natural empirical data, it identifies practices that enable flexible planning in the dynamic system of text production. It becomes clear that these practices, as well as the writing phases they shape, scale: they form similar patterns at smaller and larger levels. Such an understanding of planning goes far beyond the notion of planning in previous models of writing processes. Empirical research in the workplace thus proves valuable also for sharpening the theoretical concept of practices. Writing as a process of producing written utterances was researched and modeled early on from a psycholinguistic perspective. Owing to the dominance of this research tradition, important phases and practices of natural writing, outside psychological laboratory experiments, were long disregarded. The present article develops a dynamic and complex concept of writing phases and of the practices of professional text production that determine them (Part 1). Linguistically grounded ethnographic research (2) opens up writing beyond the laboratory as a multi-layered interplay of situated practices in the dynamic system of collaborative text production (3). A sample analysis shows how practices of flexible planning scale in news writing (4). This demonstrates the value of empirical analysis of writing phases and practices for both theory and practice (5).
This article argues for opening up interactional linguistics more strongly to model-oriented research and data-driven methods. It presents a method for reconstructing and modeling practices in a data-driven way on the basis of corpora. Starting from a discussion of the profound changes that digitalization brings to linguistics, and from an examination of the notion of a model, the concept of the (communicative) practice is defined in contrast to the concept of the communicative genre. Subsequently, a corpus-driven method for dialogue modeling is presented, using the example of consolation dialogues in online forums. Finally, the consequences of human interaction with machine dialogue systems are considered.
In this paper, we describe preliminary results from an ongoing experiment wherein we classify two large unstructured text corpora—a web corpus and a newspaper corpus—by topic domain (or subject area). Our primary goal is to develop a method that allows for the reliable annotation of large crawled web corpora with metadata required by many corpus linguists. We are especially interested in designing an annotation scheme whose categories are both intuitively interpretable by linguists and firmly rooted in the distribution of lexical material in the documents. Since we use data from a web corpus and a more traditional corpus, we also contribute to the important field of corpus comparison and corpus evaluation. Technically, we use (unsupervised) topic modeling to automatically induce topic distributions over gold standard corpora that were manually annotated for 13 coarse-grained topic domains. In a second step, we apply supervised machine learning to learn the manually annotated topic domains using the previously induced topics as features. We achieve around 70% accuracy in 10-fold cross-validation. An analysis of the errors clearly indicates, however, that a revised classification scheme and larger gold standard corpora will likely lead to a substantial increase in accuracy.
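The two-step pipeline described in this abstract (unsupervised topic induction, then supervised classification on the induced topic distributions, evaluated with 10-fold cross-validation) can be sketched roughly as follows. This is a minimal illustration with toy documents and labels, not the authors' corpora, categories, or exact setup:

```python
# Hypothetical sketch of the pipeline: (1) induce topic distributions
# with unsupervised LDA, (2) learn manually annotated topic domains
# from those distributions with a supervised classifier, (3) evaluate
# with 10-fold cross-validation. Toy data only.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

docs = [
    "the striker scored a goal in the final match",
    "the team won the league after a tense match",
    "parliament passed the new budget law today",
    "the minister defended the law before parliament",
] * 10  # repeated so each fold of the 10-fold CV has samples per class
labels = ["sports", "sports", "politics", "politics"] * 10

# Step 1: unsupervised topic modeling over raw word counts.
counts = CountVectorizer().fit_transform(docs)
topics = LatentDirichletAllocation(
    n_components=5, random_state=0
).fit_transform(counts)  # one topic distribution per document

# Step 2: supervised learning on the induced topics as features.
scores = cross_val_score(LogisticRegression(), topics, labels, cv=10)
print(topics.shape, round(scores.mean(), 2))
```

In the study itself the features are topics induced over large gold standard corpora annotated for 13 coarse-grained domains; the toy setup above only mirrors the shape of the method.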
German research on collocation(s) focuses on many different aspects. A comprehensive documentation would be impossible in this short report. Accepting that we cannot do justice to all the contributions to this area, we pick out some influential cornerstones. This selection does not claim to be representative or balanced, but it is meant to form the backbone of the story we want to tell: our ‘German’ view of the still ongoing evolution of a notion of ‘collocation’. Although our own work concerns the theoretical background of and the empirical rationale for collocations, lexicography occupies a large space. Some of the recent publications (Wahrig 2008, Häcki Buhofer et al. 2014) represent a turn to empirical legitimation for the selection of typical expressions. Nevertheless, linking the empirical evidence to the needs of an abstract lexicographic description (or a didactic format) is still an open issue.
In this paper, we present a GOLD standard of part-of-speech tagged transcripts of spoken German. The GOLD standard data consists of four annotation layers – transcription (modified orthography), normalization (standard orthography), lemmatization and POS tags – all of which have undergone careful manual quality control. It comes with guidelines for the manual POS annotation of transcripts of German spoken data and an extended version of the STTS (Stuttgart Tübingen Tagset) which accounts for phenomena typically found in spontaneous spoken German. The GOLD standard was developed on the basis of the Research and Teaching Corpus of Spoken German, FOLK, and is, to our knowledge, the first such dataset based on a wide variety of spontaneous and authentic interaction types. It can be used as a basis for further development of language technology and corpus linguistic applications for German spoken language.
This contribution presents the background, design and results of a study of users of three oral corpus platforms in Germany. Roughly 5,000 registered users of the Database for Spoken German (DGD), the GeWiss corpus and the corpora of the Hamburg Centre for Language Corpora (HZSK) were asked to participate in a user survey. This quantitative approach was complemented by qualitative interviews with selected users. We briefly introduce the corpus resources involved in the study in Section 2. Section 3 describes the methods employed in the user studies. Section 4 summarizes results of the studies, focusing on selected key topics. Section 5 attempts a generalization of these results to larger contexts.
The present paper describes Corpus Query Lingua Franca (ISO CQLF), a specification designed at ISO Technical Committee 37 Subcommittee 4 “Language resource management” for the purpose of facilitating the comparison of properties of corpus query languages. We outline the motivation for this endeavour and present its aims and general architecture. CQLF is intended as a multi-part specification; here, we concentrate on the basic metamodel, which provides a frame into which the other parts fit.
This thesis consists of the following three papers that all have been published in international peer-reviewed journals:
Chapter 3: Koplenig, Alexander (2015c). The Impact of Lacking Metadata for the Measurement of Cultural and Linguistic Change Using the Google Ngram Data Sets—Reconstructing the Composition of the German Corpus in Times of WWII. Published in: Digital Scholarship in the Humanities. Oxford: Oxford University Press. [doi:10.1093/llc/fqv037]
Chapter 4: Koplenig, Alexander (2015b). Why the quantitative analysis of diachronic corpora that does not consider the temporal aspect of time-series can lead to wrong conclusions. Published in: Digital Scholarship in the Humanities. Oxford: Oxford University Press. [doi:10.1093/llc/fqv030]
Chapter 5: Koplenig, Alexander (2015a). Using the parameters of the Zipf–Mandelbrot law to measure diachronic lexical, syntactical and stylistic changes – a large-scale corpus analysis. Published in: Corpus Linguistics and Linguistic Theory. Berlin/Boston: de Gruyter. [doi:10.1515/cllt-2014-0049]
Chapter 1 introduces the topic by describing and discussing several basic concepts relevant to the statistical analysis of corpus linguistic data. Chapter 2 presents a method to analyze diachronic corpus data and a summary of the three publications. Chapters 3 to 5 each represent one of the three publications. All papers are printed in this thesis with the permission of the publishers.
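The Zipf–Mandelbrot law whose parameters Chapter 5 uses to measure diachronic change can be sketched as follows; the parameter values below are toy choices for illustration only, not estimates from the thesis:

```python
# Zipf–Mandelbrot law: the expected frequency f of the word at
# frequency rank r is
#     f(r) = C / (r + b) ** a
# where a (the exponent) and b (the rank shift) are the parameters
# tracked diachronically in the thesis. Toy values below.
C, a, b = 1000.0, 1.0, 2.7

def zipf_mandelbrot(r: int) -> float:
    """Expected frequency of the word at frequency rank r."""
    return C / (r + b) ** a

# Frequencies fall off with rank; b flattens the curve at the top
# ranks compared with a pure Zipf law (b = 0).
freqs = [zipf_mandelbrot(r) for r in range(1, 6)]
print([round(f, 1) for f in freqs])
```

Changes in the fitted a and b over time can then serve as corpus-level indicators of lexical and stylistic change, which is the kind of measure the paper builds on.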