This thesis consists of the following three papers, all of which have been published in international peer-reviewed journals:
Chapter 3: Koplenig, Alexander (2015c). The Impact of Lacking Metadata for the Measurement of Cultural and Linguistic Change Using the Google Ngram Data Sets—Reconstructing the Composition of the German Corpus in Times of WWII. Published in: Digital Scholarship in the Humanities. Oxford: Oxford University Press. [doi:10.1093/llc/fqv037]
Chapter 4: Koplenig, Alexander (2015b). Why the quantitative analysis of diachronic corpora that does not consider the temporal aspect of time-series can lead to wrong conclusions. Published in: Digital Scholarship in the Humanities. Oxford: Oxford University Press. [doi:10.1093/llc/fqv030]
Chapter 5: Koplenig, Alexander (2015a). Using the parameters of the Zipf–Mandelbrot law to measure diachronic lexical, syntactical and stylistic changes – a large-scale corpus analysis. Published in: Corpus Linguistics and Linguistic Theory. Berlin/Boston: de Gruyter. [doi:10.1515/cllt-2014-0049]
Chapter 1 introduces the topic by describing and discussing several basic concepts relevant to the statistical analysis of corpus linguistic data. Chapter 2 presents a method to analyze diachronic corpus data and a summary of the three publications. Chapters 3 to 5 each represent one of the three publications. All papers are printed in this thesis with the permission of the publishers.
Wiegand’s magnum opus “Wörterbuchforschung” ends with a chapter on the state of research into dictionary use in the mid-1990s and the tasks it still faced. This article aims to reflect on the state and the relevance of dictionary usage research 20 years later. I will argue that the fundamentally changed lexicographic landscape makes it necessary to shift the focus of research. In my view, the most important aim of research into dictionary use can no longer be limited to improving dictionaries. Research into dictionary use should also raise more awareness of user-orientation in general and should provide methodological reflection to shed light on the increasingly important usage statistics for online dictionaries. Another goal should be to look behind the scenes of collaborative dictionaries in order to provide background data for assessing their relevance in relation to dictionaries compiled by lexicographic experts. The crisis of lexicography also makes it necessary to broaden our view and concentrate on situations in which linguistic questions arise. In this context, we could examine in which of these situations the consultation of lexicographic data helps. In summary, the aim of research into dictionary use is to identify the fields where sound lexicographic work is really helpful for potential users.
In this paper, we describe preliminary results from an ongoing experiment wherein we classify two large unstructured text corpora—a web corpus and a newspaper corpus—by topic domain (or subject area). Our primary goal is to develop a method that allows for the reliable annotation of large crawled web corpora with metadata required by many corpus linguists. We are especially interested in designing an annotation scheme whose categories are both intuitively interpretable by linguists and firmly rooted in the distribution of lexical material in the documents. Since we use data from a web corpus and a more traditional corpus, we also contribute to the important field of corpus comparison and corpus evaluation. Technically, we use (unsupervised) topic modeling to automatically induce topic distributions over gold standard corpora that were manually annotated for 13 coarse-grained topic domains. In a second step, we apply supervised machine learning to learn the manually annotated topic domains, using the previously induced topics as features. We achieve around 70% accuracy in 10-fold cross-validation. An analysis of the errors clearly indicates, however, that a revised classification scheme and larger gold standard corpora will likely lead to a substantial increase in accuracy.
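The two-step pipeline described in the abstract—unsupervised topic induction followed by supervised classification on the induced topic distributions—can be sketched as follows. This is a minimal illustration using scikit-learn with invented toy documents and two stand-in domain labels (the paper uses 13 coarse-grained domains and real gold standard corpora); it is not the authors' actual implementation.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Toy stand-in for a manually annotated gold standard corpus.
docs = [
    "the parliament passed a new tax law after the debate",
    "the striker scored two goals in the final match",
    "the court ruled on the appeal and the new legislation",
    "the team won the championship after a long season",
] * 10
labels = ["politics", "sports", "politics", "sports"] * 10

# Step 1: bag-of-words counts, then unsupervised topic induction (LDA).
bow = CountVectorizer().fit_transform(docs)
lda = LatentDirichletAllocation(n_components=5, random_state=0)
topic_features = lda.fit_transform(bow)  # per-document topic distributions

# Step 2: supervised learning on the topic distributions as features,
# evaluated with 10-fold cross-validation as in the abstract.
clf = LogisticRegression(max_iter=1000)
scores = cross_val_score(clf, topic_features, labels, cv=10)
print(f"mean accuracy across 10 folds: {scores.mean():.2f}")
```

Using the low-dimensional topic distributions instead of raw word counts as features is what ties the classifier's decisions back to interpretable lexical clusters, which is the design goal stated above.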
This paper presents C-WEP, the Collection of Writing Errors by Professional Writers of German. It currently consists of 245 sentences with grammatical errors. All sentences are taken from published texts. All authors are professional writers with high skill levels with respect to German, the genres, and the topics. The purpose of this collection is to provide seeds for more sophisticated writing support tools, as only a very small proportion of these errors can be detected by state-of-the-art checkers. C-WEP is annotated on various levels and freely available.
German research on collocation(s) focuses on many different aspects. A comprehensive documentation would be impossible in this short report. Accepting that we cannot do justice to all the contributions to this area, we pick out only some influential cornerstones. This selection does not claim to be representative or balanced; rather, it is meant to constitute the backbone of the story we want to tell: our ‘German’ view of the still ongoing evolution of a notion of ‘collocation’. Although our own work concerns the theoretical background of and the empirical rationale for collocations, lexicography occupies a large space. Some of the recent publications (Wahrig 2008, Häcki Buhofer et al. 2014) represent a turn to the empirical legitimation of the selection of typical expressions. Nevertheless, linking the empirical evidence to the needs of an abstract lexicographic description (or a didactic format) is still an open issue.
The present paper describes Corpus Query Lingua Franca (ISO CQLF), a specification designed at ISO Technical Committee 37 Subcommittee 4 “Language resource management” for the purpose of facilitating the comparison of properties of corpus query languages. We outline the motivation for this endeavour and present its aims and its general architecture. CQLF is intended as a multi-part specification; here, we concentrate on the basic metamodel that provides a frame into which the other parts fit.
Starting from fundamental insights of conversation-analytic interaction research into the central importance that bodily co-presence and mutual perception have for the shaping of our interactive practices, this contribution examines deictic practices in face-to-face communication. Deixis – verbal and gestural pointing for another person – can be regarded phylo- and ontogenetically (Tomasello 2003, 2006, 2008) as a privileged interface between interaction and grammar, between language, human bodies, objects, perception, and space. On the basis of a broad video corpus of different genres, deictic pointing acts are analyzed as situated, body-bound practices and systematically examined for cross-situational commonalities and differences. The results of the empirical analyses of the demonstratio ad oculos (pointing at the visible, Bühler 1965) and of Deixis am Phantasma (pointing at the invisible, ibid.) are integrated into an overarching theoretical model. In this multimodal model, deixis is understood as a situated practice of joint attention-focusing that mobilizes the interactive, cognitive, and perceptual resources of all participants (Stukenbrock 2015b).