OPUS 4 | Search

26 search hits


Analyzing lexical change in diachronic corpora (2016)

This thesis consists of the following three papers, all of which have been published in international peer-reviewed journals: Chapter 3: Koplenig, Alexander (2015c). The Impact of Lacking Metadata for the Measurement of Cultural and Linguistic Change Using the Google Ngram Data Sets—Reconstructing the Composition of the German Corpus in Times of WWII. Published in: Digital Scholarship in the Humanities. Oxford: Oxford University Press. [doi:10.1093/llc/fqv037] Chapter 4: Koplenig, Alexander (2015b). Why the quantitative analysis of diachronic corpora that does not consider the temporal aspect of time-series can lead to wrong conclusions. Published in: Digital Scholarship in the Humanities. Oxford: Oxford University Press. [doi:10.1093/llc/fqv030] Chapter 5: Koplenig, Alexander (2015a). Using the parameters of the Zipf–Mandelbrot law to measure diachronic lexical, syntactical and stylistic changes – a large-scale corpus analysis. Published in: Corpus Linguistics and Linguistic Theory. Berlin/Boston: de Gruyter. [doi:10.1515/cllt-2014-0049] Chapter 1 introduces the topic by describing and discussing several basic concepts relevant to the statistical analysis of corpus linguistic data. Chapter 2 presents a method to analyze diachronic corpus data and a summary of the three publications. Chapters 3 to 5 each represent one of the three publications. All papers are printed in this thesis with the permission of the publishers.

Automatic Classification by Topic Domain for Meta Data Generation, Web Corpus Evaluation, and Corpus Comparison (2016)

Schäfer, Roland ; Bildhauer, Felix

In this paper, we describe preliminary results from an ongoing experiment wherein we classify two large unstructured text corpora—a web corpus and a newspaper corpus—by topic domain (or subject area). Our primary goal is to develop a method that allows for the reliable annotation of large crawled web corpora with meta data required by many corpus linguists. We are especially interested in designing an annotation scheme whose categories are both intuitively interpretable by linguists and firmly rooted in the distribution of lexical material in the documents. Since we use data from a web corpus and a more traditional corpus, we also contribute to the important field of corpus comparison and corpus evaluation. Technically, we use (unsupervised) topic modeling to automatically induce topic distributions over gold standard corpora that were manually annotated for 13 coarse-grained topic domains. In a second step, we apply supervised machine learning to learn the manually annotated topic domains using the previously induced topics as features. We achieve around 70% accuracy in 10-fold cross validations. An analysis of the errors clearly indicates, however, that a revised classification scheme and larger gold standard corpora will likely lead to a substantial increase in accuracy.

C-WEP―rich annotated collection of Writing errors by professionals (2016)

Mahlow, Cerstin

This paper presents C-WEP, the Collection of Writing Errors by Professional Writers of German. It currently consists of 245 sentences with grammatical errors. All sentences are taken from published texts. All authors are professional writers with high skill levels with respect to German, the genres, and the topics. The purpose of this collection is to provide seeds for more sophisticated writing support tools, as only a very small proportion of those errors can be detected by state-of-the-art checkers. C-WEP is annotated on various levels and freely available.

Corpus Query Lingua Franca (CQLF) (2016)

Bański, Piotr ; Frick, Elena ; Witt, Andreas

The present paper describes Corpus Query Lingua Franca (ISO CQLF), a specification designed at ISO Technical Committee 37 Subcommittee 4 “Language resource management” for the purpose of facilitating the comparison of properties of corpus query languages. We give an overview of the motivation for this endeavour and present its aims and its general architecture. CQLF is intended as a multi-part specification; here, we concentrate on the basic metamodel, which provides a frame that the other parts fit into.

Der Begriff des Postponierers im Licht von Sprachvergleichsdaten Deutsch-Italienisch (2016)

Ravetto, Miriam ; Blühdorn, Hardarik

This article examines the syntax and semantics of so-called postponers (Postponierer), i.e. conjunctional connectives that always place the subordinate clause they introduce after the main clause. The core properties of such connectives in German are presented using sodass and zumal as examples. Taking the Italian conjunctions cosicché, tanto più che and perché as examples, the article discusses whether the concept of the postponer can be used for cross-linguistic comparison. In a next step, the German postponers are described more precisely with the help of arguments from language history and are located in the transitional area between adverbial connectives and subordinators. It turns out that the connectives examined ultimately behave very differently, so that it seems questionable whether grouping them into a common class is justified.

Der lexikographische Prozess im Projekt elexiko (2016)

Klosa, Annette

Deutsch-russisches Neologismenwörterbuch. Neuer Wortschatz im Deutschen, 1991-2010. Bd. 1 - 2 (A-Z) (2016)

Steffens, Doris ; Nikitina, Olga

This dictionary, which builds on the first larger dictionary of neologisms for German, closes a gap in the German-Russian dictionary landscape: it presents users with the new German vocabulary that they usually look for in vain in other dictionaries. It contains almost 2000 new words (e.g. Kletterwald, scrollen), new fixed multi-word expressions (e.g. etw. in die Tonne treten, der Drops ist gelutscht) and new meanings of established words (e.g. halbrund, Stolperstein), of which around 1350 are given a comprehensive lexicographic description. The many links between the headwords offer insights into the interconnectedness of the new vocabulary and thus make an important contribution to vocabulary acquisition.

Die Flüchtlingsdebatte in den Medien Deutschlands (2016)

Becker, Maria

Eine Zeitlang - über die ärgerliche Univerbierung (2016)

Donalies, Elke

Einleitung: Kontrastivität/Satzanfang/Korpus (2016)

Dalmas, Martine ; Fabricius-Hansen, Cathrine ; Schwinn, Horst