OPUS 4 | Search

10107 search hits

5101 to 5110


Right-Wing Language in Historical Comparison: Verbal Violence and Deliberate Taboo-Breaking [Rechte Sprache im historischen Vergleich. Sprachliche Gewalt und gezielte Tabubrüche] (2017)

In her lecture at the annual commemoration of the Stiftung Erinnerung Ulm on 14 February 2017, Prof. Dr. Heidrun Kämper compared the language patterns of the AfD with those of the extreme right in the Weimar Republic. This contribution summarizes her remarks.

Denazification: Linguistic Forms of Existence of an Ethical Concept [Entnazifizierung - Sprachliche Existenzformen eines ethischen Konzepts] (1998)

Kämper, Heidrun

Denazification is described for the communicative domains of contemporary criticism / parties / church and administration / judiciary, as well as in connection with tribunal proceedings (Spruchkammerverfahren), and is understood as part of a history of texts, concepts and mentalities. The history of denazification is essentially a history of guilt, so that its linguistic expression, as part of German language history, presents itself as conceptual history. Intellectualization, instrumentalization, linguistic escapism and dissolution are features of the guilt discourse, to be considered along a timeline running from 1945 to about 1955. The study shows that the concept of guilt remains split into three separate meanings: one defined by philosophy / church / parties, one by the administration and judiciary, and one by the perpetrators. The perpetrators' hollowed-out concept of guilt dominates public perception, and insofar as the Befreiungsgesetz (Liberation Law) created the preconditions for this, the law itself has to be described as part of German language history.

Are web corpora inferior? The Case of Czech and Slovak (2017)

Benko, Vladimír

Our paper describes an experiment aimed at assessing the lexical coverage of web corpora in comparison with traditional ones for two closely related Slavic languages, from the lexicographers' perspective. The preliminary results show that web corpora should not be considered "inferior", but rather "different".
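The abstract does not spell out how lexical coverage was measured. As a minimal sketch, assuming coverage means the share of a lexicographer's headword list attested at least `min_freq` times in a corpus, the comparison could look like this. The function names, the threshold and the toy Slovak word forms are hypothetical; a real study would work on lemmatized data rather than raw word forms.

```python
from collections import Counter


def lexical_coverage(lexicon, corpus_tokens, min_freq=1):
    """Share of lexicon headwords attested at least min_freq times in the corpus."""
    freqs = Counter(tok.lower() for tok in corpus_tokens)
    headwords = {w.lower() for w in lexicon}
    if not headwords:
        return 0.0
    covered = sum(1 for w in headwords if freqs[w] >= min_freq)
    return covered / len(headwords)


def attested_only_in(lexicon, tokens_a, tokens_b, min_freq=1):
    """Headwords attested in corpus A but not in B, and vice versa."""
    freqs_a = Counter(tok.lower() for tok in tokens_a)
    freqs_b = Counter(tok.lower() for tok in tokens_b)
    headwords = {w.lower() for w in lexicon}
    in_a = {w for w in headwords if freqs_a[w] >= min_freq}
    in_b = {w for w in headwords if freqs_b[w] >= min_freq}
    return in_a - in_b, in_b - in_a


if __name__ == "__main__":
    # Hypothetical toy data: a four-word headword list and two tiny "corpora".
    lexicon = ["dom", "strom", "kniha", "mesto"]
    web = "dom strom dom mesto".split()
    traditional = "kniha dom kniha".split()
    print(lexical_coverage(lexicon, web))          # 0.75
    print(lexical_coverage(lexicon, traditional))  # 0.5
    print(attested_only_in(lexicon, web, traditional))
```

The second function reflects the "different rather than inferior" reading: two corpora with similar overall coverage can still attest different parts of the lexicon.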

Monolingual and bilingual language acquisition: Harvesting the fruits from the grammar tree (2017)

The authors are pleased to present to the readers of the Zeitschrift für Sprachwissenschaft a Special Issue in honor of Rosemarie Tracy. Contents:
0. Frontmatter
1. Petra Schulz, Ira Gawlitzek, Angelika Wöllstein: Introduction, p. 1
2. Natascha Müller: Different sources of delay and acceleration in early child bilingualism, p. 7
3. Hubert Haider, Christina Schörghofer-Essl, Karin Seethaler: Quantifying kids prefer intersecting sets - a pilot study, p. 31
4. Petra Schulz, Rabea Schwarze: How strong is the ban on non-finite verbs in V2? Evidence from early second language learners of German with and without SLI, p. 51
5. Monika Rothweiler, Manuela Schönenberger, Franziska Sterner: Subject-verb agreement in German in bilingual children with and without SLI, p. 79
6. Holger Hopp: The processing of English which-questions in adult L2 learners: Effects of L1 transfer and proficiency, p. 107
7. Oksana Laleko, Maria Polinsky: Silence is difficult: On missing elements in bilingual grammars, p. 135
8. Artemis Alexiadou: Building verbs in language mixing varieties, p. 165

Proceedings of the Workshop on Challenges in the Management of Large Corpora and Big Data and Natural Language Processing (CMLC-5+BigNLP) 2017 including the papers from the Web-as-Corpus (WAC-XI) guest section. Birmingham, 24 July 2017 (2017)

Contents:
1. Andreas Dittrich: Intra-connecting a small exemplary literary corpus with semantic web technologies for exploratory literary studies, p. 1
2. John Kirk, Anna Čermáková: From ICE to ICC: The new International Comparable Corpus, p. 7
3. Dawn Knight, Tess Fitzpatrick, Steve Morris, Jeremy Evas, Paul Rayson, Irena Spasić, Mark Stonelake, Enlli Môn Thomas, Steven Neale, Jennifer Needs, Scott Piao, Mair Rees, Gareth Watkins, Laurence Anthony, Thomas Michael Cobb, Margaret Deuchar, Kevin Donnelly, Michael McCarthy, Kevin Scannell: Creating CorCenCC (Corpws Cenedlaethol Cymraeg Cyfoes – The National Corpus of Contemporary Welsh), p. 13
4. Marc Kupietz, Andreas Witt, Piotr Bański, Dan Tufiş, Dan Cristea, Tamás Váradi: EuReCo - Joining Forces for a European Reference Corpus as a sustainable base for cross-linguistic research, p. 15
5. Harald Lüngen, Marc Kupietz: CMC Corpora in DeReKo, p. 20
6. David McClure, Mark Algee-Hewitt, Douris Steele, Erik Fredner, Hannah Walser: Organizing corpora at the Stanford Literary Lab, p. 25
7. Radoslav Rábara, Pavel Rychlý, Ondřej Herman: Accelerating corpus search using multiple cores, p. 30
8. John Vidler, Stephen Wattam: Keeping Properties with the Data: CL-MetaHeaders – An Open Specification, p. 35
9. Vladimir Benko: Are Web Corpora Inferior? The Case of Czech and Slovak, p. 43
10. Edyta Jurkiewicz-Rohrbacher, Zrinka Kolaković, Björn Hansen: Web Corpora – the best possible solution for tracking phenomena in underresourced languages: clitics in Bosnian, Croatian and Serbian, p. 49
11. Vít Suchomel: Removing Spam from Web Corpora Through Supervised Learning Using FastText, p. 56

Removing spam from web corpora through supervised learning using FastText (2017)

Suchomel, Vít

Unlike traditional text corpora collected from trustworthy sources, the content of web-based corpora has to be filtered. This study briefly discusses the impact of web spam on corpus usability and emphasizes the importance of removing computer-generated text from web corpora. The paper also presents a keyword comparison of an unfiltered corpus with the same collection of texts cleaned by a supervised classifier trained using FastText. The classifier was able to recognize 71% of web spam documents similar to the training set, but lacked both precision and recall when applied to short texts from another data set.
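The abstract does not say which keyword metric was used to compare the unfiltered and cleaned corpus. The sketch below assumes Kilgarriff's "simple maths" score, a widely used keyword measure: the ratio of smoothed per-million frequencies in a focus and a reference corpus. Treating the unfiltered corpus as the focus and the cleaned one as the reference, the top keywords show vocabulary that the filter removed. The FastText classification step itself is not reproduced; the function name and toy tokens are hypothetical.

```python
from collections import Counter


def simple_maths_keywords(focus_tokens, reference_tokens, n=1.0, top=10):
    """Rank focus-corpus words by (fpm_focus + n) / (fpm_reference + n).

    fpm = frequency per million tokens. n is a smoothing constant: a small n
    favours rarer words, a large n favours commoner ones.
    """
    focus = Counter(focus_tokens)
    reference = Counter(reference_tokens)
    focus_total = sum(focus.values())
    reference_total = sum(reference.values())
    scores = {}
    for word, count in focus.items():
        fpm_focus = count * 1_000_000 / focus_total
        fpm_ref = (reference[word] * 1_000_000 / reference_total
                   if reference_total else 0.0)
        scores[word] = (fpm_focus + n) / (fpm_ref + n)
    # Highest score first; ties broken alphabetically for stable output.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top]


if __name__ == "__main__":
    # Hypothetical toy data: the "unfiltered" corpus is the cleaned text plus spam.
    cleaned = "the corpus contains text the text".split()
    unfiltered = cleaned + "buy cheap pills buy".split()
    for word, score in simple_maths_keywords(unfiltered, cleaned, top=3):
        print(word, round(score, 1))
```

Words present only in the unfiltered corpus get a reference frequency of zero and therefore rank highest, which is why such a comparison surfaces the spam vocabulary.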

Web corpora - the best possible solution for tracking rare phenomena in underresourced languages: clitics in Bosnian, Croatian and Serbian (2017)

Jurkiewicz-Rohrbacher, Edyta ; Kolaković, Zrinka ; Hansen, Björn

Complex linguistic phenomena, such as Clitic Climbing in Bosnian, Croatian and Serbian, are often described intuitively, only from the perspective of the main tendency. In this paper, we argue that web corpora currently offer the best source of empirical material for studying Clitic Climbing in BCS. They thus allow the most accurate description of this phenomenon, as less frequent constructions can be tracked only in big, well-annotated data sources. We compare the properties of web corpora for BCS with traditional sources and give examples of studies on CC based on web corpora. Furthermore, we discuss problems related to web corpora and suggest some improvements for the future.

EuReCo - Joining Forces for a European Reference Corpus as a sustainable base for cross-linguistic research (2017)

Kupietz, Marc ; Witt, Andreas ; Bański, Piotr ; Tufiş, Dan ; Cristea, Dan ; Váradi, Tamás

In this paper we discuss the opportunities, prerequisites, possible applications and implications of a virtually joint corpus based on existing national, reference or other large corpora and their host institutions.

CMC Corpora in DeReKo (2017)

Lüngen, Harald ; Kupietz, Marc

We introduce three types of corpora of computer-mediated communication that have recently been compiled at the Institute for the German Language (IDS) or curated from an external project and included in DeReKo, the German Reference Corpus: Wikipedia (discussion) corpora, the Usenet news corpus, and the Dortmund Chat Corpus. The data and corpora have been converted to I5, the TEI customization used to represent texts in DeReKo. They can be searched via the web-based IDS corpus research interfaces; the Wikipedia and chat corpora can also be downloaded from the IDS repository and download server, respectively.

Creating CorCenCC (Corpws Cenedlaethol Cymraeg Cyfoes - The National Corpus of Contemporary Welsh) (2017)

Knight, Dawn ; Fitzpatrick, Tess ; Morris, Steve ; Evas, Jeremy ; Rayson, Paul ; Spasić, Irena ; Stonelake, Mark ; Thomas, Enlli Môn ; Neale, Steven ; Needs, Jennifer ; Piao, Scott ; Rees, Mair ; Watkins, Gareth ; Anthony, Laurence ; Cobb, Thomas Michael ; Deuchar, Margaret ; Donnelly, Kevin ; McCarthy, Michael ; Scannell, Kevin

CorCenCC is an interdisciplinary and multi-institutional project that is creating a large-scale, open-source corpus of contemporary Welsh. CorCenCC will be the first ever large-scale corpus to represent spoken, written and electronically mediated Welsh (compiling an initial data set of 10 million Welsh words), with a functional design informed, from the outset, by representatives of all anticipated academic and community user groups.
