Denazification is described for the communicative domains of contemporary criticism / parties / church and administration / judiciary, as well as in connection with denazification tribunal (Spruchkammer) proceedings, and is understood as part of a history of texts, concepts and mentalities. The history of denazification is essentially a history of guilt, so that its linguistic expression, as part of German language history, presents itself as conceptual history. Intellectualization, instrumentalization, linguistic escapism and dissolution are characteristics of the guilt discourse, to be thought of on a timeline from 1945 to about 1955. The study shows that the concept of guilt remains split into three individual meanings, defined by philosophy / church / parties, by the administration and judiciary, and by the perpetrators. The perpetrators' emptied concept of guilt dominates public perception, and insofar as the Befreiungsgesetz (Liberation Law) creates the preconditions for this, the law is to be described as part of German language history.
Our paper describes an experiment aimed at assessing lexical coverage in web corpora in comparison with traditional ones for two closely related Slavic languages from the lexicographers’ perspective. The preliminary results show that web corpora should not be considered "inferior", but rather "different".
The authors are pleased to present to the readers of the Zeitschrift für Sprachwissenschaft a Special Issue in honor of Rosemarie Tracy.
Contents:
0. Frontmatter
1. Petra Schulz, Ira Gawlitzek, Angelika Wöllstein: Introduction, S. 1
2. Natascha Müller: Different sources of delay and acceleration in early child bilingualism, S. 7
3. Hubert Haider, Christina Schörghofer-Essl, Karin Seethaler: Quantifying kids prefer intersecting sets - a pilot study, S. 31
4. Petra Schulz, Rabea Schwarze: How strong is the ban on non-finite verbs in V2? Evidence from early second language learners of German with and without SLI, S. 51
5. Monika Rothweiler, Manuela Schönenberger, Franziska Sterner: Subject-verb agreement in German in bilingual children with and without SLI, S. 79
6. Holger Hopp: The processing of English which-questions in adult L2 learners: Effects of L1 transfer and proficiency, S. 107
7. Oksana Laleko, Maria Polinsky: Silence is difficult: On missing elements in bilingual grammars, S. 135
8. Artemis Alexiadou: Building verbs in language mixing varieties, S. 165
Contents:
1. Andreas Dittrich: Intra-connecting a small exemplary literary corpus with semantic web technologies for exploratory literary studies, S. 1
2. John Kirk, Anna Čermáková: From ICE to ICC: The new International Comparable Corpus, S. 7
3. Dawn Knight, Tess Fitzpatrick, Steve Morris, Jeremy Evas, Paul Rayson, Irena Spasic, Mark Stonelake, Enlli Môn Thomas, Steven Neale, Jennifer Needs, Scott Piao, Mair Rees, Gareth Watkins, Laurence Anthony, Thomas Michael Cobb, Margaret Deuchar, Kevin Donnelly, Michael McCarthy, Kevin Scannell: Creating CorCenCC (Corpws Cenedlaethol Cymraeg Cyfoes – The National Corpus of Contemporary Welsh), S. 13
4. Marc Kupietz, Andreas Witt, Piotr Bański, Dan Tufiş, Dan Cristea, Tamás Váradi: EuReCo - Joining Forces for a European Reference Corpus as a sustainable base for cross-linguistic research, S. 15
5. Harald Lüngen, Marc Kupietz: CMC Corpora in DeReKo, S. 20
6. David McClure, Mark Algee-Hewitt, Douris Steele, Erik Fredner, Hannah Walser: Organizing corpora at the Stanford Literary Lab, S. 25
7. Radoslav Rábara, Pavel Rychlý, Ondřej Herman: Accelerating corpus search using multiple cores, S. 30
8. John Vidler, Stephen Wattam: Keeping Properties with the Data: CL-MetaHeaders – An Open Specification, S. 35
9. Vladimir Benko: Are Web Corpora Inferior? The Case of Czech and Slovak, S. 43
10. Edyta Jurkiewicz-Rohrbacher, Zrinka Kolaković, Björn Hansen: Web Corpora – the best possible solution for tracking phenomena in underresourced languages: clitics in Bosnian, Croatian and Serbian, S. 49
11. Vít Suchomel: Removing Spam from Web Corpora Through Supervised Learning Using FastText, S. 56
Unlike traditional text corpora collected from trustworthy sources, the content of web-based corpora has to be filtered. This study briefly discusses the impact of web spam on corpus usability and emphasizes the importance of removing computer-generated text from web corpora.
The paper also presents a keyword comparison of an unfiltered corpus with the same collection of texts cleaned by a supervised classifier trained using FastText. The classifier was able to recognize 71% of web spam documents similar to the training set but lacked both precision and recall when applied to short texts from another data set.
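For readers unfamiliar with the two evaluation measures named in this abstract, the following is a minimal sketch of how precision and recall are computed for a binary spam classifier. The function name and the toy labels are hypothetical illustrations, not data or code from the paper.

```python
def precision_recall(gold, predicted):
    """Compute precision and recall for boolean labels (True = spam).

    `gold` holds the reference labels, `predicted` the classifier output.
    Both names are illustrative assumptions, not taken from the paper.
    """
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g and p)        # spam correctly flagged
    fp = sum(1 for g, p in pairs if not g and p)    # clean text wrongly flagged
    fn = sum(1 for g, p in pairs if g and not p)    # spam that was missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy example with invented labels: four spam documents, two clean ones.
gold = [True, True, True, True, False, False]
predicted = [True, True, True, False, True, False]
p, r = precision_recall(gold, predicted)
print(f"precision={p:.2f} recall={r:.2f}")
```

Recall is the share of actual spam documents that the classifier finds, which is the sense in which the abstract reports that 71% of web spam documents were recognized. Precision is the share of flagged documents that really are spam.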
Complex linguistic phenomena, such as Clitic Climbing (CC) in Bosnian, Croatian and Serbian (BCS), are often described intuitively, only from the perspective of the main tendency. In this paper, we argue that web corpora currently offer the best source of empirical material for studying CC in BCS. They thus allow the most accurate description of this phenomenon, as less frequent constructions can be tracked only in big, well-annotated data sources. We compare the properties of web corpora for BCS with traditional sources and give examples of studies on CC based on web corpora. Furthermore, we discuss problems related to web corpora and suggest some improvements for the future.
CMC Corpora in DeReKo
(2017)
We introduce three types of corpora of computer-mediated communication that have recently been compiled at the Institute for the German Language or curated from an external project and included in DeReKo, the German Reference Corpus, namely Wikipedia (discussion) corpora, the Usenet news corpus, and the Dortmund Chat Corpus. The data and corpora have been converted to I5, the TEI customization to represent texts in DeReKo, and are researchable via the web-based IDS corpus research interfaces and in the case of Wikipedia and chat also downloadable from the IDS repository and download server, respectively.
Creating CorCenCC (Corpws Cenedlaethol Cymraeg Cyfoes - The National Corpus of Contemporary Welsh)
(2017)
CorCenCC is an interdisciplinary and multi-institutional project that is creating a large-scale, open-source corpus of contemporary Welsh. CorCenCC will be the first ever large-scale corpus to represent spoken, written and electronically mediated Welsh (compiling an initial data set of 10 million Welsh words), with a functional design informed, from the outset, by representatives of all anticipated academic and community user groups.