Refine
Year of publication
- 2017 (370) (remove)
Document Type
- Part of a Book (161)
- Article (101)
- Conference Proceeding (43)
- Book (33)
- Part of Periodical (13)
- Other (7)
- Working Paper (6)
- Report (4)
- Doctoral Thesis (2)
Keywords
- Deutsch (154)
- Korpus <Linguistik> (64)
- Gesprochene Sprache (30)
- Grammatik (22)
- Sprachvariante (22)
- Englisch (14)
- Linguistik (14)
- Sprache (14)
- Diskursanalyse (13)
- Interaktion (13)
Publicationstate
- Veröffentlichungsversion (163)
- Zweitveröffentlichung (87)
- Postprint (20)
- Erstveröffentlichung (1)
- Preprint (1)
Reviewstate
- (Verlags)-Lektorat (135)
- Peer-Review (114)
- Peer-review (12)
- (Verlags-)Lektorat (2)
- Peer-Revied (2)
- (Verlags-)lektorat (1)
- Peer Review (1)
- Qualifikationsarbeit (Dissertation, Habilitationsschrift) (1)
Publisher
- Institut für Deutsche Sprache (56)
- de Gruyter (50)
- Narr Francke Attempto (39)
- Narr (19)
- De Gruyter (17)
- Verlag für Gesprächsforschung (11)
- Stauffenburg (10)
- Hempen (9)
- Springer (6)
- TUDpress (6)
This paper presents a short insight into a new project at the "Institute for the German Language” (IDS) (Mannheim). It gives an insight into some basic ideas for a corpus-based dictionary of spoken German, which will be developed and compiled by the new project "The Lexicon of spoken German” (Lexik des gesprochenen Deutsch, LeGeDe). The work is based on the "Research and Teaching Corpus of Spoken German” (Forschungs- und Lehrkorpus Gesprochenes Deutsch, FOLK), which is implemented in the "Database for Spoken German” (Datenbank für Gesprochenes Deutsch, DGD). Both resources, the database and the corpus, have been developed at the IDS.
Der Band umfasst grundlegende Arbeiten, die methodologisch und im Sinne des Theorieaufbaus besonders aussagekräftig und allgemein anwendbar sind, sowie Werke, die den Fokus auf deutschsprachige Länder und/oder Deutsch als Einzelsprache legen. In der Einleitung des Bandes werden – zum ersten Mal im deutschsprachigen Kontext – Sprachplanung/Sprach(en)politik und Sprachmanagementtheorie in Beziehung gesetzt. Exkursorisch werden außerdem Sprachenrecht, europäische Sprachenpolitik und Deutsch im Licht von Sprachplanung/Sprach(en)politik und Sprachmanagement thematisiert.
Languages employ different strategies to transmit structural and grammatical information. While, for example, grammatical dependency relationships in sentences are mainly conveyed by the ordering of the words for languages like Mandarin Chinese, or Vietnamese, the word ordering is much less restricted for languages such as Inupiatun or Quechua, as these languages (also) use the internal structure of words (e.g. inflectional morphology) to mark grammatical relationships in a sentence. Based on a quantitative analysis of more than 1,500 unique translations of different books of the Bible in almost 1,200 different languages that are spoken as a native language by approximately 6 billion people (more than 80% of the world population), we present large-scale evidence for a statistical trade-off between the amount of information conveyed by the ordering of words and the amount of information conveyed by internal word structure: languages that rely more strongly on word order information tend to rely less on word structure information and vice versa. Or put differently, if less information is carried within the word, more information has to be spread among words in order to communicate successfully. In addition, we find that–despite differences in the way information is expressed–there is also evidence for a trade-off between different books of the biblical canon that recurs with little variation across languages: the more informative the word order of the book, the less informative its word structure and vice versa. We argue that this might suggest that, on the one hand, languages encode information in very different (but efficient) ways. On the other hand, content-related and stylistic features are statistically encoded in very similar ways.
In recent years, formal semantic research on the meaning of tense and aspect has benefited from a number of studies investigating languages with graded tense systems. This paper contributes a first sketch of the temporal marking system of Awing (Grassfields Bantu), focusing on two varieties of remote past and remote future. We argue that the data support a "symmetric" analysis of past and future tense in Awing. In our specific proposal, Awing temporal remoteness markers are uniformly analyzed as quantificational tense operators, and both the past and the future paradigm include a form that prevents contextual restriction of this temporal quantifier.
The Google Ngram Corpora seem to offer a unique opportunity to study linguistic and cultural change in quantitative terms. To avoid breaking any copyright laws, the data sets are not accompanied by any metadata regarding the texts the corpora consist of. Some of the consequences of this strategy are analyzed in this article. I chose the example of measuring censorship in Nazi Germany, which received widespread attention and was published in a paper that accompanied the release of the Google Ngram data (Michel et al. (2010): Quantitative analysis of culture using millions of digitized books. Science, 331(6014): 176–82). I show that without proper metadata, it is unclear whether the results actually reflect any kind of censorship at all. Collectively, the findings imply that observed changes in this period of time can only be linked directly to World War II to a certain extent. Therefore, instead of speaking about general linguistic or cultural change, it seems to be preferable to explicitly restrict the results to linguistic or cultural change ‘as it is represented in the Google Ngram data’. On a more general level, the analysis demonstrates the importance of metadata, the availability of which is not just a nice add-on, but a powerful source of information for the digital humanities.
Recently, a claim was made, on the basis of the German Google Books 1-gram corpus (Michel et al., Quantitative Analysis of Culture Using Millions of Digitized Books. Science 2010; 331: 176–82), that there was a linear relationship between six non-technical non-Nazi words and three ‘explicitly Nazi words’ in times of World War II (Caruana-Galizia. 2015. Politics and the German language: Testing Orwell’s hypothesis using the Google N-Gram corpus. Digital Scholarship in the Humanities [Online]. http://dsh.oxfordjournals.org/cgi/doi/10.1093/llc/fqv011 (accessed 15 April 2015)). Here, I try to show that apparent relationships like this are the result of misspecified models that do not take into account the temporal aspect of time-series data. The main point of this article is to demonstrate why such analyses run the risk of incorrect statistical inference, where potential effects are both meaningless and can potentially lead to wrong conclusions.