Refine
Document Type
- Preprint (11) (remove)
Is part of the Bibliography
- yes (11) (remove)
Keywords
- Korpus <Linguistik> (5)
- Deutsch (3)
- Geschlechtergerechte Sprache (2)
- Metadaten (2)
- Sprachwandel (2)
- Statistik (2)
- Archiv für Gesprochenes Deutsch (AGD) (1)
- Autocorrelated errors (1)
- Computerlinguistik (1)
- Datenanalyse (1)
Publicationstate
- Veröffentlichungsversion (7)
- Postprint (1)
- Preprint (1)
Reviewstate
Publisher
Frimer et al. (2015) claim that there is a linear relationship between the level of prosocial language and the level of public disapproval of US Congress. A re-analysis demonstrates that this relationship is the result of a misspecified model that does not account for first-order autocorrelated disturbances. A Stata script to reproduce all presented results is available as an appendix.
It was recently suggested in a study published in Nature Human Behaviour that the historical loosening of American culture was associated with a trade-off between higher creativity and lower order. To this end, Jackson et al. generate a linguistic index of cultural tightness based on the Google Books Ngram corpus and use this index to show that American norms loosened between 1800 and 2000. While we remain agnostic toward a potential loosening of American culture and a statistical association with creativity/order, we show here that the methods used by Jackson et al. are neither suitable for testing the validity of the index nor for establishing possible relationships with creativity/order.
In a previous study, Aceves and Evans present a large-scale quantitative information-theoretic analysis of parallel corpus data in ~1,000 languages to show that there are apparently strong associations between the way languages encode information into words and patterns of communication, e.g. the configuration of semantic information. During the peer review process, one reviewer raised the question of the extent to which the presented results depend on different corpus sizes (see the Peer Review File). This is a very important question given that most, if not all, of the quantities associated with word frequency distributions vary systematically with corpus size. While Aceves and Evans claim that corpus size does not affect the results presented, I challenge this view by presenting reanalyses of the data that clearly suggest that it does.
Speech islands are historically and developmentally unique and will inevitably disappear within the next decades. We urgently need to preserve their remains and exploit what is left in order to make research on language-in-contact and historical as well as current comparative language research possible.
The Archive for Spoken German (AGD) at the Institute for German Language collects, fosters and archives data from completed research projects and makes them available to the wider research community.
Besides large variation corpora and corpora of conversational speech, the archive already contains a range of collections of data on German speech minorities. The latter will be outlined in this chapter. Some speech island data is already made available through the personal service of the AGD, or the database of spoken German (DGD), e.g. data on Australian German, Unserdeutsch, or German in North America. Some corpora are still being prepared for publication, but still important to document for potentially interested research projects. We therefore also explain the current problems and efforts related to the curation of speech island data, from the digitization of recordings and the collection of metadata, to the integration of transcriptions, annotations and other ways of accessing and sharing data.
Less than one percent of words would be affected by gender-inclusive language in German press texts
(2024)
Research on gender and language is tightly knitted to social debates on gender equality and non-discriminatory language use. Psycholinguistic scholars have made significant contributions in this field. However, corpus-based studies that investigate these matters within the context of language use are still rare. In our study, we address the question of how much textual material would actually have to be changed if non-gender-inclusive texts were rewritten to be gender-inclusive. This quantitative measure is an important empirical insight, as a recurring argument against the use of gender-inclusive German is that it supposedly makes written texts too long and complicated. It is also argued that gender-inclusive language has negative effects on language learners. However, such effects are only likely if gender-inclusive texts are very different from those that are not gender-inclusive. In our corpus-linguistic study, we manually annotated German press texts to identify the parts that would have to be changed. Our results show that, on average, less than 1% of all tokens would be affected by gender-inclusive language. This small proportion calls into question whether gender-inclusive German presents a substantial barrier to understanding and learning the language, particularly when we take into account the potential complexities of interpreting masculine generics.
In a previous study published in Nature Human Behaviour, Varnum and Grossmann claim that reductions in gender inequality are linked to reductions in pathogen prevalence in the United States between 1951 and 2013. Since the statistical methods used by Varnum and Grossmann are known to induce (seemingly) significant correlations between unrelated time series, so-called spurious or non-sense correlations, we test here whether the statistical association between gender inequality and pathogens prevalence in its current form also is the result of mis-specified models that do not correctly account for the temporal structure of the data. Our analysis clearly suggests that this is the case. We then discuss and apply several standard approaches of modelling time-series processes in the data and show that there is, at least as of now, no support for a statistical association between gender inequality and pathogen prevalence.
This paper focuses on language change based on shifting social norms, in particular with regard to the debate on language and gender. It is a recurring argument in this debate that language develops "naturally" and that "severe interventions" - such as gender-inclusive language is often claimed to be - in the allegedly "organic" language system are inappropriate and even "dangerous". Such interventions are, however, not unprecedented. Socially motivated processes of language change are neither unusual nor new. We focus in our contribution on one important political-social space in Germany, the German Bundestag. Taking other struggles about language and gender in the plenaries of the Bundestag as a starting point, our article illustrates that language and gender has been a recurring issue in the German Bundestag since the 1980s. We demonstrate how this is reflected in linguistic practices of the Bundestag, by the use of a) designations for gays and lesbians; b) pair forms such as Bürgerinnen und Bürger (female and male citizens); and c) female forms of addresses and personal nouns ('Präsidentin' in addition to 'Präsident'). Lastly, we will discuss implications of these earlier language battles for the currently very heated debate about gender-inclusive language, especially regarding new forms with gender symbols like the asterisk or the colon (Lehrer*innen, Lehrer:innen; male*female teachers) which are intended to encompass all gender identities.
In a recent paper published in the Journal of Language Evolution, Kauhanen, Einhaus & Walkden (KEW) challenge the results presented in one of my papers (Koplenig, Royal Society Open Science, 6, 181274 (2019)), in which I tried to show through a series of statistical analyses that large numbers of L2 (second language) speakers do not seem to affect the (grammatical or statistical) complexity of a language. To this end, I focus on the way in which the Ethnologue assesses language status: a language is characterised as vehicular if, in addition to being used by L1 (first language) speakers, it should also have a significant number of L2 users. KEW criticise both the use of vehicularity as a (binary) indicator of whether a language has a significant number of L2 users and the idea of imputing a zero proportion of L2 speakers to non-vehicular languages whenever a direct estimate of that proportion is unavailable. While I recognise the importance of post-publication commentary on published research, I show in this rejoinder that both points of criticism are explicitly mentioned and analysed in my paper. In addition, I also comment on other points raised by KEW and demonstrate that both alternative analyses offered by KEW do not stand up to closer scrutiny.
As a result of legal restrictions the Google Ngram Corpora datasets are a) not accompanied by any metadata regarding the texts the corpora consist of and the data are b) truncated to prevent an indirect conclusion from the n-gram to the author of the text. Some of the consequences of this strategy are discussed in this article.
Developments within the field of Second Language Acquisition (SLA) have meant that scholars are increasingly engaging with corpora and corpus-based resources, providing a source of “‘authentic’ language” to learners and educators (Mitchell 2020: 254), and contributing to “state-of-the-art research methodologies” (Deshors and Gries 2023: 164). However, there are areas in which progress can still be made, particularly in the area of metadata, such as information about the speaker and contexts of the language use, as well as increased variety in the text types and genres of corpora used to develop SLA materials (Paquot 2022: 36). This post discusses one such possibility for increasing the variety of text types and providing a rich source of authentic language that can be used to create engaging SLA materials, particularly for young people learning German, namely the use of the NottDeuYTSch corpus (to download the corpus in a variety of formats, see Cotgrove 2018).