We introduce DeReKoGram, a novel frequency dataset containing lemma and part-of-speech (POS) information for 1-, 2-, and 3-grams from the German Reference Corpus. The dataset contains information based on a corpus of 43.2 billion tokens and is divided into 16 parts based on 16 corpus folds. We describe how the dataset was created and structured. By evaluating the distribution over the 16 folds, we show that it is possible to work with a subset of the folds in many use cases (e.g., to save computational resources). In a case study, we investigate the growth of vocabulary (as well as the number of hapax legomena) as an increasing number of folds are included in the analysis. We cross-combine this with the various cleaning stages of the dataset. We also give some guidance in the form of Python, R, and Stata markdown scripts on how to work with the resource.
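The vocabulary-growth case study can be illustrated with a minimal Python sketch. It assumes each fold has already been loaded as a mapping from unigram to frequency; the toy counts and the function name `vocabulary_growth` are hypothetical and do not reflect the actual DeReKoGram file format or the authors' scripts.

```python
from collections import Counter


def vocabulary_growth(folds):
    """Return (folds included, vocabulary size, hapax count) after each fold.

    `folds` is a list of Counter objects mapping a unigram to its frequency
    in one fold (an assumed in-memory representation, not the real format).
    """
    cumulative = Counter()
    growth = []
    for n_folds, fold_counts in enumerate(folds, start=1):
        # Add this fold's frequencies to the running totals.
        cumulative.update(fold_counts)
        vocab_size = len(cumulative)
        # Hapax legomena: types that occur exactly once in the folds seen so far.
        hapaxes = sum(1 for freq in cumulative.values() if freq == 1)
        growth.append((n_folds, vocab_size, hapaxes))
    return growth


# Toy data standing in for three folds (hypothetical counts).
toy_folds = [
    Counter({"haus": 3, "baum": 1}),
    Counter({"haus": 2, "katze": 1}),
    Counter({"baum": 1, "hund": 1}),
]
growth = vocabulary_growth(toy_folds)
for n_folds, vocab_size, hapaxes in growth:
    print(n_folds, vocab_size, hapaxes)
```

Note that a type counted as a hapax in an early fold can stop being one once later folds are added, which is why the hapax count is recomputed from the cumulative frequencies at every step.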
Verbs may be attributed higher agency than other grammatical categories. In Study 1, we confirmed this hypothesis with archival datasets comprising verbs (N = 950) and adjectives (N = 2115). We then investigated whether verbs (vs. adjectives) increase message effectiveness. In three experiments presenting potential NGOs (Studies 2 and 3) or corporate campaigns (Study 4) in verb or adjective form, we demonstrate the hypothesized relationship. Across studies (overall N = 721), grammatical agency consistently increased message effectiveness. The effect of semantic agency varied across contexts: it increased (Study 2), did not affect (Study 3), or decreased (Study 4) the effectiveness of the message. Overall, the experiments provide insights into the meta-semantic effects of verbs, demonstrating how grammar may influence communication outcomes.