Refine
Year of publication
Document Type
- Article (39)
- Part of a Book (31)
- Other (6)
- Conference Proceeding (3)
- Preprint (3)
- Book (2)
Keywords
- Deutsch (31)
- Korpus <Linguistik> (27)
- Wortschatz (18)
- Sprachstatistik (14)
- Wörterbuch (14)
- COVID-19 (11)
- Datenanalyse (9)
- Lexikostatistik (9)
- Online-Medien (9)
- Vielfalt (8)
Publicationstate
- Veröffentlichungsversion (46)
- Zweitveröffentlichung (28)
- Postprint (13)
Reviewstate
- Peer-Review (38)
- (Verlags)-Lektorat (28)
Publisher
- Leibniz-Institut für Deutsche Sprache (IDS) (8)
- de Gruyter (7)
- IDS-Verlag (4)
- Oxford University Press (4)
- Wilhelm Fink (4)
- Cornell University (3)
- De Gruyter (3)
- Erich Schmidt (3)
- Frank & Timme (3)
- Institut für Deutsche Sprache (3)
This article examines the contrasts and commonalities between languages for specific purposes (LSP) and their popularizations on the one hand and the frequency patterns of LSP register features in English and German on the other. For this purpose corpora of expertexpert and expert-lay communication are annotated for part-of-speech and phrase structure information. On this basis, the frequencies of pre- and post-modifications in complex noun phrases are statistically investigated and compared for English and German. Moreover, using parallel and comparable corpora it is tested whether English-German translations obey the register norms of the target language or whether the LSP frequency patterns of the source language Ñshine throughì. The results provide an empirical insight into language contact phenomena involving specialized communication.
We start by trying to answer a question that has already been asked by de Schryver et al. (2006): Do dictionary users (frequently) look up words that are frequent in a corpus. Contrary to their results, our results that are based on the analysis of log files from two different online dictionaries indicate that users indeed look up frequent words frequently. When combining frequency information from the Mannheim German Reference Corpus and information about the number of visits in the Digital Dictionary of the German Language as well as the German language edition of Wiktionary, a clear connection between corpus and look-up frequencies can be observed. In a follow-up study, we show that another important factor for the look-up frequency of a word is its temporal social relevance. To make this effect visible, we propose a de-trending method where we control both frequency effects and overall look-up trends.
Reading corpora are text collections that are enriched with processing data. From a corpus linguist’s perspective, they can be seen as an extension of classical linguistic corpora with human language processing behavior. From a psycholinguist’s perspective, reading corpora allow to test psycholinguistic hypotheses on subsets of language and language processing as it is ‘in the wild’ – in contrast to strictly controlled language material in isolated sentences, as used in most psycholinguistic experiments. In this paper, we will investigate a relevance-based account of language processing which states that linguistic structures, that are embedded deeper syntactically, are read faster because readers allocate less attention to these structures.
In this contribution, we present a novel approach for the analysis of cross-reference structures in digital dictionaries on the basis of the complete dictionary database. Using paradigmatic items in the German Wiktionary as an example, we show how analyses based on graph theory can be fruitfully applied in this context, e. g. to gain an overview of paradigmatic references as a whole or to detect closely connected groups of headwords. Furthermore, we connect information about cross-reference structures with corpus frequencies and log file statistics. In this way, we can answer questions such as the following ones: Are frequent words paradigmatically linked more closely than others? Are closely linked headwords or headwords that stand more solitary in the dictionary visited significantly more often?
Die öffentliche Akzeptanz und Wirkung natur- und technikwissenschaftlicher Forschung hängt grundlegend davon ab, ob sich die Ziele und Forschungsergebnisse an die Öffentlichkeit vermitteln lassen. Doch die Inhalte aktueller Forschungsvorhaben sind für ein Laienpublikum oft nur schwer zugänglich und verständlich. Vor dem Hintergrund, die gesellschaftliche Diskussion natur- und technikwissenschaftlicher Forschung zu verbessern, untersuchen und bewerten wir im Projekt PopSci – Understanding Science einen wichtigen Sektor des populärwissenschaftlichen Diskurses in Deutschland empirisch. Hierfür identifizieren wir die linguistischen Merkmale deutscher populärwissenschaftlicher Texte durch korpusbasierte Methoden und untersuchen deren Effekt auf die kognitive Verarbeitung der Texte durch Laien. Dazu setzen wir Vor- und Nachwissenstests ein. Außerdem messen wir die Blickbewegungen der Leserinnen und Leser, während sie populärwissenschaftliche Texte lesen. Aus dieser Kombination von unterschiedlichen Methoden versuchen wir, erste Empfehlungen zur Verbesserung des linguistischen Stils und der Wissensrepräsentation populärwissenschaftlicher Texte abzuleiten.
We present studies using the 2013 log files from the German version of Wiktionary. We investigate several lexicographically relevant variables and their effect on look-up frequency: Corpus frequency of the headword seems to have a strong effect on the number of visits to a Wiktionary entry. We then consider the question of whether polysemic words are looked up more often than monosemic ones. Here, we also have to take into account that polysemic words are more frequent in most languages. Finally, we present a technique to investigate the time-course of look-up behaviour for specific entries. We exemplify the method by investigating influences of (temporary) social relevance of specific headwords.
We present an empirical study addressing the question whether, and to which extent, lexicographic writing aids improve text revision results. German university students were asked to optimise two German texts using (1) no aids at all, (2) highlighted problems, or (3) highlighted problems accompanied by lexicographic resources that could be used to solve the specific problems. We found that participants from the third group corrected the largest number of problems and introduced the fewest semantic distortions during revision. Also, they reached the highest overall score and were most efficient (as measured in points per time). The second group with highlighted problems lies between the two other groups in almost every measure we analysed. We discuss these findings in the scope of intelligent writing environments, the effectiveness of writing aids in practical usage situations and teaching dictionary skills.
Wiktionary is increasingly gaining influence in a wide variety of linguistic fields such as NLP and lexicography, and has great potential to become a serious competitor for publisher-based and academic dictionaries. However, little is known about the "crowd" that is responsible for the content of Wiktionary. In this article, we want to shed some light on selected questions concerning large-scale cooperative work in online dictionaries. To this end, we use quantitative analyses of the complete edit history files of the English and German Wiktionary language editions. Concerning the distribution of revisions over users, we show that — compared to the overall user base — only very few authors are responsible for the vast majority of revisions in the two Wiktionary editions. In the next step, we compare this distribution to the distribution of revisions over all the articles. The articles are subsequently analysed in terms of rigour and diversity, typical revision patterns through time, and novelty (the time since the last revision). We close with an examination of the relationship between corpus frequencies of headwords in articles, the number of article visits, and the number of revisions made to articles.
The author presents a study using eye-tracking-while-reading data from participants reading German jurisdictional texts. I am particularly interested in nominalisations. It can be shown that nominalisations are read significantly longer than other nouns and that this effect is quite strong. Furthermore, the results suggest that nouns are read faster in reformulated texts. In the reformulations, nominalisations were transformed into verbal structures. Reformulations did not lead to increased processing times of verbal constructions but reformulated texts were read faster overall. Where appropriate, results are compared to a previous study of Hansen et al. (2006) using the same texts but other methodology and statistical analysis.
Languages employ different strategies to transmit structural and grammatical information. While, for example, grammatical dependency relationships in sentences are mainly conveyed by the ordering of the words for languages like Mandarin Chinese, or Vietnamese, the word ordering is much less restricted for languages such as Inupiatun or Quechua, as these languages (also) use the internal structure of words (e.g. inflectional morphology) to mark grammatical relationships in a sentence. Based on a quantitative analysis of more than 1,500 unique translations of different books of the Bible in almost 1,200 different languages that are spoken as a native language by approximately 6 billion people (more than 80% of the world population), we present large-scale evidence for a statistical trade-off between the amount of information conveyed by the ordering of words and the amount of information conveyed by internal word structure: languages that rely more strongly on word order information tend to rely less on word structure information and vice versa. Or put differently, if less information is carried within the word, more information has to be spread among words in order to communicate successfully. In addition, we find that–despite differences in the way information is expressed–there is also evidence for a trade-off between different books of the biblical canon that recurs with little variation across languages: the more informative the word order of the book, the less informative its word structure and vice versa. We argue that this might suggest that, on the one hand, languages encode information in very different (but efficient) ways. On the other hand, content-related and stylistic features are statistically encoded in very similar ways.
Standardisierte statistische Auswertungen von Korpusdaten im Projekt "Korpusgrammatik" (KoGra-R)
(2017)
Wir zeigen anhand dreier Beispielanalysen, wie das im IDS-Projekt „Korpusgrammatik“ entwickelte Auswertungstool KoGra-R in der quantitativlinguistischen Forschung zur Analyse von Frequenzdaten auf mehreren linguistischen Ebenen eingesetzt werden kann. Wir demonstrieren dies anhand regionaler Präferenzen bei der Selektion von Genitivallomorphen, der Variation von Relativpronomina sowie der Verwendung bestimmter anaphorischer Ausdrucke in Abhängigkeit davon, ob sich das Antezedens im gleichen Satz befindet oder nicht. Die in KoGra-R implementierten statistischen Tests sind für jede dieser Ebenen geeignet, um mindestens einen ersten statistisch abgesicherten Eindruck der Datenlage zu erlangen.
Juristische Texte sind schwer zu verstehen, insbesondere – aber nicht nur – für juristische Laien. Dieser Band beleuchtet diese These ausgehend von linguistischen Verständlichkeitsmodellen und kognitionswissenschaftlichen Modellen der menschlichen Textverarbeitung. Anhand von Aufzeichnungen von Blickbewegungen beim Lesen, einem sogenannten Lesekorpus, werden umfangreiche statistische Modelle berechnet. Diese geben Auskunft über Fragen psycholinguistischer Grundlagenforschung auf der Wort-, Satz- und Textebene. Ferner wird untersucht, wie sich Reformulierungen auf den Verstehensprozess auswirken. Dabei stehen bekannte Komplexitätsmarker deutscher juristischer Texte im Fokus: Nominalisierungen, komplexe Nominalphrasen und syntaktisch komplexe Texte.
In the past two decades, more and more dictionary usage studies have been published, but most of them deal with questions related to what users appreciate about dictionaries, which dictionaries they use and what type of information they need in specific situations — presupposing that users actually consult lexicographic resources. However, language teachers and lecturers in linguistics often have the impression that students do not use enough high-quality dictionaries in their everyday work. With this in mind, we launched an international cooperation project to collect empirical data to evaluate what it is that students actually do while attempting to solve language problems. To this end, we applied a new methodological setting: screen recording in conjunction with a thinking-aloud task. The collected empirical data offers a broad insight into what users really do while they attempt to solve language-related tasks online.
We present ESDexplorer (https://owid.shinyapps.io/ESDexplorer), a browser application which allows the user to explore the data from a large European survey on dictionary use and culture. We built ESDexplorer with several target groups in mind: our cooperation partners, other researchers, and a more general public interested in the results. Also, we present in detail the architecture and technological realisation of the application and discuss some legal aspects of data protection that motivated some architectural choices.