In the project Sprache und Studienerfolg bei Bildungsausländer/-innen (SpraStu), the study of how international students handle written exams during the initial phase of their studies is, alongside the analysis of note-taking in lectures, a second approach to concrete language use typical of university study. The aim of the predominantly qualitative surveys on exams in the early phase of international students' bachelor's studies is to gain a first picture of subjectively perceived difficulties and of strategic approaches to working on exams; to this end, both lecturers and L2 students were included in the analyses. This chapter presents some initial exploratory qualitative analyses of the corresponding data. The evaluations refer to two exemplary exams in the subjects German as a Foreign Language (final exam for the module Lexikologie) and economics (exam for the lecture Bürgerliches Recht für Wirtschaftswissenschaftler (BGB)), each written at the end of the first semester, and to stimulated recalls on these exams conducted with six international students (cf. Gass & Mackey, 2017; Heine & Schramm, 2016). In addition, data from interviews with the lecturers responsible for the two exams are analysed. The analyses therefore cannot claim generalisability; rather, they illustrate some exemplary hurdles that arise quite specifically for L2 students, from the students' subjective point of view, and relate them to the challenges expected by the respective lecturers.
Studying the role of expertise in poetry reading, we hypothesized that poets’ expert knowledge comprises genre-appropriate reading- and comprehension strategies that are reflected in distinct patterns of reading behavior.
We recorded eye movements while two groups of native speakers (n=10 each) read selected Russian poetry: an expert group of professional poets who read poetry daily, and a control group of novices who read poetry less than once a month. We conducted mixed-effects regression analyses to test for effects of group on first-fixation durations, first-pass gaze durations, and total reading times per word while controlling for lexical- and text variables.
First-fixation durations exclusively reflected lexical features, and total reading times reflected both lexical- and text variables; only first-pass gaze durations were additionally modulated by readers’ level of expertise. Whereas gaze durations of novice readers became faster as they progressed through the poems, and differed between line-final words and non-final ones, poets retained a steady pace of first-pass reading throughout the poems and within verse lines. Additionally, poets’ gaze durations were less sensitive to word length.
We conclude that readers’ level of expertise modulates the way they read poetry. Our findings support theories of literary comprehension that assume distinct processing modes which emerge from prior experience with literary texts.
We report results from an exploratory study of college students’ conceptions of poetry in which we asked them to name three things they expect from a poem. Frequency- and list-based analyses of their responses revealed that they primarily expect poems to rhyme, but they also identified a number of form-, content-, and reception-related genre expectations, which we discuss in relation to relevant previous research. We propose that rhyme’s predominance in college students’ genre expectations reflects its perceptual and cognitive salience during incremental poetry comprehension rather than its frequency in contemporary poetic practice. Our results characterize the genre conceptions of the population that empirical studies of poetry comprehension typically investigate, and thus provide relevant background information for the interpretation of empirical findings in this field.
We examined genre-specific reading strategies for literary texts and hypothesized that text categorization (literary prose vs. poetry) modulates both how readers gather information from a text (eye movements) and how they realize its phonetic surface form (speech production). We recorded eye movements and speech while college students (N = 32) orally read identical texts that we categorized and formatted as either literary prose or poetry. We further varied the text position of critical regions (text-initial vs. text-medial) to compare how identical information is read and articulated with and without context; this allowed us to assess whether genre-specific reading strategies make differential use of identical context information. We observed genre-dependent differences in reading and speaking tempo that reflected several aspects of reading and articulation. Analyses of regions of interest revealed that word-skipping increased particularly while readers progressed through the texts in the prose condition; speech rhythm was more pronounced in the poetry condition irrespective of the text position. Our results characterize strategic poetry and prose reading, indicate that adjustments of reading behavior partly reflect differences in phonetic surface form, and shed light on the dynamics of genre-specific literary reading. They generally support a theory of literary comprehension that assumes distinct literary processing modes and incorporates text categorization as an initial processing step.
Dreams, or more precisely the reported and written-down dream episodes of people who lived during National Socialism, can be understood as a component of communication. Beyond that, they often provide insight into how people dreamed about communication. Dream narratives thus often deal with communication on the one hand, while on the other hand they are themselves embedded as communication in specific communicative situations. These features make them a rewarding object of study for a communication history of National Socialism that is interested in the communicative practices through which National Socialist society was produced, updated and called into question. The situations and text types in which people described their dreams, the meaning they gave them and how they related to their own dreams allow interpretations of how people living at the time of National Socialism brought something commonly regarded as deeply intimate and personal, namely their dreams, into the political communicative space of National Socialism.
How can children and young people research their multilingual everyday life in Neckarstadt-West, a highly diverse district of Mannheim, together with researchers from the Leibniz-Institut für Deutsche Sprache and its cooperation partners, Campus Neckarstadt-West, Alte Feuerwache Mannheim gGmbH and the association Neckarstadt Kids e.V.?
We want to explore the potential of citizen science in a language-related project:
- for establishing trusting collaboration between the young citizen scientists and linguistic research,
- for high-quality educational opportunities in line with the UN Sustainable Development Goals, and
- for new impulses in research on language contact and multilingualism.
In this contribution, we outline the aims, questions and methods of our project and give insights into the activities carried out so far and those planned for 2023.
We present a simple tool for extracting text and markup information from printouts of (not only) scientific documents. While the heavy-lifting OCR is done by off-the-shelf tesseract, our focus is on detection, extraction, and basic categorization of color-highlighted text sections, as well as on providing a framework for downstream processing of extraction results. The tool can be useful for document analysis tasks that must, or benefit from being able to, use printed paper.
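The categorization of color-highlighted sections can be illustrated with a minimal sketch. It is not the tool's implementation: the function name `classify_highlight`, the hue ranges and the saturation and brightness thresholds are all assumptions chosen for illustration, and a real pipeline would work on image regions rather than single pixels.

```python
import colorsys

# Hypothetical hue ranges in degrees for common highlighter colors.
# These values are illustrative assumptions, not taken from the tool.
HUE_RANGES = {
    "yellow": (40, 70),
    "green": (80, 160),
    "blue": (180, 250),
    "pink": (290, 345),
}


def classify_highlight(rgb, min_saturation=0.25, min_value=0.6):
    """Return a highlighter color name for an RGB pixel, or None.

    Pixels with low saturation (white paper, grey) or low brightness
    (black ink) are treated as not highlighted.
    """
    r, g, b = (channel / 255.0 for channel in rgb)
    hue, saturation, value = colorsys.rgb_to_hsv(r, g, b)
    if saturation < min_saturation or value < min_value:
        return None
    hue_degrees = hue * 360.0
    for name, (low, high) in HUE_RANGES.items():
        if low <= hue_degrees <= high:
            return name
    return None
```

Working in HSV rather than RGB keeps the color decision (hue) separate from paper brightness and marker intensity, which tend to vary between scans.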
Simultaneous interpreting is a complex cognitive activity in which several processes run at the same time. In addition to monolingual text processing, it requires interpreting-specific strategies that have to be acquired. Emergency strategies are applied only once the interpreter's capacity limit has been reached.
Since the mid-1990s, the Institut für deutsche Sprache (IDS) in Mannheim has been researching how the highly complex subject area of “grammar” can be conveyed in a scientifically sound and accessible way by exploiting hypertextual navigation structures. A consistent, cross-theory interlinking of all text content is therefore of central importance. To support automatable cross-referencing between content that is formulated with different terminological vocabulary but describes the same linguistic phenomenon, an onomasiologically designed terminology database forms the backbone of the online system. The contribution describes the design and structure of this linguistic terminology.
Resistance as a psychoanalytic concept describes the ambivalence of psychotherapy patients towards the therapeutic process of change. While the patient enters therapy with the wish to achieve certain changes, this wish is opposed by unconscious forces that try to maintain the status quo. The underlying assumption is that resistance has a protective function: it wards off painful affects that are an integral part of a psychotherapeutic process. Therapists face the task of recognising resistance phenomena as such, understanding their function and enabling a joint process of understanding with the patient. A conversation-analytic study of resistance and of the way it is dealt with communicatively offers a valuable complement to the psychotherapeutic perspective. One resistance phenomenon that has so far received little attention in the literature is verbosity, which commonly refers to sprawling, unfocused narratives. Building on the only conversation-analytic study of verbosity as a resistance phenomenon to date, by Fenner, Spranz-Fogasy and Montan (2022), the aim of the present work is to establish how resistance management is used in cases of verbosity. To this end, two case examples are examined using conversation analysis. They come from a corpus of 34 video-recorded outpatient psychodynamic therapy sessions. The first case example shows that verbosity as a resistance phenomenon is not only expressed by the patient but can also be interactively produced and reinforced together with the therapist. The second example shows how resistance management can lead to a dissolution of the resistance.
The analyses also make clear, first, that the psychoanalytic concept of resistance should be viewed critically from a conversation-analytic perspective and, second, that the two disciplines do not necessarily arrive at the same results.
Classical names of the offline world have been researched far more extensively than the rather short-lived and still very young names of the digital world. This contribution postulates virtual names as a name class of their own and locates them with reference to existing name typologies. Subsequently, three different types of freely selectable virtual names in video games are analysed graphematically and semantically, using the popular browser game ‘Forge of Empires’ as an example: guild names, city names and user names. For this purpose, three corpora of 100 names of each type are first examined for patterns in language choice, character use and graphematic peculiarities. This is followed by an examination of the naming motives underlying the names by means of inductive-exploratory category formation. The analysis reveals a functional difference between the name types examined: guild names prioritise a communicative-phatic function, whereas user names primarily express individuality. City names occupy an intermediate position. Overall, the partial results fit the picture drawn by the few existing studies on name choice in video games and at the same time call for further research.
Various findings are already available on the contraction of definite article and preposition in written German, whereas knowledge about spoken language is still insufficient. The present study addresses this desideratum and analyses preposition-article combinations on the basis of data from FOLK in order to advance the linguistic description of this structure. The corpus analysis compares the frequencies of synthetic and analytic preposition-article combinations and works out usage particularities at the syntactic-lexical and pragmatic levels.
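A frequency comparison of synthetic forms (e.g. "zum") and analytic forms (e.g. "zu dem") could be sketched roughly as follows. This is an illustrative assumption, not the study's procedure: the list of contractions is a small subset, the function name `count_combinations` is hypothetical, and the input is a plain token list rather than FOLK transcripts.

```python
from collections import Counter

# Illustrative subset of fused forms, mapped to (preposition, article).
CONTRACTIONS = {
    "im": ("in", "dem"),
    "zum": ("zu", "dem"),
    "zur": ("zu", "der"),
    "vom": ("von", "dem"),
    "am": ("an", "dem"),
    "ins": ("in", "das"),
}
# Reverse lookup: (preposition, article) -> fused form.
ANALYTIC = {pair: fused for fused, pair in CONTRACTIONS.items()}


def count_combinations(tokens):
    """Count synthetic and analytic realisations per fused form.

    Caveat: a token such as "dem" after a preposition may also be a
    demonstrative or relative pronoun, so analytic counts based on
    surface forms alone would need manual checking.
    """
    counts = Counter()
    lowered = [token.lower() for token in tokens]
    for i, token in enumerate(lowered):
        if token in CONTRACTIONS:
            counts[(token, "synthetic")] += 1
        if i + 1 < len(lowered) and (token, lowered[i + 1]) in ANALYTIC:
            counts[(ANALYTIC[(token, lowered[i + 1])], "analytic")] += 1
    return counts


example = "ich gehe zum Arzt und dann in dem Haus im Garten zu dem Mann".split()
result = count_combinations(example)
```

Keying the counts by fused form makes it easy to put both realisations of the same preposition-article pair side by side.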
The article analyzes communicative deviations that occur in communication between native speakers of German and non-native speakers, particularly Ukrainians. Despite existing intercultural and sociolinguistic studies, the analysis of the language-specific features that cause communicative deviations, failures and misunderstandings remains relevant and understudied. The purpose of this article is to identify and explore the peculiarities of German that cause misunderstandings in communication for non-native speakers, in particular Ukrainian speakers, and to offer an algorithm that helps representatives of different ethnic communities avoid and resolve possible conflicts when studying German as a foreign language. The article determines the status of the concept of communicative deviation in intercultural communication under conditions of insufficient communicative competence. The study uses communicative deviation as a generalized term, a broad concept covering linguistic, speech and communicative deviations in dialogic speech, in particular between native German speakers and non-native speakers. The empirical research was based on the speech activity of Ukrainian students during classes at the Department of German Studies and Translation (levels B2–C1) of Ivan Franko National University of Lviv in the 2019–2021 academic years and on definitions from the Duden universal dictionary of German, in addition to materials from textbooks and teaching manuals and from authentic German-language sources. Communicative deviations are identified and analyzed in phonological, lexical, syntactic and pragmatic aspects.
Contrastive multilingual empirical studies require a comparable data basis. Depending on which research questions are at the centre of the cross-linguistic investigation, either parallel corpora or comparable monolingual corpora are suitable as a data basis. The main aim of this contribution is to show the challenges raised by working with comparable corpora in multilingual language comparison. Among other things, it addresses the principle of corpus comparability and puts forward methodological proposals for concrete, empirically oriented cross-linguistic analyses. The possibilities and limits of empirically based quantitative and qualitative analysis are illustrated by presenting some exemplary research questions and results. The contribution closes with some desiderata for future corpus-based studies using comparable corpora in a multilingual setting.
Non-literal speech, in particular the creation and use of metaphors and metonymies, is licensed by language structure to a far greater extent than the creative, playful effect produced by new tropes would suggest. This contribution is mainly concerned with the concept of the paradigmatic metaphorical pattern, according to which the words within a lexical field develop a similar metaphorical potential based on abstract features. To this end, Section 2 first discusses paradigmatic metonymic patterns, which have already been studied repeatedly in various contexts and under various names. Section 3 presents basic considerations on metaphor, and in Section 4 I develop the concept of the metaphorical pattern on the basis of various examples. Section 5 sheds light on the relationship between metaphorical patterns and conceptual metaphors.
The CLARIN Concept Registry (CCR) is the common semantic ground for most CMDI-based profiles to describe language-related resources in the CLARIN universe. While the CCR supports semantic interoperability within this universe, it does not extend beyond it. The flexibility of CMDI, however, allows users to use other term or concept registries when defining their metadata components. In this paper, we describe our use of schema.org, a light ontology used by many parties across disciplines.
In Text+, the NFDI consortium focused on the research data of language- and text-based disciplines, authority data play a central role in the interoperable description and semantic linking of distributed data sources. In particular, the Gemeinsame Normdatei (GND) is an important hub at the centre of an emerging cross-domain knowledge graph. Within Text+, this function is to be further developed and expanded by setting up a GND agency for language- and text-based research data. The aim is to create low-threshold, quality-assured opportunities for researchers to participate and, at the same time, to increase the degree to which the GND is interlinked, including through terminology mappings. Specific requirements and usage practices are exemplified using the data domains of Text+.
It was recently suggested in a study published in Nature Human Behaviour that the historical loosening of American culture was associated with a trade-off between higher creativity and lower order. To this end, Jackson et al. generate a linguistic index of cultural tightness based on the Google Books Ngram corpus and use this index to show that American norms loosened between 1800 and 2000. While we remain agnostic toward a potential loosening of American culture and a statistical association with creativity/order, we show here that the methods used by Jackson et al. are neither suitable for testing the validity of the index nor for establishing possible relationships with creativity/order.
In a previous study published in Nature Human Behaviour, Varnum and Grossmann claim that reductions in gender inequality are linked to reductions in pathogen prevalence in the United States between 1951 and 2013. Since the statistical methods used by Varnum and Grossmann are known to induce (seemingly) significant correlations between unrelated time series, so-called spurious or nonsense correlations, we test here whether the statistical association between gender inequality and pathogen prevalence in its current form is also the result of mis-specified models that do not correctly account for the temporal structure of the data. Our analysis clearly suggests that this is the case. We then discuss and apply several standard approaches to modelling time-series processes in the data and show that there is, at least as of now, no support for a statistical association between gender inequality and pathogen prevalence.
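The general phenomenon of spurious correlation between unrelated time series can be demonstrated with a small simulation. The sketch below is an illustration under stated assumptions, not a reproduction of the analysis: it generates pairs of independent Gaussian random walks of length 63 (one value per year from 1951 to 2013) and compares correlations of the raw levels with correlations of the first differences. First differencing is used here only as one common textbook remedy; the abstract does not say which approaches were applied. All function names are hypothetical.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def random_walk(length, rng):
    """Cumulative sum of independent standard normal steps."""
    level = 0.0
    walk = []
    for _ in range(length):
        level += rng.gauss(0.0, 1.0)
        walk.append(level)
    return walk


def first_differences(series):
    """Year-to-year changes of a series."""
    return [b - a for a, b in zip(series, series[1:])]


def share_large_correlations(n_pairs=500, length=63, threshold=0.5, seed=1):
    """Share of independent pairs with |r| above the threshold,
    computed on the levels and on the first differences."""
    rng = random.Random(seed)
    large_levels = 0
    large_diffs = 0
    for _ in range(n_pairs):
        x = random_walk(length, rng)
        y = random_walk(length, rng)
        if abs(pearson(x, y)) > threshold:
            large_levels += 1
        if abs(pearson(first_differences(x), first_differences(y))) > threshold:
            large_diffs += 1
    return large_levels / n_pairs, large_diffs / n_pairs


levels_share, diffs_share = share_large_correlations()
```

Although every pair is independent by construction, a substantial share of the level correlations comes out large, whereas large correlations of the differenced series are rare.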
Positioning oneself and others politically on societal issues is an elementary linguistic and social practice. The aim of the Akademiekonferenz (academy conference) was to understand how positionings are carried out, whether and to what extent they are political, and how they interrelate with societal, social and political arrangements and orders. The topic of political positioning was addressed in seven panels by different disciplines from the humanities and social sciences, such as linguistics, sociology, history, literary studies and political science. The focus on linguistic discourse practices in diverse social and political contexts ran through the contributions as a common thread.
Annotated dataset consisting of personal designations found on websites of 42 German, Austrian, Swiss and South Tyrolean cities. Our goal is to re-evaluate the websites every year in order to see how the use of gender-fair language develops over time. The dataset contains coordinates for the creation of map material.
Since large amounts of data and computing capacity have become available to research, linguistics too has increasingly been working in a data-driven way. Data-driven research does not start from a hypothesis but looks for statistical peculiarities in the data. In doing so, language is often viewed in a highly simplified way as a linear sequence of words. This study shows for the first time how additionally including syntactic annotations helps to capture linguistic structures of German better.
The comparison of the academic languages of linguistics and literary studies serves as an application example. The two subjects are often grouped together as sub-disciplines of German studies (Germanistik). However, their scholarly practice differs systematically with regard to research data, methods and research interests, which is also reflected in their academic languages.
Words originating from shortening, including acronyms and clippings, constitute a treasure trove of insight into phonological grammar. In particular, they serve as an ideal testing ground for Optimality Theory (OT) and its view of grammar as an interaction of markedness constraints, which express (dis-) preferences regarding phonological structure in output forms, and faithfulness constraints, which require output forms to correspond to input structure (Prince and Smolensky 1993). This is because shortenings are characterised by a sharply diminished role of faithfulness, allowing for markedness constraints to make their force felt (“The Emergence of the Unmarked”). This article aims to demonstrate the heuristic value of shortening data for testing the OT model and for shedding light on various controversies in German phonology. A particular concern is to draw attention to the need for properly sorting the shortening data, to identify influences on phonological structure due to internal domain boundaries or to special correspondence effects potentially obscuring the view on the maximally unmarked patterns.
Comprehending conditional statements is fundamental for hypothetical reasoning about situations. However, the online comprehension of conditional statements containing different conditional connectives is still debated. We report two self-paced reading experiments on German conditionals presenting the conditional connectives wenn (‘if’) and nur wenn (‘only if’) in identical discourse contexts. In Experiment 1, participants read a conditional sentence followed by the confirmed antecedent p and the confirmed or negated consequent q. The final, critical sentence was presented word by word and contained a positive or negative quantifier (ein/kein ‘one/no’). Reading times of the two quantifiers did not differ between the two conditional connectives. In Experiment 2, presenting a negated antecedent, reading times for the critical positive quantifier (ein) did not differ between conditional connectives, while reading times for the negative quantifier (kein) were shorter for nur wenn than for wenn. The results show that comprehenders form distinct predictions about discourse continuations due to differences in the lexical semantics of the tested conditional connectives, shedding light on the role of conditional connectives in the online interpretation of conditionals in general.
The question of whether a letter is a grapheme or not is a perennial issue in writing research. The answer depends on which criteria are used to differentiate between letters and graphemes and, ultimately, how the unit ‘grapheme’ is defined. This problem is particularly relevant to complex graphemes, i.e. sequences of letters that behave like a single grapheme in certain respects. A typical example in German is ‹ch›. This paper argues for a scalar concept of graphemes, which compares the grapheme status of each of the units under investigation. For this purpose, new criteria for the identification of complex graphemes are used, which originate from handwriting analysis. There, it is shown that the letters of complex graphemes are connected with each other disproportionately often and also have deviating letter forms disproportionately often.
In contrast to printed letters, handwritten texts show a larger amount of variation in letter shape and letter contact. This variation, though, might not be totally random but could serve a certain grammatical or structural function. By analysing a corpus of 10,117 graphs written by four writers, this paper explores which structures and which functions correlate. More precisely, it will be shown that the shape of certain letters might indicate syllabic, morphological or prosodic structures. In addition, it will be shown that handwritten texts present the words’ structure better than printed texts could do. Overall, this paper points out how handwritten scripts show the graphematic principles known from printing even better than printed texts do.
So far, Sepedi negations have been considered more from the point of view of lexicographical treatment. Theoretical works on Sepedi have been used for this purpose, with the objective of a neat description of these negations in a (paper) dictionary. This paper takes a different perspective: instead of theoretical works, corpus linguistic methods are used: (1) a Sepedi corpus is examined on the basis of existing descriptions of the occurrences of a relevant verb, looking at its negated forms from a purely prescriptive point of view; (2) a "corpus-driven" strategy is employed, looking only for sequences of negation particles (or morphemes) in order to list occurring constructions, without taking into account the verbs occurring in them, apart from their endings. The approach in (2) is only intended to show a possible methodology to extend existing theories on occurring negations. We would also like to help lexicographers establish a frequency-based order of entries for possible negation forms in their dictionaries by showing them the number of respective occurrences. As with all corpus linguistic work, however, we must regard corpus evidence not as representative, but as tendencies of language use that can be detected and described. This is especially true for Sepedi, for which only a few small corpora exist. This paper also describes the resources and tools used to create the necessary corpus and how it was annotated with parts of speech and lemmas. Exploring the quality of available Sepedi part-of-speech taggers concerning verbs, negation morphemes and subject concords may be a positive side result.
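The corpus-driven strategy in (2) can be sketched as a frequency count over particle sequences. The code below is a toy illustration, not the paper's method or tooling: the small sets of negation particles and subject concords are incomplete assumptions, the function name `negation_patterns` is hypothetical, and the token list is a constructed example rather than corpus data.

```python
from collections import Counter

# Illustrative, incomplete closed-class sets (assumptions for this sketch).
NEGATION = {"ga", "sa", "se"}
CLOSED_CLASS = NEGATION | {"ke", "o", "a", "re", "le", "ba"}


def negation_patterns(tokens):
    """Count sequences of closed-class tokens that start with a negation
    particle, abstracting the following verb to its final letter."""
    counts = Counter()
    i = 0
    while i < len(tokens):
        if tokens[i] in NEGATION:
            j = i
            # Extend the run while tokens belong to the closed-class set.
            while j < len(tokens) and tokens[j] in CLOSED_CLASS:
                j += 1
            particles = " ".join(tokens[i:j])
            # The first token after the run is treated as the verb;
            # only its ending is kept.
            ending = "-" + tokens[j][-1] if j < len(tokens) else "-?"
            counts[particles + " V" + ending] += 1
            i = j + 1
        else:
            i += 1
    return counts


toy_tokens = "ga ke rate dijo ga ba reke ga ke sepele".split()
pattern_counts = negation_patterns(toy_tokens)
```

Sorting the result with `pattern_counts.most_common()` yields a frequency-ordered list of constructions, which is the kind of ordering a lexicographer could use for arranging negation forms in an entry.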
Here you can see new words, as well as familiar words with new meanings, that have emerged since the beginning of the COVID-19 pandemic but that we are still monitoring to see whether they will spread to some extent into general language. For each of these words, we give a (provisional, rough) explanation of meaning and illustrate its use with one or two attested examples.
In this paper, the author studies the role of the dictionary in first language acquisition, highlighting its didactic value. Based on two Romanian lexicographical works of the 19th century, Lexiconul de la Buda (Buda, 1825) [the Lexicon of Buda] and Vocabularu romano-francesu (Bucarest, 1870) [the Romanian-French Vocabulary], the author analyses the normative information recorded in the articles in order to observe which level of language (i.e. phonetic, morphological, syntactic and lexical) is concerned. Such an approach makes it possible to distinguish between possible changes at the level of perception or of grammatical, lexical and semantic description, i.e. the settlement of the word in the first language, and at a technical level, i.e. the making of the article and of the dictionary.
This paper presents the decisions behind the design of a maths dictionary for primary school children. We are aware that there has been a considerable problem regarding Mexican children’s performance in maths dragging on for a long time, and far from getting better, it is getting worse. One of the probable causes seems to be the lack of coordination between maths textbooks and teaching methods. Most maths textbooks used in primary schools include lots of activities and problem-solving techniques, but hardly any conceptual information in the form of definitions or explanations. Consequently, many children learn to do things, but have difficulty understanding mathematical concepts and applying them in different contexts. To help solve this problem, at least partially, the project of the dictionary was launched aiming at helping children to grasp and understand maths concepts learned during those first six years of their formal education. The dictionary is a corpus-based terminographical product whose macrostructure, microstructure, typography, and additional information were specifically designed to help children understand mathematical concepts.
This paper deals with the lexicographic treatment of the evidently abundant and pervasive scatological vocabulary, that is, vocabulary concerning the process and products of bodily excretion (especially feces), in the synchronic Early New High German Dictionary (FWB = Frühneuhochdeutsches Wörterbuch) from a dictionary user’s view. Initially, different cultural concepts of scatology by Norbert Elias, Michail Bachtin and Mary Douglas, among others, and the term taboo are reflected on. Subsequently, selected lexical items such as words with a primary scatological meaning (e.g. drek, kot, scheisse), concealing expressions (euphemisms, periphrases, metaphors, e.g. sitzen, seine notdurft tun, bauernveiel), and certain aspects within the polysemy of the verb scheissen are discussed, the latter on the one hand referring to a physical process with uncontrollable aspects and on the other hand denoting a deliberate action and functionalized as a fighting word during the Reformation. Focussing on different positions of lexicographical information within the microstructure of the FWB, the survey shows that, in a synchronic perspective, Early New High German scatological vocabulary is a heterogeneous and complex phenomenon shaped by speaker, context, and semantic and pragmatic purposes.
To effectively design online tools and develop sophisticated programs for the teaching of the Ancient Greek language, there is a clear need for lexical resources that provide semantic links with Modern Greek. This paper proposes a microstructure for an online Ancient Greek to Modern Greek thesaurus (AMGthes) that serves educational purposes. The terms of this bilingual thesaurus have been selected from reference Ancient Greek texts taught and studied during lower and upper secondary education in Greece. The main objective is to build a semantic map that helps students find relevant and semantically related terms (synonyms and antonyms) in Ancient Greek, and then to provide a rich set of suitable translations and definitions in Modern Greek. Designed as an online resource, the thesaurus is being developed using web technologies and will thus be available to every school and university student pursuing a degree in digital humanities.
The paper presents the results of empirical research conducted with students from the Faculty of Translation Studies of Ventspils University of Applied Sciences (VUAS) in Latvia. The study investigates the habits and practices of translation students concerning the use of dictionaries, including the types of dictionaries used and the frequency of use. It also offers an insight into how Latvian students evaluate the usefulness of dictionaries. The research describes the advantages and disadvantages of the dictionaries used by the respondents, as well as the importance of the preface and of the explanation of the terms and abbreviations used in dictionaries. The research, together with the insights, results and recommendations presented, will be relevant for the lexicographic community, as it reflects the experience of one Latvian university in improving the teaching of dictionary use and lexicographic culture in the country and complements dictionary use research with the Latvian experience.
Learning from students. On the design and usability of an e-dictionary of mathematical graph theory
(2022)
We created a prototype of an electronic dictionary for the mathematical domain of graph theory. We evaluate our prototype and compare its effectiveness in task-based tests with that of Wikipedia. Our dictionary is based on a corpus; the terms and their definitions were automatically extracted and annotated by experts (cf. Kruse/Heid 2020). The dictionary is bilingual, covering German and English; it gives equivalents, definitions and semantically related terms. For the implementation of the dictionary, we used LexO (Bellandi et al. 2017). The target group of the dictionary are students of mathematics who attend lectures in German and work with English resources. We carried out tests to understand which items the students search for when they work on graph-theoretical tasks. We ran the same test twice, with comparable student groups, either allowing Wikipedia as an information source or our dictionary. The dictionary seems to be especially helpful for students who already have a vague idea of a term because they can use the resource to check if their idea is right.
This paper describes the results of an empirical investigation carried out within the project Lessico Multilingue dei Beni Culturali (LBC), whose aim is to create a multilingual online dictionary of the lexicon of the Italian artistic heritage. The dictionary, whose lexicographic process has already started, is intended for linguists and specialist translators as well as for professionals in the tourism sector and students of Foreign Languages and Literatures. The investigation conducted through a questionnaire submitted to undergraduate students at the University of Milan and at the University of Florence has a double aim: to research the habits in the use of lexicographic tools by possible users of the dictionary (Italian Learners of German Language), and to identify preferences regarding macro-, medio- and microstructural features of the future LBC-dictionary to realize a user-friendly tool. After a brief introduction on the state of the art of the survey in the field of Dictionary Users Studies, the article describes the questionnaire and the results obtained from the pilot study. A summary and a discussion on the future developments of the project conclude the work.
This paper gives an insight into the stages of a cross-media publishing process: from a printed bilingual syntagmatic dictionary for GFL to an online learner’s dictionary of German collocations to a German learner’s dictionary portal. On the basis of an SQL database specially developed for a corpus-guided dictionary of German collocations, the bilingual syntagmatic learner’s dictionary KolleX was published in 2014. The first part of the article describes this lexicographic process, focusing on the most relevant aspects of the dictionary concept, e. g. dictionary type, subject matter, corpus-guided data selection and microstructure. The second part introduces the first online version of KolleX from 2016 and the profound changes in the editing system, from a desktop version (2005) to a web-based editing system (2016), which successively resulted in a prototype of a German learner’s dictionary portal called E-KolleX DaF (2018–). Focusing on the aspects of dynamism and integration of different resources from a learner’s perspective, the paper shows the innovative features of this new online reference work. The contribution presents the solutions for the integration of new data types in the KolleX database and the linking to different data in German monolingual dictionary platforms. The paper outlines the web design, functioning and technical improvements of E-KolleX DaF. The conclusions provide an outlook on the forthcoming challenges.
The aim of this contribution is to identify and analyse the features of communication disturbances in sports interviews from the interviewees’ perspective. The empirical basis consists of Ukrainian- and German-language video interviews from 2010 to 2019 that were either broadcast on television or produced for YouTube. The results of the study made it possible to identify the characteristic features of deviations as communication disturbances in sports interviews on three levels of the communicative genre: the external-structural, the internal-structural and the situational level. Both shared features of communication disturbances and differences between the Ukrainian- and German-language sports interviews were determined. The results show that the types of communication disturbances in sports interviews are universal across Ukrainian and German, but that they reflect national and cultural particularities arising from the features of both languages and of each linguistic culture.
This contribution describes the motivation and goals behind the European Reference Corpus initiative EuReCo (Europäisches Referenzkorpus). Starting from the desiderata that arise from the shortcomings of available research data such as monolingual corpora, parallel corpora and comparable corpora for language comparison, it presents past and ongoing work within EuReCo and, on the basis of comparative German-Romanian co-occurrence analyses, sketches the new perspectives that the EuReCo initiative opens up for contrastive corpus linguistics.
Contrastive corpus linguistics is understood as a designation for comparative language studies whose results are obtained through analyses of linguistic data and are empirically grounded. The term contrastive corpus linguistics for a new, emerging discipline was introduced in 1996 by Karin Aijmer and Bengt Altenberg (Schmied 2009: 1142). In the 1990s, the use of language corpora in contrastive studies meant a revival for contrastive linguistics, after the ambitious goals and hopes of the 1950s and 1960s, which were connected with foreign language didactics, had been abandoned about 50 years ago.
Kontrastive Korpuslinguistik
(2022)
Every Regional Dossier begins with an introduction about the region in question, followed by six chapters that each deal with a specific level of the education system (e.g. primary education). Chapters 8 and 9 cover the main lines of research into education of the minority language under discussion, and the prospects for the minority language in general and in education in particular, respectively. Chapter 10 provides a summary of statistics. Lists of (legal) references and useful addresses regarding the minority language are given at the end of the dossier.
There is a growing interest in pedagogical lexicography, and more specifically in the study of dictionary users’ abilities and strategies (Prichard 2008; Gavriilidou 2010, 2011; Gavriilidou/Mavrommatidou/Markos 2020; Gavriilidou/Konstantinidou 2021; Chatjipapa et al. 2020). The purpose of this presentation is to investigate dictionary use strategy and the effect of an explicit and integrated dictionary awareness intervention program on upper elementary pupils’ dictionary use strategies according to gender and type of school. A total of 150 students from mainstream and intercultural schools, aged 10–12 years old, participated in the study. Data were collected before and after the intervention through the Strategy Inventory for Dictionary Use (SIDU) (Gavriilidou 2013). The results showed a significant effect of the intervention program on Dictionary Use Strategies employed by the experimental group and support the claim that increased dictionary use can be the outcome of explicit strategy instruction. In addition, the effective application of the program suggests that a direct and clear presentation of DUS is likely to be more successful than an implicit presentation. The present study contributes to the discussion concerning both the ‘teachability’ of dictionary use strategies and skills and the effective forms of intervention programs raising dictionary use awareness and culture.
In this paper, we propose a controlled language for authoring technical documents and report the status of its development, while maintaining a specific focus on the Japanese automotive domain. To reduce writing variations, our controlled language not only defines approved and unapproved lexical elements but also prescribes their preferred location in a sentence. It consists of components of a) case frames, b) case elements, c) adverbial modifiers, d) sentence-ending functions, and e) connectives, which have been developed based on the thorough analyses of a large-scale text corpus of automobile repair manuals. We also present our prototype of a writing assistant tool that implements word substitution and reordering functions, incorporating the constructed controlled language.
This paper focuses on lexical information systems and the framework guidelines for the definition of curricula within the educational system of the Autonomous Province of Bolzano/Bozen (Italy). In Italy, the competences to be achieved at different school levels are published in the form of general guidelines. On this basis, each school has to specify the general competency goals and spell them out in a concrete curriculum. In this paper I examine to what extent lexical information systems are represented in the framework guidelines of the German and the Italian educational systems of the Autonomous Province, which are separate systems. In a second step, I check the representations of the resources against the “Villa Vigoni Theses on Lexicography”. Finally, I discuss the results and give an outlook for further research.
Thesauri have long been recognized as valuable structured resources aiding Information Retrieval systems. A thesaurus provides a precise and controlled vocabulary which serves to coordinate data indexing and retrieval. The paper presents a bilingual Greek and English specialized thesaurus that is being developed as the backbone of a platform aimed at enhancing and enriching the cultural experiences of visitors in Eastern Macedonia and Thrace, Greece. The cultural component of the intended platform comprises textual data, images of artifacts and living entities (animals and plants in the area), as well as audio and video. The thesaurus covers the domains of Archaeology, Literature, Mythology, and Travel; therefore, it can be viewed as a set of inter-linked thesauri. Where applicable, terms and names in the database are also geo-referenced.
Lexicographers working with minority languages face many challenges. When the language in question is also a sign language, circumstances specific to the visual-spatial modality have to be taken into consideration as well. In this paper, we aim to show and discuss which challenges we encounter while compiling the Digitales Wörterbuch der Deutschen Gebärdensprache (DW-DGS), the first corpus-based dictionary of German Sign Language (DGS). Some parallel the challenges minority language lexicographers of spoken languages encounter, e. g. few resources, no written tradition, and having to create one dictionary for all potential user groups, while others are specific to sign languages, e. g. representation of visual-spatial language and creating access structures for the dictionary.
This contribution gives an overview of the role that the regional language Latgalian plays in the education sector in the Baltic region. On the one hand, it outlines the historical societal development of Latgalian with a focus on education; on the other hand, it discusses developments of recent years in which discourses on and attitudes towards Latgalian have been undergoing change. The theoretical framework is provided by international discussions on regional and minority languages and by debates in education policy. Not least, the aim is to draw attention to Latgalian in German-language perceptions of the Baltic region, since it should not be missing from a compendium on educational history (or histories) in the Baltic region. A short introduction to the region of Latgale (Lettgallen) and to Latgalian is followed by current examples of the changing use of Latgalian and its placement within discourses on minority languages. Finally, the contribution addresses recent political developments, for instance in the context of drafting new teaching standards for state schools in Latvia.
The EMLex Dictionary of Lexicography (= EMLexDictoL) is a plurilingual subject field dictionary (in German, English, Afrikaans, Galician, Italian, Polish and Spanish) that contains the basic subject field terminology of lexicography and dictionary research and presents its dictionary article texts in a sophisticated but comprehensible form. The articles are supplemented by a complex cross-referencing system and the current subject field literature of the respective national languages. Following the lemma position, the dictionary articles contain items regarding morphology, synonymy, the position of the definiens, additional explanations, the cross-reference position, the position for literature, the equivalent terms in the other six languages of the dictionary as well as the names of the authors.
This paper focuses on the first Slavonic-Romanian lexicons, compiled in the second half of the 17th century, and their use(rs), proposing a method of investigating how the lexical information available in this corpus relates, if at all, to the vocabulary of texts from the same period. We chose to investigate their relation to an anonymous Old Testament translation made from Church Slavonic, also from the second half of the 17th century, which is assumed to have been produced in the same geographical area, in the same Church Slavonic school or even by the same author as the lexicons. After applying a lemmatizer to both the Biblical text (Books of Genesis and Daniel) and the Romanian material from the lexicons, we analyse the results and complement the statistical analysis with a series of case studies, focusing on some common lexemes that might indicate the relatedness of the texts. Even though the analysis suggests that the lexicons might not have been compiled as a tool for the translation of religious texts, the method proves useful: it reveals interesting data and provides the basis for more extensive approaches.
Given the relevance of interoperability, born-digital lexicographic resources as well as legacy retro-digitised dictionaries have been using structured formats to encode their data, following guidelines such as the Text Encoding Initiative or the newest TEI Lex-0. While this new standard is being defined in a stricter approach than the original TEI dictionary schema, its reuse of element names for several types of annotation as well as the highly detailed structure makes it difficult for lexicographers to efficiently edit resources and focus on the real content. In this paper, we present the approach designed within LeXmart to facilitate the editing of TEI Lex-0 encoded resources, guaranteeing consistency through all editing processes.
An ongoing academic and research program, the “Vocabula Grammatica” lexicon, implemented by the Centre for the Greek Language (Thessaloniki, Greece), aims at lemmatizing all the philological, grammatical, rhetorical, and metrical terms in the written texts of scholars (philologists and scholiasts) who curated the ancient Greek literature from the beginning of the Hellenistic period (4th/3rd c. BC) until the end of the Byzantine era (15th c. AD). In particular, it aspires to fill serious gaps (a) in the study of ancient Greek scholarship and (b) in the lexicography of the ancient Greek language and literature. By providing specific examples, we will highlight the typical and methodological features of the forthcoming dictionary.
Basnage’s revision (1701) of Furetière’s Dictionnaire universel differs profoundly from Furetière’s work in several regards. One of its most noticeable features is the increased use of usage labels. Although Furetière already made use of usage labels (see Rey 1990), Basnage gives them a prominent role. As he states in the preface to his edition, a dictionary that aspires to the title of “universal” should teach how to speak politely (“poliment”) and correctly (“juste”), making use of the specific terminology of each art. He specifies, lemma by lemma, the diaphasic dimension by indicating the word’s register and context of use, the diastratic one by noting differences in language use across social strata, the diachronic one by indicating both archaisms and neologisms, the diamesic one by highlighting the gaps between oral and written language, and the diatopic one by specifying either foreign borrowings or regionalisms.
After extracting the entries containing formulas such as “ce mot est...”, “ce terme est...” and similar ones, we compare the number of entries and the type of information provided by the two lexicographers. In this paper, we focus on Basnage’s innovative contribution. Furthermore, we try to identify the lexicographer’s sources, i. e. to establish on which grammars, collections of linguistic remarks or contemporary dictionaries Basnage bases his judgements.
Wortgeschichte digital (‘digital word history’) is a new historical dictionary of New High German, the most recent period of German, reaching from approximately 1600 AD up to the present. In contrast to many historical dictionaries, Wortgeschichte digital has a narrated text, a “word history”, at the core of its entries. The motivation for choosing this format rather than traditional microstructures is briefly outlined. Special emphasis is put on the way these word histories interact with other components of the dictionary, notably with the quotation section. As Wortgeschichte digital is an online-only project, visualizations play an important role in the design of the dictionary. Two examples are presented: first, the “quotation navigator”, which is relevant for the microstructure of the entries, and, second, a timeline (“Zeitstrahl”), which is part of the macrostructure as it gives access to the lemma inventory from a diachronic point of view.
Within a rapidly digitalising society, it is important to understand how the learning and teaching of digital skills play out in situ, particularly amongst older adults who acquire these skills later in life. This paper focuses on participants engaged in the process of learning digital skills in adult education courses. Using video recordings from adult education centres in Finland and Germany, we explore how students mobilise their teachers’ assistance when encountering problems with their smartphones, laptops or tablets. Prior research on social interaction has shown that assistance can be recruited through a variety of verbal and embodied formats. In this specific educational setting, participants can use complaints about their digital skills or mobile devices to obtain assistance. Utilising multimodal conversation analysis, we describe two basic sequence types involving students’ complaints, discuss their cross-linguistic characteristics, and reflect on their connection to this educational setting and digital devices.
The QUEST (QUality ESTablished) project aims at ensuring the reusability of audio-visual datasets (Wamprechtshammer et al., 2022) by devising quality criteria and curating processes. RefCo (Reference Corpora) is an initiative within QUEST in collaboration with DoReCo (Documentation Reference Corpus, Paschen et al. (2020)) focusing on language documentation projects. Previously, Aznar and Seifart (2020) introduced a set of quality criteria dedicated to documenting fieldwork corpora. Based on these criteria, we establish a semi-automatic review process for existing and work-in-progress corpora, in particular for language documentation. The goal is to improve the quality of a corpus by increasing its reusability. A central part of this process is a template for machine-readable corpus documentation and automatic data verification based on this documentation. In addition to the documentation and automatic verification, the process involves a human review and potentially results in a RefCo certification of the corpus. For each of these steps, we provide guidelines and manuals. We describe the evaluation process in detail, highlight the current limits for automatic evaluation and how the manual review is organized accordingly.
Metadata provides important information relevant both to finding and understanding corpus data. Meaningful linguistic data requires both reasonable annotations and documentation of these annotations. This documentation is part of the metadata of a dataset. While corpus documentation has often been provided in the form of accompanying publications, machine-readable metadata, both containing the bibliographic information and documenting the corpus data, has many advantages. Metadata standards allow for the development of common tools and interfaces. In this paper I add a perspective from an archive’s point of view: I look at the metadata provided for four learner corpora and discuss the suitability of established standards for machine-readable metadata. I am aware that there is ongoing work towards metadata standards for learner corpora. However, I would like to keep the discussion going and contribute a further point of view: increasing the findability and reusability of learner corpora in an archiving context.
Sometimes in interaction, a speaker articulates an overt interpretation of prior talk. Such moments have been studied as involving the repair of a problem with the other’s talk or as formulating an understanding of the matter at hand. Stepping back from the established notions of formulations and repair, we examine the variety of actions speakers do with the practice of offering an interpretation, and the order within this domain. Results show half a dozen usage types of interpretations in mundane interaction. These form a largely continuous territory of action, with recognizably distinct usage types as well as cases falling between these (proto)typical uses. We locate order in the domain of interpretations using the method of semantic maps and show that, contrary to earlier assumptions in the literature, interpretations that formulate an understanding of the matter at hand are actually quite pervasive in ordinary talk. These findings contribute to research on action formation and advance our understanding of understanding in interaction. Data are video- and audio-recordings of mundane social interaction in the German language from a variety of settings.
In this paper, we deal with register-driven variation from a probabilistic perspective, as proposed in Schäfer, Bildhauer, Pankratz, Müller (2022). We compare two approaches to analysing this variation within HPSG. On the one hand, we consider a multiple-grammar approach and combine it with the architecture proposed in the CoreGram project (Müller 2015), discussing its advantages and disadvantages. On the other hand, we consider a single-grammar approach and argue that it appears to be superior due to its computational efficiency and cognitive plausibility.
Close repetitions of lexical material can create an impression of clumsiness in the style of Italian prose, while they seem to be accepted more readily in German. The present study shows that this traditional claim needs some further differentiation. The negative effects on style arise in Italian when informationally prominent words are repeated, while informational background material may, and in certain cases even must, be repeated for clarity. The comparative study investigates lexical, syntactic and prosodic resources for indicating adversative (contrast) relations in argumentative texts from the field of humanities, written in Italian and German. It shows that, for encoding this kind of relation, Italian depends very much on lexical resources, including repetitions of words, while German makes more use of syntactic and prosodic parallelism. As a consequence, German can often dispense with adversative connectives and allows word repetitions to be employed for different purposes.
The Lehnwortportal Deutsch (LWPD) is an online information system on words borrowed from German into other languages. It is based on a growing number of lexicographic resources on various languages and offers a simple cross-resource search function. The poster presents an onomasiological search function for the LWPD that is currently under development.
Since the beginning of the Covid-19 pandemic, about 2000 new lexical units have entered the German lexicon. These comprise a multitude of coinages and word formations (Kuschelkontakt, rumaerosolen, pandemüde) as well as lexical borrowings, mainly from English (Lockdown, Hotspot, Superspreader). In a special way, these neologisms function as keywords and lexical indicators sketching the development of the multifaceted corona discourse in Germany. They can be detected systematically by corpus-linguistic investigations of reports and debates in contemporary public communication. Keyword analyses not only exhibit new vocabulary, they also reveal discursive foci, patterns of argumentation and topicalisations within the diverse narratives of the discourse. With the help of quickly established and dominant neologisms, this paper will outline typical contexts and thematic references, but it will also identify speakers' attitudes and evaluations.
This contribution presents the corpus of German-language song lyrics (Korpus deutschsprachiger Songtexte) as an innovative source of language data for interdisciplinary research scenarios and, in particular, for use in foreign and second language teaching. The resource documents features of conceptual writtenness and conceptual orality and permits empirically grounded analyses of linguistic phenomena and tendencies in the lyrics of modern pop music. The design, annotations and application examples of the corpus, which is stratified into thematic and artist-specific archives, are presented.
In this article we examine moments in which parents or other caregivers overtly invoke rules during episodes in which they take issue with, intervene against, and try to change a child’s ongoing behavior or action(s). Drawing on interactional data from four different languages (English, Finnish, German, Polish) and using Conversation Analytic methods, we first illustrate the variety of ways in which parents may use such overt rule invocations as part of their behavior modification attempts, showing them to be functionally versatile interactional objects. Their interactional flexibility notwithstanding, we find that parents typically invoke rules when, in the course of the intervention episode, they encounter trouble with achieving an acceptable compliant outcome. To get at the distinct import of rule formulations in this context, we then compare them to two sequential alternatives: parental expressions of an experienced negative affective state, and parental threats. While the former emphasize aspects of social solidarity, the latter seek to enforce compliance by foregrounding a power asymmetry between the parent and the child. Rule formulations, by contrast, are designedly impersonal and appear to be directed at what the parents construe as shortcomings in common-sense practical reasoning on the child’s part. Reflexively, the child is thereby cast as not having properly applied common-sense ‘practical reason’ when engaging in what is treated as the problematic behavior or action. Overt rule invocations can, therefore, be understood as indexical appeals to practical reason.
In the ongoing retro-digitization of Serbian dialectal dictionaries, the biggest obstacle is the lack of machine-readable versions of the paper editions. One essential step, lacking until now, is therefore needed before venturing into dictionary making in the digital environment: OCRing the pages with the highest possible accuracy. OCR processing is not a new technology, and many open-source and commercial software solutions can reliably convert scanned images of paper documents into digital documents. Available software solutions are usually efficient enough to process scanned contracts, invoices, financial statements, newspapers, and books. Where it is necessary to process documents that contain accented text and to extract each character with its diacritics precisely, such solutions are not efficient enough. This paper presents the OCR software “SCyDia”, developed to overcome this issue. We demonstrate the organizational structure of “SCyDia” and the first results. “SCyDia” is a web-based software solution that relies on the open-source software “Tesseract” in the background. It also contains a module for semi-automatic text correction. We have already processed over 15,000 pages, 13 dialectal dictionaries, and five dialectal monographs. At this point in our project, we have analyzed the accuracy of “SCyDia” on the 13 dialectal dictionaries. The results were analyzed manually by an expert who examined a number of randomly selected pages from each dictionary. The preliminary results show great promise, with accuracy ranging from 97.19% to 99.87%.
This paper deals with different types of verbal complementation of the German verb verdienen. It focuses on constructions that have been undergoing a grammaticalization process and thus express deontic modality, as in Sie verdient geliebt zu werden (‘She deserves to be loved’) and Sie verdient zu leben (‘She deserves to live’) (Diewald, Dekalo & Czicza 2021). These constructions are connected to parallel complementation types with passive and active infinitives containing a correlate es, as in Sie verdient es, geliebt zu werden and Sie verdient es, zu leben, as well as finite clauses with the subordinator dass with and without correlative es, as in Sie verdient, dass sie geliebt wird and Sie verdient es, dass sie geliebt wird. This paper presents a close comparative investigation of these six types of constructions based on their relevant semantic and syntactic properties in terms of clause linkage (Lehmann 1988). We analyze the relevant data retrieved from the DWDS corpus of the 20th century and present an expanded grammaticalization path for verdienen-constructions. The finite complementation with dass is regarded as an example of a separate structural option called “elaboration”. Concerning correlative es, it is shown that its use does not have any substantial effect on the grammaticalization of modal verdienen-constructions.
Lexical data API
(2022)
This API provides data from various dictionary resources of K Dictionaries across 50 languages. It is used by language service providers, app developers, and researchers, and returns data as JSON documents. A basic search result consists of an object containing partial lexical information on entries that match the search criteria, but further in-depth information is also available. Basic search parameters include the source resource, source language, and text (lemma), and the entries are returned as objects within the results array. It is possible to look for words with specific syntactic criteria, specifying the part of speech, grammatical number, gender and subcategorization, monosemous or polysemous entries. When searching by parameters, each entry result contains a unique entry ID, and each sense has its own unique sense ID. Using these IDs, it is possible to obtain more data – such as syntactic and semantic information, multiword expressions, examples of usage, translations, etc. – of a single entry or sense. The software demonstration includes a brief overview of the API with practical examples of its operation.
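The search-then-lookup flow described above can be sketched as follows. This is a minimal illustration, not the documented interface: the base URL, the query parameter names (`source`, `language`, `text`) and the JSON field names (`results`, `id`, `senses`) are assumptions chosen to mirror the wording of the abstract, and the response is a hand-written mock rather than real API output.

```python
import json
from urllib.parse import urlencode

# Hypothetical endpoint; the real API address is not given in the abstract.
BASE_URL = "https://api.example.invalid/search"


def build_search_url(source: str, language: str, text: str, **filters: str) -> str:
    """Build a search URL from the basic parameters plus optional filters
    (for example a part of speech); all parameter names are assumed."""
    params = {"source": source, "language": language, "text": text}
    params.update({k: v for k, v in filters.items() if v is not None})
    return f"{BASE_URL}?{urlencode(params)}"


def entry_and_sense_ids(response_body: str) -> dict:
    """Map each entry ID in a search response to the list of its sense IDs."""
    data = json.loads(response_body)
    return {
        entry["id"]: [sense["id"] for sense in entry.get("senses", [])]
        for entry in data.get("results", [])
    }


# Hand-written mock of a search response, for illustration only.
MOCK_RESPONSE = json.dumps({
    "results": [
        {"id": "EN_0001", "headword": "house",
         "senses": [{"id": "SE_0001"}, {"id": "SE_0002"}]},
    ]
})

if __name__ == "__main__":
    print(build_search_url("global", "en", "house", pos="noun"))
    print(entry_and_sense_ids(MOCK_RESPONSE))
```

The IDs collected in the second step are what a client would pass to a follow-up request for the full entry or sense data.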
Almanca tuhfe / Deutsches Geschenk (1916) oder: Wie schreibt man deutsch mit arabischen Buchstaben?
(2022)
Versified dictionaries are bilingual/multilingual glossaries written in verse form to teach essential words in any foreign language. In Islamic culture, versified dictionaries were produced to teach the Arabic language to the young generations of Muslim communities not native in Arabic. In the course of time, many bilingual/multilingual versified dictionaries were written in different languages throughout the Islamic world. The focus of this study is on the Turkish-German versified dictionary titled Almanca Tuhfe / Deutsches Geschenk [German Gift], published by Dr. Sherefeddin Pasha in Istanbul in 1916. This dictionary is the only dictionary in verse ever written combining these two languages. Moreover, the dictionary is one of the few texts containing German words written in Arabic letters (applying Ottoman spelling conventions). The study concentrates on the way German words are spelled and tries to find out whether Sherefeddin Pasha applied anything like fixed rules when writing the German lexemes.
This article aims to show the influence of doctrines on medical lexicographers' choices, with the Capuron-Nysten-Littré lineage as a case study. Indeed, the Dictionnaire de médecine has been shaped by several schools of thought, such as spiritualism and positivism. While lexical continuity may seem self-evident due to the nature of the work, thus reducing each reprint to a simple lexical increase, the process introduces both neologisms and deletions, whose effects can be examined using text statistics and factorial analysis.
In the present contribution, I investigate if and how the English and French editions of the Wiktionary collaborative dictionary can be used as a corpus for real time neology watch. This option is envisaged as a stopgap, when no satisfactory corpus is available. Wiktionary can also prove useful in addition to standard corpus analysis, to minimize the risk of overlooking new coinages and new senses. Since the collaborative dictionary’s quest for exhaustiveness makes the manual inspection of the new additions unreasonable (more than 31,000 English lemmas and 11,000 French lemmas entered the nomenclature in 2020), identifying the possibly relevant headwords is an issue. The solution proposed here is to use Wiktionary revision history to detect the (new or existing) entries that received the greatest number of modifications. The underlying hypothesis is that the most heavily edited pages can help identify the vocabulary related to “hot topics”, assuming that, in 2020, the pandemic-related vocabulary ranks high. I used two measures introduced by Lih (2004), whose aim was to estimate the quality of Wikipedia articles: the so-called rigour (number of edits per page) and diversity (number of unique contributors per page). In the present study, I propose to adapt the rigour and diversity metrics to Wiktionary in order to identify the pages that generated a particular stir, rather than to estimate the quality of the articles. I do not subscribe to the idea that – in Wiktionary – more revisions necessarily produce quality articles (more revisions often produce complete articles). I therefore adopt Lih’s notion of diversity to refer to the number of distinct contributors, but leave out the name rigour when it comes to the number of revisions. Wolfer and Müller-Spitzer (2016) used the two metrics to describe the dynamics of the German and English editions of Wiktionary. One of their findings was that the number of edits per page is correlated with corpus word frequencies. 
The variation in number of page edits should therefore reflect to some extent the variation of corpus word frequencies. Renouf (2013) established a relationship between the fluctuation of word frequencies in a diachronic corpus and various neological processes. In particular, she illustrated how specific events generate sudden frequency spikes for words previously unseen in the corpus. For instance, Eyjafjallajökull, the – existing – name of an Icelandic glacier, appeared in the corpus when the underlying volcano erupted in 2010 and disrupted air traffic in Europe. In order to check if the same phenomenon occurs when using Wiktionary edits instead of corpus frequencies, I manually annotated the most frequently revised entries (according to various ranking scores) with the binary tag: “related to Covid-19” (yes/no). The annotations were then used to test the ability of various configurations to detect relevant headwords from the English and French Wiktionary, namely Covid-19 neologisms and related existing words that deserve updates.
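The two measures discussed above, number of edits per page and number of distinct contributors per page (diversity), can be computed from a revision history as in the sketch below. The input format, a list of (page, contributor) pairs, and the function names are assumptions made for illustration; the study's actual data extraction and ranking scores are not described in enough detail here to reproduce.

```python
from collections import defaultdict


def edit_metrics(revisions):
    """Return {page: (number_of_edits, number_of_distinct_contributors)}.

    `revisions` is an iterable of (page_title, contributor) pairs,
    one pair per revision in the period under study.
    """
    edits = defaultdict(int)
    contributors = defaultdict(set)
    for page, user in revisions:
        edits[page] += 1
        contributors[page].add(user)
    return {page: (edits[page], len(contributors[page])) for page in edits}


def top_pages(metrics, k, by="edits"):
    """Rank pages by edit count or by diversity; ties are broken alphabetically."""
    index = 0 if by == "edits" else 1
    return sorted(metrics, key=lambda page: (-metrics[page][index], page))[:k]


if __name__ == "__main__":
    # Invented toy revision log, not real Wiktionary data.
    sample = [
        ("quarantine", "A"), ("quarantine", "B"), ("quarantine", "A"),
        ("glacier", "C"),
    ]
    metrics = edit_metrics(sample)
    print(metrics)
    print(top_pages(metrics, 1, by="diversity"))
```

The top-ranked pages under either score would then be the candidates for manual annotation with the binary “related to Covid-19” tag.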
To leverage the Deaf community’s increasing online presence, the web-based platform NZSL Share was launched in March 2020 to crowdsource new and previously undocumented signs, and to encourage community validation of these signs. The platform allows users to upload sign videos, comment on videos and agree or disagree with (often new) signs being proposed. It is managed by the research team that maintains the ODNZSL, which includes the authors. NZSL Share is being used by individuals as well as Deaf community groups to record and share signs of a specialist nature (e.g., school curriculum signs). NZSL Share now has close to 50 actively contributing members. Its launch coincided with the 2020 COVID-19 outbreak in New Zealand and so some of the first signs contributed were COVID-19-related, which are the focus of this paper.
This paper arises from the communicative urgency experienced throughout the pandemic. From its onset, several new lexical units have permeated the overall media discourse, as well as social media and other channels. These units convey information to the public regarding the ‘severe acute respiratory syndrome’, namely COVID-19. In addition to its worldwide impact on health, the pandemic has had a noteworthy influence on the linguistic landscape, and as a result, a significant number of neologisms have emerged. Within the scope of our ongoing research, we identify the neologisms in European Portuguese that are related to the term COVID-19 via form or meaning. However, not all the new lexical units identified in our corpus that contain COVID-19 in their formation can unequivocally be regarded as neoterms (terminological neologisms). Accordingly, this article aims not only to reflect on the distinction between neologism and neoterm but also to explore the determinologisation process that several of these new lexical units undergo.
This paper examines a certain subset of the vocabulary of Modern Icelandic, namely those words that are labelled as ‘ancient’ in the Dictionary of Contemporary Icelandic (DCI). The words were analysed and grouped into two main categories: 1) words with only ‘ancient’ sense(s) and 2) words that have a modern as well as an obsolete older sense. Several subgroups were identified, as well as some lexical characteristics. The words in question were then analysed in two other sources, the Dictionary of Old Norse Prose (ONP) and the Icelandic Gigaword Corpus (IGC). The results show that the words belong to several semantic domains that reflect the types of texts that have survived until modern times. Most of the words are robustly attested in Old Norse sources, although there are a few exceptions. The large majority of the words can be found in Modern Icelandic texts, but to a varying degree. Limits of the corpus material make it difficult to analyse some of the words. The results indicate that the words labelled ‘ancient’ can be divided into three main groups: a) words that are poorly attested and should perhaps not be included in the lexicographic description of Modern Icelandic; b) words that are likely to occur sometimes in Modern Icelandic; c) words that function as other inherited Old Norse words and perhaps do not require a special label or should have an additional sense in the DCI.
This paper presents a multilingual dictionary project of discourse markers. During its first stage, consisting of collecting the list of headwords, we used a parallel corpus to automatically extract units from texts written in Spanish, Catalan, English, French and German. We also applied a method to create a taxonomy structure for automatically organising the markers in clusters. As a result, we obtain an extensive, corpus-driven list of headwords. We present a prototype of the microstructure of the dictionary in the form of a standard XML database and describe the procedure to automatically fill in most of its fields (e.g., the type of DM, the equivalents in other languages, etc.), before human intervention.
This paper describes a method for extracting collocation data from text corpora based on a formal definition of syntactic structures, which takes into account not only the POS-tagging level of annotation but also syntactic parsing (syntactic treebank model) and introduces the possibility of controlling the canonical form of extracted collocations based on statistical data on forms with different properties in the corpus. Specifically, we describe the results of extraction from the syntactically tagged Gigafida 2.1 corpus. Using the new method, 4,002,918 collocation candidates in 81 syntactic structures were extracted. We evaluate the extracted data sample in more detail, mainly in relation to properties that affect the extraction of canonical forms: definiteness in adjectival collocations, grammatical number in noun collocations, comparison in adjectival and adverbial collocations, and letter case (uppercase and lowercase) in canonical forms. The conclusion highlights the potential of the methodology used for the grammatical description of collocation and phrasal syntax and the possibilities for improving the model in the process of compilation of a digital dictionary database for Slovene.
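The idea of controlling the canonical form through corpus statistics can be illustrated with a simple majority rule: the surface variant that dominates in the corpus (for example the plural, or the capitalized form) becomes the canonical form, and a default is used when no variant is dominant. This is a sketch under stated assumptions, not the method used for Gigafida 2.1; the function name, the 50% threshold and the toy counts are all invented for illustration.

```python
from collections import Counter


def canonical_form(observed_forms, min_share=0.5, default=None):
    """Pick the variant whose share of all occurrences is at least `min_share`.

    `observed_forms` is a list with one item per corpus occurrence of the
    collocation, each item being the variant that was found (number, case,
    degree of comparison and so on already encoded in the string).
    Returns `default` when there are no occurrences or no dominant variant.
    """
    counts = Counter(observed_forms)
    total = sum(counts.values())
    if total == 0:
        return default
    form, freq = counts.most_common(1)[0]
    return form if freq / total >= min_share else default


if __name__ == "__main__":
    # Invented counts: the plural variant dominates, so it is chosen.
    occurrences = ["osebni podatki"] * 8 + ["osebni podatek"] * 2
    print(canonical_form(occurrences, default="osebni podatek"))
```

The same rule applies unchanged to letter case or comparison, since each occurrence is simply recorded as the variant in which it appeared.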
This paper looks at whether, after two decades of corpus building for the Bantu languages, the time is ripe to begin using monitor corpora. As a proof of concept, a Lusoga monitor corpus is tested for lexicographic purposes, in casu for the detection of neologisms, both in terms of new words and new meanings, and is found to be useful.
This paper presents the main issues connected with the creation of a trilingual Hungarian-Italian-English dictionary of the COVID-19 pandemic using Lexonomy. My aim is not only to create a coronacorpus (in Hungarian, I propose my own corona-neologism or ‘coroneologism’: koronakorpusz) and a dictionary of equivalents, but also to understand how the different waves and phases of the COVID-19 pandemic are changing the Hungarian language, detect the Corona-, COVID-, pandemic-, virus-, mask-, quarantine-, and vaccine-related neologisms, and offer an overview of the most frequent or linguistically interesting Hungarian neologisms and multiword units related to COVID-19.
This article has a double objective. First, it seeks to offer an initial approach, with critical notes, to the group of pandemic-related neologisms incorporated into the DLE in the year 2020. To that end, the trends in the academic dictionary’s incorporation of neologisms will be reviewed, focusing in particular on specialized language neologisms. Second, the article presents the design of a research study that allows for the examination of any new words beginning with CORONA- added to the DLE and the DHLE. An assessment will be made of the particularities of the DLE and the DHLE regarding the incorporation of the new words, as well as the degree of correspondence or complementarity between the two works in this sense. This will show the complementary roles that the DLE and the DHLE are currently acquiring. In this sense, the new additions open up a debate on the treatment of neologisms in academic lexicography, in a particularly unique scenario.
This paper focuses on standardological and lexicographical aspects of Coronavirus-related neologisms in Croatian. The presented results are based on corpus analysis. The initial corpus for this analysis consists of terms collected for the Glossary of Coronavirus. This corpus has been supplemented by terms we collected on the Internet and from the media. The General Croatian corpora: Croatian Web Corpus – hrWaC (cf. Ljubešić/Klubička 2016) and Croatian Language Repository (cf. Brozović Rončević/Ćavar 2008: 173–186) were also used, but since they do not include neologisms that entered the language after 2013, they could be used only to check terms in the language before that time. From October 2021, a specialized Corona corpus compiled by Štrkalj Despot and Ostroški Anić (2021) became publicly available on request. The data from these corpora are analyzed by Sketch Engine (cf. Kilgarriff et al. 2004: 105–116), a corpus query system loaded with the corpora, enabling the display of lexeme context through concordances and (differential) word sketches and the extraction of keywords (terms) and N-grams. The most common collocations are sorted into syntactic categories. For English equivalents, in addition to the sources found on the Internet, enTenTen2020 corpus was consulted. In the second part of the paper, we analyze and compare the presentation of Coronavirus terminology in the descriptive Glossary of Coronavirus and the normative Croatian Web Dictionary – Mrežnik.
This contribution focuses on speakers of German in a minority situation in the Caucasus. They are descendants of former German minorities of the Russian Empire and the Soviet Union, who emigrated to the Transcaucasian territories in several phases from the end of the 18th century onwards. The interviewees are those who, owing to interethnic marriages, escaped the deportations of 1941 and still live in the South Caucasus. Using the methods characteristic of sociolinguistics, the author recorded, transcribed and analysed formal semi-structured interviews conducted in 2017 in the South Caucasus with two generations of descendants. The article presents the situation of the varieties of German (Swabian dialect and Standard German) and of their speakers in constellations of languages in contact in the Caucasus, as well as the measures taken by different groups of actors to preserve the German language and culture in Georgia.
Low German, spoken across the northern third of Germany, is a regional language whose existence is threatened. It still has a large number of speakers, but their age structure is very unfavourable. For two generations, transmission of the language within families has no longer been assured, and the speaker population as a whole is ageing markedly. There is, however, a very lively amateur theatre scene in northern Germany: 3,000 theatre groups perform in Low German. These small organisational units reach young people in particular with what they offer and open up access to the regional language for them. An online survey of amateur theatre groups conducted in 2017 by the Leibniz-Institut für Deutsche Sprache and the Institut für niederdeutsche Sprache showed that these groups can offer a stable setting for the use of Low German. Many participants in the survey indicated that the opportunity to use Low German was an important motivation for taking part in their respective theatre group.
Within the scope of the project "Study and dissemination of COVID-19 terminology", the study reported here aims to detect, analyse and discuss the characteristics of COVID-19 terminology, in particular the role of the adjective novo [new] in this terminology, the high recurrence of terms in the plural and the resemantization of some of the terminological units used. The present paper also discusses how these characteristics influenced the choices that have guided the creation of the proposed dictionary. This paper presents, therefore, the results of the analyses of these aspects, starting with a discussion of the relation between terminology and neology and arriving at the characteristic aspects of the macrostructural and microstructural choices about which some considerations were made.
A corpus-linguistic study with a comprehensive analysis of the more frequently occurring adverb formation patterns of German suggests that the saturation of the internal argument slot of an originally relational expression plays an important role in adverb production (Brandt 2020). A closer look at the differences between -ermaßen and -erweise adverbs points to a grammatical distinction between sentence adverbs and manner adverbs: in the case of -ermaßen, saturation takes place via token reflexivity, whereas the internal slot of -erweise formations is closed via more frequent and possibly expansive mechanisms. In addition, the pleonastic quality of formations based on gerundival participles promotes the productivity of -erweise adverbs.
While adjusting to the COVID-19 pandemic, people around the world started to talk about the “new normal” way of life, and they conveyed feelings and thoughts on the topic through social networks and traditional communication channels, resorting to a set of specific linguistic strategies, such as metaphors and neologisms. The vocabulary in different domains and in everyday speech was expanded to accommodate a complex social, cultural, and professional phenomenon of change. Therefore, this new life gave birth to a new language – the “coronaspeak”. According to Thorne (2020), the “coronaspeak” has three stages: first, it emerged in the way medical aspects were communicated in everyday language; secondly, it occurred when speakers verbalized the experiences they had undergone and “invented their own terms”; finally, this “new” way of speaking emerged in the government and authorities’ jargon, to ensure that the new rules and policies were understood, and that the population adopted socially responsible behaviours.
In this paper, we will focus on the second stage, because we intend to take stock of how speakers communicate and verbalize this new way of living, particularly on social networks. In addition, we are interested in the context in which the neologism – be it a new word, a new meaning, or a new use – emerged, is used, and is understood. To this end, we observe the occurrence of the new word(s) on social networks and in dissemination texts (press) and compare them with those that Portuguese digital dictionaries have attested so far. Different criteria regarding the insertion of new units, the inclusion date, and the lexicographic description of the entries in the dictionaries will be debated.
Phonesthemes (Firth 1930) are sublexical constructions that have an effect on the lexico-grammatical continuum: they are recurring form-meaning associations that occur more often than by chance but not systematically (Abramova/Fernandez/Sangati 2013). Phonesthemes have been shown (Bergen 2004) to affect psycholinguistic language processing; they organise the mental lexicon. Phonesthemes appear to emerge over time, driven by language use, as indexical rather than purely iconic constructions in the lexicon (Smith 2016; Bergen 2004; Flaksman 2020). Phonesthemes are acknowledged in construction morphology (Audring/Booij/Jackendoff 2017) as motivational schemas. Some phonesthemes also tend to receive lexicographic acknowledgment, as shown by the etymologist Liberman (2010), although this relevance and cohesion appears to be highly variable, as we will show in this paper.