Julius Pokorny
(1996)
Hermann Osthoff
(1996)
Eduard Rudolf Thurneysen
(1996)
This paper deals with the distribution of word length in short native mythological and historical Eskimo narrative texts. To my knowledge, no Eskimo‐Aleut data have been the object of quantitative linguistic investigation so far. Due to the strong linguistic and stylistic homogeneity of the examined texts it was assumed that these texts can be subsumed under a single law of word length distribution, if word length distribution of a text is considered as a function of certain of its properties, such as author, language, and genre. So far, word length distribution in texts of a wide variety of languages and genres has been demonstrated to follow distributions of the compound Poisson family of discrete probability distributions. In view of the morphological idiosyncrasies of the Eskimo language in general, which are responsible for an unusually high mean word length of about 4.5 to 5.2 syllables per word in the texts, it is interesting to see whether Eskimo texts show a significantly different behaviour with respect to word length. The results demonstrate that the Eskimo data employed in this study can be fitted well by the Hyperpoisson distribution. Two further discrete probability distributions will be deduced from certain morphology‐based assumptions about Eskimo. It turns out that most of the Eskimo data can be fitted by these two distributions. The question to what extent these results point to a more grammar‐oriented theory of word length is also discussed.
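For readers unfamiliar with the Hyperpoisson distribution named in this abstract, the following is a minimal sketch of its probability mass function, P(x) = a^x / (b^(x) · 1F1(1; b; a)), where b^(x) is the rising factorial. The function name, the parameter values, and the truncation of the normalizing series are assumptions made for illustration; the abstract does not state which parameters or fitting procedure the paper uses. For word lengths counted from one syllable upwards, one would typically use the 1-displaced form, reading P(x) as the probability of length x + 1.

```python
def hyperpoisson_pmf(a, b, max_x, terms=200):
    """Return [P(0), ..., P(max_x)] of the hyperpoisson distribution.

    The normalizing constant 1F1(1; b; a) is approximated by a truncated
    series; `terms` is an illustrative cut-off, not a tuned value.
    """
    weights = []
    w = 1.0  # a**0 / b^(0) = 1
    for x in range(max(terms, max_x + 1)):
        weights.append(w)
        w *= a / (b + x)  # recurrence: w_{x+1} = w_x * a / (b + x)
    norm = sum(weights)  # truncated series for 1F1(1; b; a)
    return [weights[x] / norm for x in range(max_x + 1)]


# With b = 1 the distribution reduces to the ordinary Poisson distribution.
probs = hyperpoisson_pmf(a=2.0, b=1.0, max_x=3)
```

Fitting a and b to observed word length frequencies (for example by minimizing a chi-square statistic) is a separate step that is not shown here.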
Referenz und ihre Gegenstände. Bemerkungen zur Pragmatik eines sprachphilosophischen Begriffs
(1998)
According to a widespread conception, quantitative linguistics will eventually be able to explain empirical quantitative findings (such as Zipf’s Law) by deriving them from highly general stochastic linguistic ‘laws’ that are assumed to be part of a general theory of human language (cf. Best (1999) for a summary of possible theoretical positions). Due to their formal proximity to methods used in the so-called exact sciences, theoretical explanations of this kind are assumed to be superior to the supposedly descriptive-only approaches of linguistic structuralism and its successors. In this paper I shall try to argue that on close inspection such claims turn out to be highly problematic, both on linguistic and on science-theoretical grounds.
Physicists look at language
(2006)
Sprachkritik, dahinsickernd
(2007)
Three popular collections of essays concerning correct language use in German are reviewed from a linguist’s point of view. It is claimed that the overall picture of language that Sick conveys to the layperson is inadequate; in addition, the author fails to reflect explicitly on the purpose and consequences of his prescriptive approach to language use.
The present study examines the dynamics of the kanji combinations that form common (or general) and proper nouns in Japanese. The following three results were obtained. First, the degree of distribution results from two similar processes which are based on a steady-state of birth-and-death processes with different birth and death rates, resulting in a positive negative binomial distribution with the proper nouns and in a positive Waring distribution with common nouns. Second, all rank-frequency distributions follow the negative hypergeometric distribution used very frequently in ranking problems. Third, the building of kanji compounds follows a dissortative strategy. The higher the outdegree of a kanji, the more it prefers kanji with lower indegrees. A linear dependence can be observed with common nouns, whereas the relationship between compounded kanji is rather curvilinear with proper nouns. The actual analytical expression is not yet known.
Open peer commentary on the target article “Who Conceives of Society?” by Ernst von Glasersfeld. Excerpt: I will focus on one crucial step in von Glasersfeld’s argumentation, viz. his view that every individual constructs his own private meanings (understood as conceptual structures or elements thereof) for linguistic expressions, so that linguistic interaction and even communication in general is based on a notion of compatibility between different speakers’ private conceptual schemes. The central question here is: “Just what does it mean that different private conceptual schemes (private meanings) are compatible, or what constitutes a viable criterion to this end?” As von Glasersfeld himself stresses twice (§28, §37), the criteria to be looked for can only be “public,” residing in properties of verbal and non-verbal actions of the interacting individuals, properties that can be sensed and processed by the participating system.
Julius Pokorny
(2009)
Hermann Osthoff
(2009)
Eduard Rudolf Thurneysen
(2009)
The representation of semantic relations between word senses of different entries in a dictionary is subject to a number of consistency requirements. This paper discusses the issue of maintaining and accessing consistent information on cross-references between sense-related items in electronic dictionaries from a mainly text-technological point of view. We present a number of consistency criteria for cross-referencing related senses and propose a practical approach to handling sense relations in an online dictionary. Our proposal is currently being tested in a large ongoing online dictionary project for German called elexiko. We focus on three different aspects of the dictionary development and editing process where consistency is an important issue: lexicographic data modelling, implementation of a lexicographic database system for an electronic dictionary, and development of practical tools for the lexicographer’s workbench.
This paper presents a novel type of multilingual electronic resource in which several loanword dictionaries are combined into an "inverted loanword dictionary" for a particular donor language. Such a dictionary makes it possible to find the loanwords in different recipient languages that go back to a given etymon of the donor language. Developing a web application of this kind, and in particular its underlying database, raises numerous conceptual problems at the interface between lexicography and computer science. The paper presents these problems against the background of the desirable functionality of such a web portal and discusses one possible solution: the entries of the individual dictionaries are stored as XML documents and serve as the basis for the ordinary online view of these dictionaries; for portal-wide queries in particular, however, basic standardized information on the lemmata and etyma of all portal dictionaries, together with their variants and word-formation products (referred to collectively here as "portal instances"), as well as the various kinds of relations between these portal instances, is additionally stored in relational database tables, which allow efficient search queries of arbitrarily complex structure.
vernetziko is an assistive software tool primarily designed for managing cross-references in XML-based electronic dictionaries. In its current form it has been developed as an integral part of the lexicographic editing environment for the German monolingual dictionary elexiko developed and compiled at the Institut für Deutsche Sprache, Mannheim. This paper first briefly outlines how vernetziko fits into the XML-based dictionary editing technology of elexiko. Then vernetziko’s core functionality and some of the auxiliary tools integrated into the program are presented from both a practical and a technological point of view. The concluding sections discuss some software engineering aspects of extending the tool to handle cross-references between multiple resources and point out some of the advantages of vernetziko vis-à-vis corresponding features of proprietary dictionary writing systems. The software can be adapted to interconnect off-the-shelf components (database management systems and editors), thus providing a tailor-made lexicographical workbench for a wide range of XML-based dictionaries without vendor lock-in.
In this paper, we address issues of inconsistencies of dictionary information and how different corpus methods and computer tools can assist in providing systematic cross-referencing. The question is raised how hyperlinking in an electronic reference work can be approached systematically in order to warrant consistent symmetrical links between synonyms or antonyms. Firstly, it is argued that working with a comprehensive corpus does not account for consistent cross-referencing. It is shown that a top-down corpus-driven linguistic analysis also does not guarantee the lexicographic documentation of binary lexico-semantic relations covered by corpus data, as proposed by Paradis/Willners (2006a, b). Secondly, with the help of dictionary examples taken from elexiko (an online dictionary of contemporary German) we demonstrate how a combination of both corpus-driven and corpus-based procedures enables lexicographers to systematically exploit corpus material in more depth than by using only one of these methods. It is also discussed where and why lexicographers are still prone to inconsistencies in the editing processes, irrespective of their underlying corpus methodologies. Finally, we introduce a cross-reference management tool that has been developed for elexiko and we explain its technological prerequisites and implications. This software supports lexicographers in detecting existing and missing references from and to a specific headword. It also offers options to automatically and comfortably correct discrepancies. Overall, we suggest a method that includes linguistic competence, complementary corpus approaches and additional software in order to ensure that links or references between synonymic and antonymic pairings are given in both directions.
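The requirement that links between synonymic or antonymic pairings exist in both directions can be illustrated with a small consistency check. This is only a sketch under simplifying assumptions: it works on plain headwords rather than on individual senses, the function name and data layout are hypothetical, and it is not the elexiko tool described in the abstract.

```python
def missing_back_links(links):
    """Find one-directional cross-references.

    `links` maps each headword to the set of headwords it refers to
    (hypothetical layout). Returns a sorted list of (source, target)
    pairs where `target` does not refer back to `source`.
    """
    missing = []
    for source, targets in links.items():
        for target in targets:
            if source not in links.get(target, set()):
                missing.append((source, target))
    return sorted(missing)


# Placeholder entries: "a" and "b" refer to each other, "c" refers to "d" only.
example_links = {"a": {"b"}, "b": {"a"}, "c": {"d"}, "d": set()}
one_way = missing_back_links(example_links)  # [("c", "d")]
```

A real tool would additionally have to decide whether a missing back-link is an omission or whether the forward link itself is wrong, which is a lexicographic judgement rather than a mechanical one.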
Particularly in older, philologically oriented dictionaries, the microstructure of the entries is often discursive and unsystematic. Automated digitization of such dictionaries with the aim of encoding their logical structure is not possible; in many cases even a parser producing a raw digitized version for manual post-editing is not a realistic goal, because the item types of the dictionary cannot be clearly delimited from one another and cannot be unambiguously identified in the individual entries. In such cases, even a subsequent manual formalization of the microstructure poses major lexicographical problems. For more complex application scenarios such as queries in web applications, it may nevertheless be unavoidable to represent at least all relevant word forms discussed in the entries, together with basic diasystematic and morphological information and their relations to one another, in a structured machine-readable format, for instance as data-centric XML documents. The talk attempts to explore the lexicographical and technical possibilities and limits of such a partial, manual retrodigitization, drawing on experience with an older dictionary of German loanwords in Slovene (Striedter-Temps 1963). The dictionary is to be integrated into a portal of loanword dictionaries with German as the common donor language. The individual entries are made available to users as scanned images; the additional textual retrodigitization is nevertheless required for more complex search queries, in particular cross-dictionary and portal-wide ones.
The web portal Lehnwortportal Deutsch (lwp.ids-mannheim.de), developed at the Institute for the German Language (IDS), aims to provide unified access to existing and possibly new dictionaries of German loanwords in other languages. Internally, the lexicographical information is represented as a directed acyclic graph of relations between words. The graph abstracts from the idiosyncrasies of the individual component dictionaries. This paper explores two different strategies to make complex graph-based cross-dictionary queries in such a portal more accessible to users. The first strategy effectively hides the underlying graph structure, but allows users to assign scopes (internally defined in terms of the graph structure) to search criteria. A second type of search strategy directly formulates queries in terms of the relational graph structure. In this case, search results are not entries but n-tuples of words (metalemmata, loanwords, etyma); a query consists of specifying properties of these words and relations between them. A working prototype of an easy-to-use human-readable declarative query language is presented and ways to interactively construct queries are discussed.
In this contribution, the authors first present and discuss several dictionary-use studies on electronic dictionaries that address how users judge the added value of multimedia and user-adaptive elements (Ch. 1). In a second part, starting from the strengths and weaknesses of existing approaches in this area, they seek answers to the question of what requirements visualization techniques and strategies in electronic dictionaries must meet in order to achieve such added value (Ch. 2). Finally, as a practical example of how such requirements might be implemented, they present the prototype of a software tool for interactively exploring word-formation information in a dictionary.
We start by trying to answer a question that has already been asked by de Schryver et al. (2006): Do dictionary users (frequently) look up words that are frequent in a corpus? Contrary to their results, our results, which are based on the analysis of log files from two different online dictionaries, indicate that users indeed look up frequent words frequently. When combining frequency information from the Mannheim German Reference Corpus and information about the number of visits in the Digital Dictionary of the German Language as well as the German language edition of Wiktionary, a clear connection between corpus and look-up frequencies can be observed. In a follow-up study, we show that another important factor for the look-up frequency of a word is its temporal social relevance. To make this effect visible, we propose a de-trending method where we control both frequency effects and overall look-up trends.
This paper reports on an ongoing lexicographical project that investigates Polish loanwords from German that were further borrowed into the East Slavic languages Russian, Ukrainian, and Belorussian. The results will be published as three separate dictionaries in the Lehnwortportal Deutsch, a freely available web portal for loanword dictionaries having German as their common source language. On the database level, the portal models lexicographical data as a cross-resource directed acyclic graph of relations between individual words, including German ‘metalemmata’ as normalized representations of diasystemic variants of German etyma. Amongst other things, this technology makes it possible to use the web portal as an ‘inverted loanword dictionary’ to find loanwords in different languages borrowed from the same German etymon. The different possible pathways of German loanwords that went through Polish into the East Slavic languages can be represented directly as paths in the graph. A dedicated in-house dictionary editing software system assists lexicographers in producing and keeping track of these paths even in complex cases where, e.g., only a derivative of a German loanword in Polish has been borrowed into Russian. The paper concludes with some remarks on the particularities of the dictionary/portal access structure needed for presenting and searching borrowing chains.
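The idea that borrowing pathways are paths in a directed acyclic graph of word relations can be sketched as follows. The node format, the function names, and the sample edges are assumptions made for illustration only; the sample entries are not taken from the portal's data, and the portal's actual data model is richer than a bare edge list.

```python
from collections import defaultdict


def build_graph(edges):
    """Build an adjacency list from (source, target) borrowing relations."""
    graph = defaultdict(list)
    for source, target in edges:
        graph[source].append(target)
    return graph


def borrowing_paths(graph, start):
    """Yield every path from `start` to a word that was not borrowed further.

    Assumes the graph is acyclic, as stated in the abstract.
    """
    successors = graph.get(start, [])
    if not successors:
        yield [start]
        return
    for nxt in successors:
        for rest in borrowing_paths(graph, nxt):
            yield [start] + rest


# Illustrative nodes as (language code, word) pairs; not portal data.
edges = [
    (("de", "Dach"), ("pl", "dach")),
    (("pl", "dach"), ("uk", "дах")),
    (("pl", "dach"), ("be", "дах")),
]
graph = build_graph(edges)
paths = list(borrowing_paths(graph, ("de", "Dach")))
```

Used as an "inverted loanword dictionary", such a structure answers the question "which words in which languages go back to this German etymon?" by collecting every node that appears on any of the returned paths.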
This contribution outlines a conceptual analysis of the dictionary-internal cross-reference structure in electronic dictionaries along the lines of Wiegand’s actional-theoretical text theory of print dictionaries. The discussion focuses on issues of XML-based data modeling, using the monolingual German online dictionary elexiko as a running example. The first part of the article demonstrates how Wiegand’s formal theory of mediostructure and its intricate nomenclature can be extended in a systematic and lexicographically justified way to cover the structure of the underlying lexicographical database of online dictionaries. The second part of the article applies the concepts developed to a more technical question, examining the extent to which cross-reference information can be stored and processed separately from the dictionary entry documents, e.g., in a relational database. The results are largely negative; in most real world cases, this leads to an unwanted duplication of XML-related structural information. The concluding third part briefly describes the strategy chosen for elexiko: mediostructural information is not externalized at all; cross-reference consistency checks are performed by a dictionary editing tool that takes advantage of a specialized XML database index and can easily be made more efficient and scalable by using a simple caching technique.
In this paper, the authors use the 2012 log files of two German online dictionaries (Digital Dictionary of the German Language and the German version of Wiktionary) and the 100,000 most frequent words in the Mannheim German Reference Corpus from 2009 to answer the question of whether dictionary users really do look up frequent words, first asked by de Schryver et al. (2006). By using an approach to the comparison of log files and corpus data which is completely different from that of the aforementioned authors, we provide empirical evidence that indicates, contrary to the results of de Schryver et al. and Verlinde/Binon (2010), that the corpus frequency of a word can indeed be an important factor in determining what online dictionary users look up. Finally, we incorporate word class information readily available in Wiktionary into our analysis to improve our results considerably.
The web portal Lehnwortportal Deutsch <lwp.ids-mannheim.de>, developed at the Institute for the German Language (IDS), aims to provide unified access to a growing number of lexicographical resources on German loanwords in other languages. This paper discusses different possibilities of creating an onomasiological access structure for portal users. We critically examine the meaning list of the “World Loanword Database” project (Haspelmath/Tadmor 2009a) as well as WordNet-based taxonomies and propose a new way of inductively creating a semantic classification scheme that takes both hyperonymic relations and semantic fields into account. We show how such a classification can be integrated into the underlying graph-based data representation of the Lehnwortportal and thus be exploited for advanced onomasiological search options.
This paper presents a dictionary writing system developed at the Institute for the German Language in Mannheim (IDS) for an ongoing international lexicographical project that traces the paths of German loanwords in the East Slavic languages Russian, Belarusian and Ukrainian that were possibly borrowed via Polish. The results will be published in the Lehnwortportal Deutsch (LWP, lwp.ids-mannheim.de), a web portal for loanword dictionaries with German as the common donor language. The system described here is currently in use for excerpting data from a large range of historical and contemporary East Slavic monolingual dictionaries. The paper focuses on the tools that help in merging excerpts that are etymologically related to one and the same Polish etymon. The merging process involves eliminating redundancies and inconsistencies and, above all, mapping word senses of excerpted entries onto a common cross-language set of ‘metasenses’. This mapping may involve literally hundreds of excerpted East Slavic word senses, including quotations, for one ‘underlying’ Polish etymon.
Datenmodellierung
(2016)
The chapter is devoted to the basic technical conditions of present-day internet lexicography. On the one hand, the authors outline what happens "behind" the user interfaces visible on a screen when a user accesses a dictionary online, and how these processes can be recorded in log files for documentation purposes. On the other hand, they discuss how the identity and permanent availability of content can be ensured given that online resources can be updated at any time.
The wdlpOst dictionary writing system to be presented in this paper has been developed for the specific purposes of a lexicographical project on German loanwords in the East Slavic languages Russian, Belarusian, and Ukrainian. The project’s main objectives are (i) to document those loanwords for which a cognate lexical borrowing from German is known in Polish and (ii) to establish possible borrowing pathways for these lexical items. In the first phase of the project, the collaborative client/server architecture of the wdlpOst system has been used for excerpting detailed lexicographical information from a large range of historical and contemporary East Slavic dictionaries, taking the entries in a large dictionary of German loanwords in Polish as a common frame of reference. For the project’s second phase, the wdlpOst system provides innovative tooling for compiling entries of the East Slavic loanwords. Most importantly, the numerous word sense definitions for a set of cognate loanwords, as excerpted from different lexicographical sources, are mapped onto a system of newly defined cross-language word senses; in a similar vein, the phonemic and graphemic variation in the loanwords and their derivatives is captured through a tool that abstracts from dictionary-specific idiosyncrasies.
Languages employ different strategies to transmit structural and grammatical information. While, for example, grammatical dependency relationships in sentences are mainly conveyed by the ordering of the words for languages like Mandarin Chinese or Vietnamese, the word ordering is much less restricted for languages such as Inupiatun or Quechua, as these languages (also) use the internal structure of words (e.g. inflectional morphology) to mark grammatical relationships in a sentence. Based on a quantitative analysis of more than 1,500 unique translations of different books of the Bible in almost 1,200 different languages that are spoken as a native language by approximately 6 billion people (more than 80% of the world population), we present large-scale evidence for a statistical trade-off between the amount of information conveyed by the ordering of words and the amount of information conveyed by internal word structure: languages that rely more strongly on word order information tend to rely less on word structure information and vice versa. Or put differently, if less information is carried within the word, more information has to be spread among words in order to communicate successfully. In addition, we find that, despite differences in the way information is expressed, there is also evidence for a trade-off between different books of the biblical canon that recurs with little variation across languages: the more informative the word order of the book, the less informative its word structure and vice versa. We argue that this might suggest that, on the one hand, languages encode information in very different (but efficient) ways. On the other hand, content-related and stylistic features are statistically encoded in very similar ways.
On 12 May 1965, the State of Israel and the Federal Republic of Germany officially established diplomatic relations. More than 15 years after the founding of the two states and 20 years after the end of the Shoah, a complex process of slow political rapprochement thus came to a conclusion that was by no means a matter of course. The fiftieth anniversary of this event in 2015 was the occasion for numerous events worldwide, but above all in Israel and Germany, which are documented on an official bilateral website <www.de50il.org/> (accessed 6 November 2017). As part of the anniversary, the first version of the "Wörterbuch deutscher Lehnwörter im Hebräischen" by Uriel Adiv was officially launched in the "Lehnwortportal Deutsch" of the IDS on 30 September 2015 at a ceremonial evening event at the Jewish Museum Berlin. A second version, substantially revised and improved by co-author Jakob Mendel, went online in May 2017. This contribution aims to present some background on German loanwords in Modern Hebrew and to shed light on the genesis of the work and its place in the loanword-lexicographical publication platform "Lehnwortportal Deutsch" <http://lwp.ids-mannheim.de/> (accessed 6 November 2017).
Making 1:n explorable: a search interface for the ZAS database of clause-embedding predicates
(2017)
We introduce a recently published corpus-based database of German clause-embedding predicates and present an innovative web application for exploring it. The application displays the predicates and the corpus examples for these predicates in two separate tables that can be browsed and searched in real time. While familiar web interface paradigms make it easy for users to get started, the data presentation and the interactive advanced search components for the two tables are designed to accommodate remarkably complex query needs without the need for resorting to a dedicated query language or a more specialized tool. The 1:n relationship between predicates and their examples is exploited in the two tables in that, e.g. the predicate table also shows, for each predicate and each example attribute, all values that occur in the examples for this predicate. An easy-to-use visual query builder for arbitrary Boolean combinations of search criteria can optionally be displayed to pre-filter the underlying data presented in both tables. Several options for altering quantifier scope can be activated with simple checkboxes and considerably widen the space of searchable constellations.
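The effect of quantifier scope on a 1:n relationship between predicates and their examples can be made concrete with a short sketch. Everything named here is hypothetical: the function, the attribute `clause_type`, its values, and the placeholder predicate names are illustrative assumptions and do not reflect the actual schema or interface of the database.

```python
def filter_predicates(examples_by_predicate, criterion, scope="any"):
    """Select predicates by a criterion on their examples.

    scope="any": at least one example satisfies the criterion.
    scope="all": every example satisfies it (predicates without
    examples are never selected).
    """
    quantifier = any if scope == "any" else all
    return sorted(
        predicate
        for predicate, examples in examples_by_predicate.items()
        if examples and quantifier(criterion(e) for e in examples)
    )


# Placeholder data: one predicate with mixed examples, one with uniform ones.
data = {
    "pred_a": [{"clause_type": "x"}, {"clause_type": "y"}],
    "pred_b": [{"clause_type": "x"}],
}


def is_x(example):
    return example["clause_type"] == "x"


some_x = filter_predicates(data, is_x, scope="any")  # ["pred_a", "pred_b"]
only_x = filter_predicates(data, is_x, scope="all")  # ["pred_b"]
```

Switching between the two scopes is the kind of choice that the abstract describes as being exposed through simple checkboxes rather than a dedicated query language.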