Refine
Year of publication
- 2014 (160) (remove)
Document Type
- Part of a Book (89)
- Article (26)
- Conference Proceeding (19)
- Book (18)
- Working Paper (5)
- Preprint (2)
- Part of Periodical (1)
Is part of the Bibliography
- yes (160) (remove)
Keywords
- Deutsch (56)
- Korpus <Linguistik> (25)
- Computerunterstützte Lexikographie (20)
- Institut für Deutsche Sprache <Mannheim> (18)
- Wörterbuch (9)
- Benutzer (8)
- Konversationsanalyse (7)
- Gesprochene Sprache (6)
- Rumänisch (6)
- Sprachvariante (6)
Publicationstate
- Veröffentlichungsversion (55)
- Postprint (1)
- Zweitveröffentlichung (1)
Reviewstate
- (Verlags)-Lektorat (38)
- Peer-Review (18)
- Peer-review (6)
- Verlags-Lektorat (4)
- (Verlags)Lektorat (1)
- (Verlags-)Lektorat (1)
- Peer-Revied (1)
Publisher
- Institut für Deutsche Sprache (37)
- De Gruyter (33)
- European Language Resources Association (ELRA) (9)
- Stauffenburg (6)
- Winter (6)
- Cambridge Scholars Publ. (4)
- Benjamins (3)
- Erich Schmidt Verlag (3)
- Lang (3)
- de Gruyter (3)
"Badeölgrüne Buchten", "kükengelbes Haar" und "tomatenrote Tomaten" - Vergleiche mit Farbadjektiven
(2014)
We present an approach to an aspect of managing complex access scenarios to large and heterogeneous corpora that involves handling user queries that, intentionally or due to the complexity of the queried resource, target texts or annotations outside of the given user’s permissions. We first outline the overall architecture of the corpus analysis platform KorAP, devoting some attention to the way in which it handles multiple query languages, by implementing ISO CQLF (Corpus Query Lingua Franca), which in turn constitutes a component crucial for the functionality discussed here. Next, we look at query rewriting as it is used by KorAP and zoom in on one kind of this procedure, namely the rewriting of queries that is forced by data access restrictions.
In this paper, we present the concept and the results of two studies addressing (potential) users of monolingual German online dictionaries, such as www.elexiko.de. Drawing on the example of elexiko, the aim of those studies was to collect empirical data on possible extensions of the content of monolingual online dictionaries, e.g. the search function, to evaluate how users comprehend the terminology of the user interface, to find out which types of information are expected to be included in each specific lexicographic module and to investigate general questions regarding the function and reception of examples illustrating the use of a word. The design and distribution of the surveys is comparable to the studies described in the chapters 5-8 of this volume. We also explain, how the data obtained in our studies were used for further improvement of the elexiko-dictionary.
This contribution presents the procedure used in the Handbuch deutscher Kommunikationsverben and in its online version Kommunikationsverben in the lexicographical internet portal OWID to divide sets of semantically similar communication verbs into ever smaller sets of ever closer synonyms. Kommunikationsverben describes the meaning of communication verbs on two levels: a lexical level, represented in the dictionary entries and by sets of lexical features, and a conceptual level, represented by different types of situations referred to by specific types of verbs. The procedure starts at the conceptual level of meaning where verbs used to refer to the same specific situation type are grouped together. At the lexical level of meaning, the sets of verbs obtained from the first step are successively divided into smaller sets on the basis of the criteria of (i) identity of lexical meaning, (ii) identity of lexical features, and (iii) identity of contexts of usage. The stepwise procedure applied is shown to result in the creation of a semantic network for communication verbs.
Der Blick zurück nach vorn
(2014)
In diesem Beitrag wird an einigen Beispielen aus der nominalen Morphologie bzw. der Morphosyntax der deutschen Substantivgruppe gezeigt, wie sich in den Veränderungen in diesem Bereich, die sich über das 20. Jahrhundert hin beobachten lassen, Fragen eines langfristigen Systemwandels mit Regularitäten des Sprachgebrauchs überlagern. Im Mittelpunkt soll die Frage der Markierung der Kasus – insbesondere in den allgemein als „kritisch“ angesehenen Fällen von Genitiv und Dativ – stehen. Wenn man die Daten dazu betrachtet, sieht man, dass in den meisten Fällen schon zum Anfang des 20. Jahrhunderts eine weitgehende Anpassung an die Regularitäten der Monoflexion erfolgt war, auch, dass dieser Prozess über das Jahrhundert hin fortschreitet. Bemerkenswert ist, dass insgesamt die als „alt“ angesehenen Fälle in den untersuchten Korpora geschriebener Sprache (sehr) selten auftauchen, dass aber in zunehmendem Ausmaß die daraus folgende Markiertheit in der einen oder anderen Weise funktional genutzt wird. Einen Fall eigener Art stellt in diesem Zusammenhang der Genitiv dar, der sich bei den starken Maskulina und Neutra bekanntlich dem Trend zur „Einmalmarkierung“ der Kasus an den flektierten, das Substantiv begleitenden Elementen widersetzt. Das führt zu der bekannten Orientierung dieser Formen auf die Nicht-Objekt-Verwendungen und auch zu einem auffälligen Maß an Variation in der Nutzung der entsprechenden Flexionsformen.
We start by trying to answer a question that has already been asked by de Schryver et al. (2006): Do dictionary users (frequently) look up words that are frequent in a corpus. Contrary to their results, our results that are based on the analysis of log files from two different online dictionaries indicate that users indeed look up frequent words frequently. When combining frequency information from the Mannheim German Reference Corpus and information about the number of visits in the Digital Dictionary of the German Language as well as the German language edition of Wiktionary, a clear connection between corpus and look-up frequencies can be observed. In a follow-up study, we show that another important factor for the look-up frequency of a word is its temporal social relevance. To make this effect visible, we propose a de-trending method where we control both frequency effects and overall look-up trends.
In this paper, the authors use the 2012 log files of two German online dictionaries (Digital Dictionary of the German Language and the German Version of Wiktionary) and the 100,000 most frequent words in the Mannheim German Reference Corpus from 2009 to answer the question of whether dictionary users really do look up frequent words, first asked by de Schryver et al. (2006). By using an approach to the comparison of log files and corpus data which is completely different from that of the aforementioned authors, we provide empirical evidence that indicates - contrary to the results of de Schryver et al. and Verlinde/Binon (2010) - that the corpus frequency of a word can indeed be an important factor in determining what online dictionary users look up. Finally, we incorporate word class Information readily available in Wiktionary into our analysis to improve our results considerably.
Die Basislemmaliste (BLL) der neuhochdeutschen (nhd.) Standardsprache ist eine korpusbasierte, frequenzsortierte Lemmaliste mit mehr als 325.000 Einträgen. Jedes Lemma wird ergänzt durch Wortarten- und Häufigkeitsangaben. Die im Folgenden vorgestellte Version 1.0 der BLL wurde aus DeReKo, dem Deutschen Referenzkorpus des Instituts für Deutsche Sprache, mit 5 Milliarden Wortformen erstellt. Weitere Sprachressourcen sind linguistische Korpusannotationen, die von linguistischen Annotationswerkzeugen wie Lemmatisierern, Part-of-Speech-Taggern oder Parsern stammen. Für die Erstellung der BLL ist das Lemma und das Part-of-Speech-Tag relevant. Die Distanz zwischen lexikografischen Konventionen und maschineller Realität in Form von automatisch vergebenen Lemma-Annotationen erfordert einen Abgleich der aus den Korpusannotationen automatisch generierten Lemmalisten mit der digital verfügbaren Lemmastrecke eines Wörterbuches. Zum einen, um die Vollständigkeit der Einträge frequenter Wörter und das Vorkommen seltener Simplizia in der BLL zu gewährleisten, zum anderen, um die Lemmaform und die Lemmagranularität an die Erwartungen anzupassen, die ein menschlicher Benutzer an ein lexikalisches Verzeichnis der neuhochdeutschen Standardsprache stellt.
Die Leibniz-Gemeinschaft
(2014)
Discourses of Helping Professions brings together cutting-edge research on professional discourses from both traditional helping contexts such as doctor-patient interaction or psychotherapy and more recent helping contexts such as executive coaching. Unlike workplace, professional and institutional discourse – by now well established fields in linguistic research – discourses of helping professions represent an innovative concept in its orientation to a common communicative goal: solving patients’ and clients’ physical, psychological, emotional, professional or managerial problems via a particular helping discourse. The book sets out to uncover differences, similarities and interferences in how professionals and those seeking help interactively tackle this communicative goal. In its focus on professional helping contexts and its inter-professional perspective, the current book is a primer, intended to spark off more interdisciplinary and (applied) research on helping discourses, a socio-cultural phenomenon that is of growing importance in our post-modern society. As such, it is of great relevance for discourse researchers and discourse practitioners, caretakers and social scientists of all shades as well as for everybody interested in helping professions.
This contribution offers a fine-grained analysis of German and Romanian ditransitive and prepositional transfer constructions. The transfer construction (TC) is shown to be realised in German by 26 argument structure patterns (ASPs), which are conceived of as form-meaning pairings which differ only minimally. The mainstream constructionist view of the different types of TCs being related by polysemy links is rejected, the ASPs being argued instead to be related by family relationships. All but six of the ASPs identified for German are shown to possess a Romanian counterpart. For some ditransitive structures, German is shown to possess two prepositional variants, one with an (‘at’) and one with zu (‘to’) or auf (‘on’), while Romanian has only one. Due to the lack of a Romanian counterpart for the German zu and auf variants, Romanian lacks some of the dative alternations found in German. However, Romanian as well as German permits the double object pattern to interact with take-verbs, verbs of removal and add-verbs, which do not allow the ditransitive construction in English. Since these verb classes also permit at least one prepositional pattern in both languages, Romanian and German show a larger number of dative alternation types than English.
German lexical items with similar or related morphological roots and similar meaning potential are easily confused by native speakers and language learners. These include so-called paronyms such as effektiv/effizient , sensitive/sensibel, formell/formal/förmlich . Although these are generally not regarded as synonyms, empirical studies suggest that in some cases items of a paronym set have undergone meaning change and developed synonymous notions. In other cases, they remain similar in meaning, but show subtle differences in definition and restrictions of usage. Whereas the treatment of synonyms has received attention from corpus-linguists (cf. Partington 1998; Taylor 2003), the subject of paronyms has not been revisited with empirical, data-driven methods neither in terms of semantic theory nor in terms of practical lexicography. As a consequence, we also need to search for suitable corpus methods for detailed semantic investigation. Lexicographically, some German paronyms have been documented in printed dictionaries (e.g. Müller 1973; Pollmann & Wolk 2010). However, there is no corpus-assisted reference guide describing paronyms empirically and enabling readers to find the correct contemporary usage. Therefore, solutions to some lexicographic challenges are required.
To design effective electronic dictionaries, reliable empirical information on how dictionaries are actually being used is of great value for lexicographers. To my knowledge, no existing empirical research addresses the context of dictionary use, or, in other words, the extra-lexicographic situations in which a dictionary consultation is embedded. This is mainly due to the fact that data about these contexts are difficult to obtain. To take a first step in closing this research gap, we incorporated an open-ended question (“In which contexts or situations would you use a dictionary?”) into our first online survey (N = 684). Instead of presenting well-known facts about standardized types of usage situation, this chapter will focus on the more offbeat circumstances of dictionary use and aims of users, as they are reflected in the responses. Overall, my results indicate that there is a community whose work is closely linked with dictionaries. Dictionaries are also seen as a linguistic treasure trove for games or crossword puzzles, and as a standard which can be referred to as an authority. While it is important to emphasize that my results are only preliminary, they do indicate the potential of empirical research in this area.
This chapter summarizes the typical steps of an empirical investigation. Every step is illustrated using examples from our research project into online dictionary use or other relevant studies. This chapter does not claim to contain anything new, but presents a brief guideline for lexicographical researchers who are interested in conducting their own empirical research.
The main aim of the study presented in this chapter was to try out eyetracking as form to collect data about dictionary use as it is – for research into dictionary use – a new and not widely used technology. As the topic of research, we decided to evaluate the new web design of the IDS dictionary portal OWID. In the mid of 2011 where the study was conducted, the relaunch of the web design was internally finished but externally not released yet. In this regard, it was a good time to see whether users get along well with the new design decisions. 38 persons participated in our study, all of them students aged 20-30 years. Besides the results the chapter also includes critical comments on methodological aspects of our study.
EXMARaLDA
(2014)
Language resources are often compiled for the purpose of variational analysis, such as studying differences between genres, registers, and disciplines, regional and diachronic variation, influence of gender, cultural context, etc. Often the sheer number of potentially interesting contrastive pairs can get overwhelming due to the combinatorial explosion of possible combinations. In this paper, we present an approach that combines well understood techniques for visualization heatmaps and word clouds with intuitive paradigms for exploration drill down and side by side comparison to facilitate the analysis of language variation in such highly combinatorial situations. Heatmaps assist in analyzing the overall pattern of variation in a corpus, and word clouds allow for inspecting variation at the level of words.
In the present-day Germanic languages, free relatives (FRs) share formal properties with indirect question in that both constructions are introduced by w-pronouns. However, at least in German (and historical stages of a larger set of languages, including English), there is an additional pattern which involves the use of d-pronouns such as German der/die/das ‘that.masc./fem./neut.’, which typically introduce headed relative clauses. Focusing on presentday German, this paper shows that d-FRs are set apart from w-FRs by a number of properties including syntactic distribution in the matrix clause, behavior with respect to matching effects, inventory of pronominal forms, and semantic interpretation. From these observations, it is concluded that d-FRs should not be analyzed on a par with w-FRs. More precisely, we argue that d-FRs are in fact regular headed (restrictive) relative clauses where the relative pronoun has been deleted under identity with a demonstrative antecedent. This apparent instance of syntactic haplology is then analyzed as resulting from the same mechanism that eliminates copies/traces in movement dependencies.
In 2010, ISO published a standard for syntactic annotation, ISO 24615:2010 (SynAF). Back then, the document specified a comprehensive reference model for the representation of syntactic annotations, but no accompanying XML serialisation. ISO’s subcommittee on language resource management (ISO TC 37/SC 4) is working on making the SynAF serialisation ISOTiger an additional part of the standard. This contribution addresses the current state of development of ISOTiger, along with a number of open issues on which we are seeking community feedback in order to ensure that ISOTiger becomes a useful extension to the SynAF reference model.
Gegenwart und Zukunft der Abteilung Lexik am IDS: Plädoyer für eine Lexikographie der Sprachdynamik
(2014)
The first international study (N=684) we conducted within our research project on online dictionary use included very general questions on that topic. In this chapter, we present the corresponding results on questions like the use of both printed and online dictionaries as well as on the types of dictionaries used, devices used to access online dictionaries and some information regarding the willingness to pay for premium content. The data collected by us, show that our respondents both use printed and online dictionaries and, according to their self-report, many different kinds of dictionaries. In this context, our results revealed some clear cultural differences: in German-speaking areas spelling dictionaries are more common than in other linguistic areas, where thesauruses are widespread. Only a minority of our respondents is willing to pay for premium content, but most of the respondents are prepared to accept advertising. Our results also demonstrate that our respondents mainly tend to use dictionaries on big-screen devices, e.g. desktop computers or laptops.
We present a novel NLP resource for the explanation of linguistic phenomena, built and evaluated exploring very large annotated language corpora. For the compilation, we use the German Reference Corpus (DeReKo) with more than 5 billion word forms, which is the largest linguistic resource worldwide for the study of contemporary written German. The result is a comprehensive database of German genitive formations, enriched with a broad range of intra- und extralinguistic metadata. It can be used for the notoriously controversial classification and prediction of genitive endings (short endings, long endings, zero-marker). We also evaluate the main factors influencing the use of specific endings. To get a general idea about a factor’s influences and its side effects, we calculate chi-square-tests and visualize the residuals with an association plot. The results are evaluated against a gold standard by implementing tree-based machine learning algorithms. For the statistical analysis, we applied the supervised LMT Logistic Model Trees algorithm, using the WEKA software. We intend to use this gold standard to evaluate GenitivDB, as well as to explore methodologies for a predictive genitive model.
This paper reports on an ongoing lexicographical project that investigates Polish loanwords from German that were further borrowed into the East Slavic languages Russian, Ukrainian, and Belorussian. The results will be published as three separate dictionaries in the Lehnwortportal Deutsch, a freely available web portal for loanword dictionaries having German as their common source language. On the database level, the portal models lexicographical data as a cross-resource directed acyclic graph of relations between individual words, including German ‘metalemmata’ as normalized representations of diasystemic variants of German etyma. Amongst other things, this technology makes it possible to use the web portal as an ‘inverted loanword dictionary’ to find loanwords in different languages borrowed from the same German etymon. The different possible pathways of German loanwords that went through Polish into the East Slavic languages can be represented directly as paths in the graph. A dedicated in-house dictionary editing software system assists lexicographers in producing and keeping track of these paths even in complex cases where, e.g, only a derivative of a German loanword in Polish has been borrowed into Russian. The paper concludes with some remarks on the particularities of the dictionary/portal access structure needed for presenting and searching borrowing chains.
Der Semantik-Band des Handbuchs der deutschen Konnektoren beschreibt erstmals umfassend die Bedeutung der deutschen Konnektoren und etabliert eine theoretisch begründete semantische Klassifikation dieser Satzverknüpfer, die auf der syntaktischen Klassifikation des ersten Bandes des Handbuchs von Pasch et al. (2003) aufbaut. Der Semantik-Band richtet sich in erster Linie an ein linguistisches Fachpublikum. Durch die Darstellung der spezifischen Gebrauchsbedingungen satzverknüpfender Einheiten ist es darüber hinaus für Bereiche relevant, in denen das Verfassen und Verstehen von Texten Thema ist, wie Deutsch als Fremdsprache, Deutschdidaktik, Computerlinguistik, Übersetzungswissenschaft und angewandte Sprachforschung.
Der Semantik-Band des Handbuchs der deutschen Konnektoren beschreibt erstmals umfassend die Bedeutung der deutschen Konnektoren und etabliert eine theoretisch begründete semantische Klassifikation dieser Satzverknüpfer, die auf der syntaktischen Klassifikation des ersten Bandes des Handbuchs von Pasch et al. (2003) aufbaut. Der Semantik-Band richtet sich in erster Linie an ein linguistisches Fachpublikum. Durch die Darstellung der spezifischen Gebrauchsbedingungen satzverknüpfender Einheiten ist es darüber hinaus für Bereiche relevant, in denen das Verfassen und Verstehen von Texten Thema ist, wie Deutsch als Fremdsprache, Deutschdidaktik, Computerlinguistik, Übersetzungswissenschaft und angewandte Sprachforschung.
Handlungsverstehen und Intentionszuschreibung in der Interaktion I: Intentionsbekundungen mit wollen
(2014)
Hugo-Moser-Stiftung
(2014)
Der Beitrag behandelt die Frage, inwiefern es sich bei den gegenwärtigen Russlanddeutschen (Erwachsenen und Jugendlichen der ersten Generation, Einwanderungswelle der 1990er Jahre aus Sprachinseln) um Re-Migranten handelt, welche Veränderungen in den Varietätenrepertoires stattfinden und welche Schwierigkeiten und Probleme, aber auch Vorteile sich durch diese spezifische Migrationskonfiguration für die zugewanderten Russlanddeutschen ergeben. Die besondere Situation der Re-Migration mit der spezifischen linguistisch-soziolinguistischen Problematik wird durch Beispiele aus dem aktuellen IDS-Projekt „Migrationslinguistik“ veranschaulicht. Einerseits liegen besondere varietätenlinguistische Konstellationen vor, die bei der russlanddeutschen Migrantenpopulation generationenspezifische Konturen aufweisen. Dadurch entstehen andererseits unikale linguistische Sprachkontaktbedingungen, die die sprachlich-kommunikative Integration und den Erhalt der Migrantensprache Russisch in besonderer Weise beeinflussen können.
Gegenstand des Aufsatzes sind Sätze mit so genannten inneren Objekten, das sind Akkusativobjekte, die im Wesentlichen intransitive Verben gelegentlich zu sich nehmen. Sie weisen die Besonderheit auf, dass das Objektsnomen und das Verb morphologisch, etymologisch und/oder semantisch miteinander verwandt sind. Aufgrund von Form- und vor allem Bedeutungsunterschieden lassen sich in beiden Sprachen verschiedene Gruppen von inneren Objekten ausmachen, die genauer beschrieben und unter sprachvergleichenden Gesichtspunkten betrachtet werden. Dazu werden u.a. die syntaktischen Eigenschaften von Sätzen mit inneren Objekten herangezogen. Einige auffallende sprachbezogene Unterschiede werden beschrieben, beispielsweise ist im Rumänischen bei einigen Verben ein präpositionaler Anschluss möglich, wo im Deutschen das innere Objekt ausschließlich im Akkusativ stehen kann. Sätze mit inneren Objekten können als ein Typ von Argumentstrukturmustern betrachtet werden. In diesem Sinne sind sie Form-Bedeutungs-Paare, deren Beziehungen untereinander innerhalb eines Konzepts von Familienähnlichkeiten dargestellt werden, wie man sie auch innerhalb anderer Cluster von Argumentstrukturmustern beobachten kann.
Komplexe Argumentstrukturen. Kontrastive Untersuchungen zum Deutschen, Rumänischen und Englischen
(2014)
Neben dem kanonischen Ausdruck der Argumentstruktur von Verben als Intransitiv- oder Transitivkonstruktion mit Nominal- oder Präpositionalphrasen können Argumente in vielfältiger Weise auch in komplexer, nicht-kanonischer Form realisiert werden. Solche Argumentstrukturen zeigen insbesondere im Sprachvergleich interessante Variationen, wie der vorliegende Band anhand von Studien zum Deutschen, Rumänischen und Englischen zeigt. Er versammelt kontrastive Arbeiten zur Alternation von sententialen und nominalen Subjekten, zu den Typen und Restriktionen von Resultativkonstruktionen, zu den Bedingungen des Auftretens innerer Objekte, zu Eigenschaften infiniter Formen und ihren Verwendungsbeschränkungen als Argumentausdrücke sowie zu den spezifischen Bedingungen der Ditransitiv-Alternation. Die aus verschiedenen theoretischen Perspektiven geschriebenen Arbeiten reflektieren dabei das Spannungsfeld zwischen lexikalischen Forderungen, konstruktionalen Idiosynkrasien und sprachübergreifenden oder sprachspezifischen strukturellen Restriktionen.
Konversationsanalyse: Elementare Interaktionsstrukturen am Beispiel der Bundespressekonferenz
(2014)
The variation of the strong genitive marker of the singular noun has been treated by diverse accounts. Still there is a consensus that it is to a large extent systematic but can be approached appropriately only if many heterogeneous factors are taken into account. Over thirty variables influencing this variation have been proposed. However, it is actually unclear how effective they can be, and above all, how they interact. In this paper, the potential influencing variables are evaluated statistically in a machine learning approach and modelled in decision trees in order to predict the genitive marking variants. Working with decision trees based exclusively on statistically significant data enables us to determine what combination of factors is decisive in the choice of a marking variant of a given noun. Consequently the variation factors can be assessed with respect to their explanatory power for corpus data and put in a hierarchized order.
Maximizing the potential of very large corpora: 50 years of big language data at IDS Mannheim
(2014)
Very large corpora have been built and used at the IDS since its foundation in 1964. They have been made available on the Internet since the beginning of the 90’s to currently over 30,000 researchers worldwide. The Institute provides the largest archive of written German (Deutsches Referenzkorpus, DeReKe) which has recently been extended to 24 billion words. DeReKe has been managed and analysed by engines known as COSMAS and afterwards COSMAS II, which is currently being replaced by a new, scalable analysis platform called KorAP. KorAP makes it possible to manage and analyse texts that are accompanied by multiple, potentially conflicting, grammatical and structural annotation layers, and is able to handle resources that are distributed across different, and possibly geographically distant, storage systems. The majority of texts in DeReKe are not licensed for free redistribution, hence, the COSMAS and KorAP systems offer technical solutions to facilitate research on very large corpora that are not available (and not suitable) for download. For the new KorAP system, it is also planned to provide sandboxed environments to support non-remote-API access “near the data” through which users can run their own analysis programs.
This contribution outlines a conceptual analysis of the dictionary-internal cross-reference structure in electronic dictionaries along the lines of Wiegand’s actional-theoretical text theory of print dictionaries. The discussion focuses on issues of XML-based data modeling, using the monolingual German online dictionary elexiko as a running example. The first part of the article demonstrates how Wiegand’s formal theory of mediostructure and its intricate nomenclature can be extended in a systematic and lexicographically justified way to cover the structure of the underlying lexicographical database of online dictionaries. The second part of the article applies the concepts developed to a more technical question, examining the extent to which cross-reference information can be stored and processed separately from the dictionary entry documents, e.g., in a relational database. The results are largely negative; in most real world cases, this leads to an unwanted duplication of XML-related structural information. The concluding third part briefly describes the strategy chosen for elexiko: mediostructural information is not externalized at all; cross-reference consistency checks are performed by a dictionary editing tool that takes advantage of a specialized XML database index and can easily be made more efficient and scalable by using a simple caching technique.
The methods utilized in the area of research into dictionary use are established research methods in the social sciences. After explicating the different steps of a typical empirical investigation, this article provides examples of how these different methods are used in various user studies conducted in the field of using online dictionaries. Thereby, different kinds of data collection (surveys as online questionnaires, log files and eye tracking) as well as different research design structures (for instance, ex-post-facto design or experimental design) are discussed.
Once a new word or a new meaning is added to a monolingual dictionary, the lexicographer is to provide a definition of this item. This paper focuses on the methodological challenges in writing such definitions. After a short discussion of the central terminology (method and definition), the article describes factors which inform this process: linguistic theories, linguistic and lexicographical methods, and types of definitions. Using the example of elexiko, a dictionary project of the Institute for the German language (IDS) in Mannheim, Germany, the paper finally showcases the compilation of definitions in a monolingual online dictionary of contemporary German.
Machine learning methods offer a great potential to automatically investigate large amounts of data in the humanities. Our contribution to the workshop reports about ongoing work in the BMBF project KobRA (http://www.kobra.tu-dortmund.de) where we apply machine learning methods to the analysis of big corpora in language-focused research of computer-mediated communication (CMC). At the workshop, we will discuss first results from training a Support Vector Machine (SVM) for the classification of selected linguistic features in talk pages of the German Wikipedia corpus in DeReKo provided by the IDS Mannheim. We will investigate different representations of the data to integrate complex syntactic and semantic information for the SVM. The results shall foster both corpus-based research of CMC and the annotation of linguistic features in CMC corpora.
The article investigates the conditions under which the w-relativizer was appears instead of the d-relativzer das in German relative clauses. Building on Wiese 2013, we argue that was constitutes the elsewhere case that applies when identification with the antecedent cannot be established by syntactic means via upward agreement with respect to phi-features. Corpuslinguistic results point to the conclusion that this is the case whenever there is no lexical nominal in the antecedent that, following Geach 1962 and Baker 2003, supplies a criterion of identity needed to establish sameness of reference between the antecedent and the relativizer.