OPUS 4 | Search

Grußwort/Welcome address (2018)

“To cleanse and at the same time enrich your mother tongue is the task of the brightest people.” With this quote Goethe, the famous German poet, seemed to have described the work of EFNIL today. But is our task really that easy? Do we “cleanse” our language by deleting superfluous elements? Do we not lose the rich abundance of a language in so doing? Or is Goethe asking for other languages to be prevented from influencing his mother tongue? Would this even be feasible in a globalised world? Rudi Carrell, a famous entertainer on German TV, once said: “When I came to Germany I only spoke English. But the German language contains so many English words nowadays that I am now fluent in German!” His opinion is probably shared by many people learning German. My daily job is to support around 100,000 schools abroad that offer German as a foreign language. We ask ourselves daily: which German language should we be offering young people today? The classical German of literature? Or practical German which will enable young people to join the workforce of many German companies worldwide? And most of all: how do we motivate young people to learn German? Or any other foreign language? Yes, English, French, German, Spanish – these languages are in competition in many schools. But the most important fact is: the benefit lies in learning a foreign language, no matter which. Because by learning a foreign language we start to understand foreign cultures and other people. And THAT is what matters.

Neologismen der neunziger Jahre. Vom Textkorpus zur Datenbank (2002)

Tellenbach, Elke

Historische Tiefe in der Sprachforschung (1987)

von Polenz, Peter

Ansprache des Präsidenten des Instituts für deutsche Sprache. 25 Jahre Institut für deutsche Sprache (1990)

Grosse, Siegfried

Eröffnung der Jahrestagung 1996 (1997)

Debus, Friedhelm

Transcription Bottleneck of Speech Corpus Exploitation (2009)

Brinckmann, Caren

While written corpora can be exploited without any linguistic annotations, speech corpora need at least a basic transcription to be of any use for linguistic research. The basic annotation of speech data usually consists of time-aligned orthographic transcriptions. To answer phonetic or phonological research questions, phonetic transcriptions are needed as well. However, manual annotation is very time-consuming and requires considerable skill and near-native competence. Therefore it can take years of speech corpus compilation and annotation before any analyses can be carried out. In this paper, approaches that address the transcription bottleneck of speech corpus exploitation are presented and discussed, including crowdsourcing the orthographic transcription, automatic phonetic alignment, and query-driven annotation. Currently, query-driven annotation and automatic phonetic alignment are being combined and applied in two speech research projects at the Institut für Deutsche Sprache (IDS), whereas crowdsourcing the orthographic transcription still awaits implementation.

Über Corpusgewinnung und Dokumentation im Mannheimer Institut für deutsche Sprache (1969)

Hellmann, Manfred W.

Außenstelle Bonn des Instituts für deutsche Sprache (1969)

Hellmann, Manfred W.

25 Jahre Institut für Deutsche Sprache (1989)

Das Projekt Wissen über Wörter / WiW des Instituts für Deutsche Sprache, Mannheim (2002)

Haß-Zumkehr, Ulrike ; Schnörch, Ulrich

Eröffnung der Konferenz und Vorstellung des Instituts / Hochsprachen und europäische Mehrsprachigkeit aus der Sicht des Instituts für Deutsche Sprache (IDS) (2002)

Stickel, Gerhard

DEREKO - Das Deutsche Referenzkorpus. Schriftkorpora der deutschen Gegenwartssprache am Institut für Deutsche Sprache in Mannheim (2017)

Lüngen, Harald

Kontrastiv-linguistische Projekte des Instituts für Deutsche Sprache in Mannheim (1975)

Stickel, Gerhard

Wissenschaftssprachen am IDS und an anderen Forschungseinrichtungen (2005)

Stickel, Gerhard

DeReWo: Korpusbasierte Wortformenliste. Technical Report IDS-KL-2009-02 (2009)

Perkuhn, Rainer ; Belica, Cyril ; Kupietz, Marc ; Keibel, Holger ; Hennig, Sophie

KorAP architecture – diving in the deep sea of corpus data (2016)

Diewald, Nils ; Hanl, Michael ; Margaretha, Eliza ; Bingel, Joachim ; Kupietz, Marc ; Bański, Piotr ; Witt, Andreas

KorAP is a corpus search and analysis platform, developed at the Institute for the German Language (IDS). It supports very large corpora with multiple annotation layers, multiple query languages, and complex licensing scenarios. KorAP’s design aims to be scalable, flexible, and sustainable to serve the German Reference Corpus DEREKO for at least the next decade. To meet these requirements, we have adopted a highly modular microservice-based architecture. This paper outlines our approach: An architecture consisting of small components that are easy to extend, replace, and maintain. The components include a search backend, a user and corpus license management system, and a web-based user frontend. We also describe a general corpus query protocol used by all microservices for internal communications. KorAP is open source, licensed under BSD-2, and available on GitHub.

Grammatik des Deutschen im europäischen Vergleich - ein Projekt des Instituts für Deutsche Sprache, Mannheim (2005)

Zifonun, Gisela

The project “Grammatik des Deutschen im europäischen Vergleich” (Grammar of German in European Comparison), currently being carried out at the Institut für Deutsche Sprache in Mannheim, aims to open up an innovative approach to the grammar of German by taking into account insights from language typology and from contrastive studies within a European framework. This external perspective on grammar is also intended to serve as a foundation for application-oriented grammars in the field of German as a foreign language. The insight of “areal typology” that many European languages share grammatical features, regardless of their genetic affiliation, can foster European language awareness and thereby cultural identity; the IDS project should also be seen in this cultural-policy context. The paper presents the design of the project, with its central descriptive categories “functional domain” and “variance parameter”, and illustrates it with phenomena from the current research focus, “Grammatik des Nominals” (Grammar of the Nominal).

Analyse und Dokumentation gesprochener Sprache am IDS (2007)

Fiehler, Reinhard ; Schröder, Peter ; Wagener, Peter

Mannheim - Hauptstadt der deutschen Sprache. Präsentation am Goethe - Institut Paris, 20. Januar 2007 (2007)

Malchow-Tayebi, Barbara ; Perkuhn, Rainer

The German Reference Corpus: New developments building on almost 50 years of experience (2010)

Kupietz, Marc ; Schonefeld, Oliver ; Witt, Andreas

This paper describes the sustainability efforts of the Institut für Deutsche Sprache (IDS) in Mannheim with respect to DEREKO (Deutsches Referenzkorpus), the Archive of General Reference Corpora of Contemporary Written German. With a focus on re-usability and sustainability, we discuss its history and our future plans. We describe legal challenges related to the creation of a large and sustainable resource, sketch out the pipeline used to convert raw texts to the final corpus format, and outline migration plans to TEI P5. Because the current version of the corpus management and query system is being pushed to its limits, we discuss the requirements for a new version that will be able to handle current and future DEREKO releases. Furthermore, we outline the institute’s plans in the field of digital preservation.

The Morphosyntactic Annotation of DeReKo: Interpretation, Opportunities, and Pitfalls (2009)

Belica, Cyril ; Kupietz, Marc ; Witt, Andreas ; Lüngen, Harald

The paper discusses from various angles the morphosyntactic annotation of DeReKo, the Archive of General Reference Corpora of Contemporary Written German at the Institut für Deutsche Sprache (IDS), Mannheim. The paper is divided into two parts. The first part covers the practical and technical aspects of this endeavor. We present results from a recent evaluation of tools for the annotation of German text resources that have been applied to DeReKo. These tools include commercial products, especially Xerox's Finite State Tools and the Machinese products developed by the Finnish company Connexor Oy, as well as software for which licenses are available free of charge to academic institutions, e.g. Helmut Schmid's TreeTagger. The second part focuses on the linguistic interpretability of the corpus annotations and more general methodological considerations concerning scientifically sound empirical linguistic research. The main challenge here is that, unlike the texts themselves, the morphosyntactic annotations of DeReKo do not have the status of observed data; instead they constitute a theory- and implementation-dependent interpretation. In addition, because of the enormous size of DeReKo, a systematic manual verification of the automatic annotations is not feasible. In consequence, the expected degree of inaccuracy is very high, particularly wherever linguistically challenging phenomena, such as lexical or grammatical variation, are concerned. Given these facts, a researcher using the annotations blindly will run the risk of not actually studying the language but rather the annotation tool or the theory behind it. The paper gives an overview of possible pitfalls and ways to circumvent them and discusses the opportunities offered by using annotations in corpus-based and corpus-driven grammatical research against the background of a scientifically sound methodology.

The New IDS Corpus Analysis Platform: Challenges and Prospects (2012)

Bański, Piotr ; Fischer, Peter M. ; Frick, Elena ; Ketzan, Erik ; Kupietz, Marc ; Schnober, Carsten ; Schonefeld, Oliver ; Witt, Andreas

The present article describes the first stage of the KorAP project, launched recently at the Institut für Deutsche Sprache (IDS) in Mannheim, Germany. The aim of this project is to develop an innovative corpus analysis platform to tackle the increasing demands of modern linguistic research. The platform will facilitate new linguistic findings by making it possible to manage and analyse primary data and annotations in the petabyte range, while at the same time allowing an undistorted view of the primary linguistic data, and thus fully satisfying the demands of a scientific tool. An additional important aim of the project is to make corpus data as openly accessible as possible in light of unavoidable legal restrictions, for instance through support for distributed virtual corpora, user-defined annotations and adaptable user interfaces, as well as interfaces and sandboxes for user-supplied analysis applications. We discuss our motivation for undertaking this endeavour and the challenges that face it. Next, we outline our software implementation plan and describe development to date.

Informationsinfrastrukturen am Institut für Deutsche Sprache (2011)

Witt, Andreas ; Schonefeld, Oliver

This paper describes the efforts of the Institut für Deutsche Sprache (IDS), the central research institution for the German language, in connection with Information and Communications Technology (ICT). The use of ICT in a language research institute is twofold. On the one hand, ICT provides basic services for researchers to accomplish their daily work. On the other hand, several national and international institutions have a strong interest in ICT. Therefore, ICT can also be seen as an amplifier for language research. The first part of this paper reports on the activities of the IDS in internal and external ICT-related projects and initiatives. The second part describes a general approach to an ICT strategy that could be useful both for the IDS and for other national language institutes. We think such a general strategy is necessary to create a strong foundation not only for ICT-related projects but also for a modern research institute.

Wir können auch Hochdeutsch – Das Institut für Deutsche Sprache in Mannheim – ein Ort der Ideen (2015)

Trabold, Annette

Korpusanalytische Zugänge zu sprachlichem Usus (2008)

Belica, Cyril ; Steyer, Kathrin

Contributions to the study of German usage: a corpus-based approach. This paper outlines some basic assumptions and principles underlying corpus linguistics research and some application domains at the Institute for German Language in Mannheim. We briefly address three complementary but closely related tasks: first, the acquisition of very large corpora; second, research on statistical methods for automatically extracting information about associations between word configurations; and third, meeting the challenge of understanding the explanatory power of such methods both in theoretical linguistics and in other fields such as second language acquisition or lexicography. We argue that a systematic statistical analysis of huge bodies of text can reveal substantial insights into language usage and change, far beyond just collocational patterning.

Korpustechnologie am Institut für Deutsche Sprache (2005)

Perkuhn, Rainer ; Belica, Cyril ; al-Wadi, Doris ; Lauer, Meike ; Steyer, Kathrin ; Weiß, Christian

Das Deutsche Referenzkorpus DEREKO im Jubiläumsjahr 2014 (2014)

Lüngen, Harald ; Kupietz, Marc

Das elexiko-Portal: Ein neuer Zugang zu lexikografischen Arbeiten am Institut für Deutsche Sprache (2007)

Müller-Spitzer, Carolin

The elexiko portal is intended to bring together various lexicographic projects of the IDS in a single network and, as far as the content allows, to offer shared search options across different lexicographic products. This article primarily shows how the XML-based modelling for the portal is structured, both to lay the foundation for these flexible access structures and to accommodate the diversity of the participating projects. It also outlines prospects for a more flexible presentation of the data and for the future development of search options.

Begrüßung und einführende Bemerkungen (2015)

Eichinger, Ludwig M.

Forschungsinfrastrukturen am IDS: Gegenwart und Zukunft (2014)

Schonefeld, Oliver ; Witt, Andreas

Die Bonner "Forschungsstelle für öffentlichen Sprachgebrauch" (F.ö.S.) 1964 - 1980 (2014)

Hellmann, Manfred W.

Schlechte und bessere Zeiten für das IDS (2014)

Stickel, Gerhard

Forschungsstelle Freiburg (2014)

Schwitalla, Johannes ; Berens, Franz-Josef

Historische Lexikografie zwischen Zettelkasten und Internet. Die Neubearbeitung des Deutschen Fremdwörterbuchs (DFWB) am Institut für Deutsche Sprache. Ein Werkstattbericht (2014)

Schmidt, Herbert