OPUS 4 | Search

205 search hits

111 to 120

Putting corpora into perspective. Rethinking synchronicity in corpus linguistics (2010)

Belica, Cyril ; Keibel, Holger ; Kupietz, Marc ; Perkuhn, Rainer ; Vachková, Marie

Empirical synchronic language studies generally seek to investigate language phenomena at one point in time, even though this point in time is often not stated explicitly. To date, surprisingly little research has addressed what this time dependency of synchronic research implies for the composition and analysis of data suitable for such studies. Existing solutions and practices tend to be too general to meet the needs of every kind of research question. In this theoretical paper, which is aimed at both corpus creators and corpus users, we propose taking a decidedly synchronic perspective on the relevant language data. Such a perspective may be realised either through sampling criteria or through analytical methods applied to the data. As a general approach for both realisations, we introduce and explore the FReD strategy (Frequency Relevance Decay), which models the relevance of language events from a synchronic perspective. This general strategy represents a whole family of synchronic perspectives that can be customised to meet the requirements of the specific research question and language domain under investigation.
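The abstract does not spell out the decay functions that FReD uses. Purely as a hypothetical illustration of the general idea, the Python sketch below assumes an exponential decay: each text is weighted by its age relative to a chosen reference date, and a word's relative frequency is computed from these weights. The half-life parameter, the exclusion of texts dated after the reference point, and the function names are all assumptions, not the authors' definitions.

```python
from datetime import date

def decay_weight(text_date: date, reference: date, half_life_days: float) -> float:
    """Hypothetical exponential relevance decay: the weight halves every half_life_days."""
    age = (reference - text_date).days
    if age < 0:
        return 0.0  # assumption: texts dated after the reference point get no weight
    return 0.5 ** (age / half_life_days)

def weighted_relative_frequency(texts, word, reference, half_life_days=365.0):
    """texts: iterable of (date, list_of_tokens) pairs.

    Returns the decay-weighted relative frequency of `word` from the
    synchronic perspective of `reference`.
    """
    weighted_hits = 0.0
    weighted_tokens = 0.0
    for text_date, tokens in texts:
        weight = decay_weight(text_date, reference, half_life_days)
        weighted_hits += weight * tokens.count(word)
        weighted_tokens += weight * len(tokens)
    return weighted_hits / weighted_tokens if weighted_tokens else 0.0

# Toy data: a text one year before the reference date counts half as much.
texts = [(date(2009, 1, 1), ["a", "b", "a"]), (date(2010, 1, 1), ["b", "b"])]
print(weighted_relative_frequency(texts, "a", date(2010, 1, 1)))
```

Swapping in a different decay function (for instance a linear or step-shaped one) would give a different member of the family of synchronic perspectives the abstract describes.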

Vorwort [Foreword] (2010)

Gunkel, Lutz ; Rijkhoff, Jan

This contribution introduces issue 2/2010 of the journal Deutsche Sprache. The issue brings together four contributions on a central topic of German grammar and text linguistics: the form and function of attributive structures in the noun phrase. All contributions share a contrastive and/or functional-typological approach to this topic. They differ in the types of attributes examined (adjectival, genitive, prepositional and participial attributes), in their methodological approach to the data, in their theoretical questions, and in the languages used for comparison (Dutch, Danish, Norwegian, English). All contributions document the renewed interest in cross-linguistic research in recent years, which is also reflected in topic-specific conferences and research projects in Germany and abroad.

“die sendung hängt mir NACH, sie GEHT mir noch nach” [“the show lingers with me, it's still on my mind”]. Autoreflexive Talkshows [Self-reflexive talk shows] (2010)

Schütte, Wilfried

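Auf dem Weg zu einem zweisprachigen Neologismenwörterbuch Deutsch – Russisch. Einige Fragen zur Konzeption [Towards a bilingual German–Russian dictionary of neologisms: some questions concerning its conception] (2010)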

Steffens, Doris ; Nikitina, Olga

Different Views on Markup (2010)

Goecke, Daniela ; Lüngen, Harald ; Metzing, Dieter ; Stührenberg, Maik ; Witt, Andreas

In this chapter, two different ways of grouping information represented in document markup are examined: annotation levels, which refer to conceptual levels of description, and annotation layers, which refer to the technical realisation of markup using, e.g., document grammars. In many current XML annotation projects, multiple levels are integrated into one layer, which often leads to the problem of overlapping hierarchies. As a solution, we propose a framework for multiple, independent XML annotation layers over a single text, based on an abstract representation of XML documents as logical predicates. Two realisations of this abstract representation are presented: a Prolog fact base format together with an application architecture, and a specification for native XML databases. We conclude with a discussion of projects that currently use this framework.
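The chapter's actual Prolog fact base format is not given in this listing. As a rough, hypothetical sketch of the idea, the Python code below flattens one XML layer into predicate-like tuples anchored to character offsets of the shared primary text, so that elements from independent layers can be related even when their hierarchies overlap. The tuple layout, the function names, and the second "prosody" layer are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

def xml_to_facts(xml_string, layer):
    """Flatten one XML annotation layer into (layer, node_id, tag, start, end) tuples.

    start/end are character offsets into the layer's text content, so layers
    that annotate the same primary text share one coordinate system.
    """
    root = ET.fromstring(xml_string)
    facts = []
    next_id = 0

    def walk(elem, offset):
        nonlocal next_id
        node_id = next_id  # ids are assigned in document (pre-)order
        next_id += 1
        start = offset
        offset += len(elem.text or "")
        for child in elem:
            offset = walk(child, offset)
            offset += len(child.tail or "")  # text after the child's end tag
        facts.append((layer, node_id, elem.tag, start, offset))
        return offset

    walk(root, 0)
    return sorted(facts, key=lambda fact: fact[1])

def partially_overlaps(a, b):
    """True if the spans of facts a and b overlap without either containing the other."""
    a_start, a_end = a[3], a[4]
    b_start, b_end = b[3], b[4]
    return a_start < b_start < a_end < b_end or b_start < a_start < b_end < a_end

# Two independent layers over the same text "the old man slept".
syntax = xml_to_facts("<s><np>the old man</np> <vp>slept</vp></s>", "syntax")
prosody = xml_to_facts("<u><ip>the old</ip> <ip>man slept</ip></u>", "prosody")
print(partially_overlaps(syntax[1], prosody[2]))
```

Here the noun phrase “the old man” (offsets 0 to 11) and the second hypothetical intonation phrase “man slept” (8 to 17) partially overlap, a relation a single XML hierarchy could not encode directly.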

The German Reference Corpus: New developments building on almost 50 years of experience (2010)

Kupietz, Marc ; Schonefeld, Oliver ; Witt, Andreas

This paper describes the sustainability efforts of the Institut für Deutsche Sprache (IDS) in Mannheim with respect to DEREKO (Deutsches Referenzkorpus), the Archive of General Reference Corpora of Contemporary Written German. Focusing on reusability and sustainability, we discuss its history and our future plans. We describe the legal challenges involved in creating a large and sustainable resource, sketch the pipeline used to convert raw texts into the final corpus format, and outline plans to migrate to TEI P5. Because the current version of the corpus management and query system is being pushed to its limits, we discuss the requirements for a new version that will be able to handle current and future DEREKO releases. Furthermore, we outline the institute’s plans in the field of digital preservation.

Mehrsprachigkeit und Identität: Vorstellung einer Integrationsarbeiterin [Multilingualism and identity: introducing an integration worker] (2010)

Meng, Katharina ; Protassova, Ekaterina

The article centres on a person whose identity has been shaped essentially by her multilingualism and multiculturalism, with German and Russian dominant, and by her profession as an integration worker. The lifelong development of this identity and its interpersonal conditions are reconstructed on the basis of language-biographical interviews and samples of German- and Russian-language communication. The discussion of this development relates to current questions about fostering early childhood bilingualism and about how society shapes the linguistic integration of immigrants.

Using a domain ontology for the semantic-statistical classification of specialist hypertexts (2010)

Schneider, Roman ; Bubenhofer, Noah

In this feasibility study, we aim to contribute to the practical use of domain ontologies for hypertext classification by introducing an algorithm that generates potential keywords. The algorithm uses structural markup information and lemmatized word lists, as well as a domain ontology for linguistics. We present the calculation and ranking of keyword candidates based on ontology relationships, word position, frequency information, and statistical significance as measured by log-likelihood tests. Finally, the results of our machine-driven classification are validated empirically against manually assigned keywords.
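The abstract mentions log-likelihood tests but not the exact setup. A minimal sketch, assuming a two-corpus comparison of a hypertext's lemma counts against a reference corpus and using the common 2×2 log-likelihood (G²) formulation; the function names and toy counts are hypothetical, and the ontology and markup-position factors the authors also use are not modelled here.

```python
import math

def log_likelihood(a, b, c, d):
    """G² for a word occurring a times in a text of c tokens
    and b times in a reference corpus of d tokens."""
    expected_a = c * (a + b) / (c + d)
    expected_b = d * (a + b) / (c + d)
    ll = 0.0
    if a > 0:  # a zero observed count contributes nothing
        ll += a * math.log(a / expected_a)
    if b > 0:
        ll += b * math.log(b / expected_b)
    return 2.0 * ll

def rank_keyword_candidates(text_counts, text_size, ref_counts, ref_size):
    """Rank words that are relatively more frequent in the text than in the reference corpus."""
    ranked = []
    for word, a in text_counts.items():
        b = ref_counts.get(word, 0)
        # keep only overused words (a/text_size > b/ref_size), compared without division
        if a * ref_size > b * text_size:
            ranked.append((word, log_likelihood(a, b, text_size, ref_size)))
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked

text_counts = {"syntax": 30, "the": 50, "morphology": 10}
ref_counts = {"syntax": 5, "the": 5000, "morphology": 2}
ranked = rank_keyword_candidates(text_counts, 1000, ref_counts, 100000)
print(ranked)
```

In this toy example “the” is dropped because its relative frequency is identical in both corpora, while the domain terms are ranked by how strongly their frequency deviates from the reference.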

Spotting, collecting and documenting negative polarity items (2010)

Soehn, Jan-Philipp ; Trawiński, Beata ; Lichte, Timm

As the nature of negative polarity items (NPIs) and their licensing contexts is still much debated, a broad empirical basis is an important cornerstone for further insights in this area of research. The work discussed in this paper is intended as a contribution to this objective. We briefly introduce the phenomenon of NPIs and outline the major theories about their licensing, as well as various licensing contexts, before turning to our two main topics. First, we describe a corpus-based retrieval method for NPI candidates that ranks the candidates according to their distributional dependence on licensing contexts. The method extracts single-word candidates and is extended to capture multi-word candidates as well. The basic idea behind automatically collecting NPI candidates from a large corpus is that an NPI behaves like a kind of collocate of its licensing contexts. Manual inspection and interpretation of the candidate lists then identify the actual NPIs. Second, we present an online repository for NPIs and other items that show distributional idiosyncrasies, which provides a sustainable empirical database for further (theoretical) research on these items.
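The collocation-style idea behind the candidate retrieval can be sketched as follows. This is a simplified, hypothetical measure (the share of a word's occurrences that fall inside sentences containing a licenser), not the authors' actual ranking statistic; the function name, the licenser set, the frequency threshold, and the English toy sentences are all assumptions.

```python
from collections import Counter

def rank_npi_candidates(sentences, licensers, min_freq=2):
    """Rank words by the share of their occurrences inside licensing contexts.

    sentences: list of token lists; a sentence counts as a licensing context
    if it contains any token from `licensers` (a deliberately crude assumption).
    """
    total = Counter()
    licensed = Counter()
    for tokens in sentences:
        in_context = any(token in licensers for token in tokens)
        for token in tokens:
            if token in licensers:
                continue  # the licensers themselves are not candidates
            total[token] += 1
            if in_context:
                licensed[token] += 1
    scores = [(word, licensed[word] / total[word])
              for word in total if total[word] >= min_freq]
    # highest dependence first; ties broken alphabetically for a stable order
    scores.sort(key=lambda item: (-item[1], item[0]))
    return scores

sentences = [
    ["he", "did", "not", "lift", "a", "finger"],
    ["nobody", "ever", "came"],
    ["she", "never", "ever", "smiled"],
    ["he", "came", "home"],
    ["she", "smiled"],
]
result = rank_npi_candidates(sentences, {"not", "nobody", "never"})
print(result)
```

On this toy data “ever” occurs only in licensing contexts and tops the list; as the abstract stresses, such a ranked list would still need manual inspection to separate real NPIs from noise.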

OWID - A dictionary net for corpus-based lexicography of contemporary German (2010)

Müller-Spitzer, Carolin

The Online-Wortschatz-Informationssystem Deutsch (OWID, Online German Lexical Information System) is a lexicographic Internet portal for various electronic dictionary resources that are being compiled at the Institute for the German Language (Institut für Deutsche Sprache, IDS). OWID focuses on academic lexicographic resources for contemporary German. Currently, OWID includes the following dictionaries: elexiko, a dictionary of contemporary German; a dictionary of neologisms; a small dictionary of collocations; and a discourse dictionary covering the lexemes that constitute the discourse about “guilt” in the early post-war era (1945-1955). In the near future (2010/2011), several additional dictionaries will be published in OWID: a Textbook of German Communication Verbs, a Valency Dictionary of German Verbs, and two further discourse dictionaries, one on the “democracy” discourse around 1968 and the other covering the keywords of German reunification in 1989/1990. Moreover, 300 entries from a corpus-based project on proverbs will be integrated into OWID. OWID is thus a constantly growing resource for academic lexicography of German. It is a special kind of dictionary portal owing to its content and its design, in particular the integration of the various dictionaries, the access options, and the presentation features. With OWID, we aim to establish a dictionary net in which the different resources are jointly accessible not only by headwords but also at the microstructural level. A prerequisite for these shared access and navigation options across the various dictionaries is the common lexicographic data model that we have implemented in OWID. Data from all dictionaries in OWID are structured according to a tailor-made, fine-grained, XML-based data model in which similar content is modelled in the same way while dictionary-specific differences are preserved.
The main tasks for the future are to add further dictionary resources to OWID, to improve the internal access structures so that they exploit the full possibilities of the data model, and to adapt both the layout of the dictionaries and the search options to users’ needs.
