Refine
Year of publication
- 2012 (102)
Document Type
- Part of a Book (53)
- Conference Proceeding (24)
- Article (22)
- Book (1)
- Other (1)
- Part of Periodical (1)
Keywords
- German (31)
- Corpus <Linguistics> (18)
- Computational linguistics (9)
- Conversation analysis (8)
- English (7)
- Contrastive grammar (7)
- Language policy (7)
- Metadata (6)
- Annotation (5)
- Data management (5)
Publication state
- Published version (102)
Review state
- (Publisher's) editorial review (68)
- Peer-Review (31)
- Peer-review (1)
- Publisher's editorial review (1)
The present article describes the first stage of the KorAP project, launched recently at the Institut für Deutsche Sprache (IDS) in Mannheim, Germany. The aim of this project is to develop an innovative corpus analysis platform to tackle the increasing demands of modern linguistic research. The platform will facilitate new linguistic findings by making it possible to manage and analyse primary data and annotations in the petabyte range, while at the same time allowing an undistorted view of the primary linguistic data, and thus fully satisfying the demands of a scientific tool. An additional important aim of the project is to make corpus data as openly accessible as possible in light of unavoidable legal restrictions, for instance through support for distributed virtual corpora, user-defined annotations and adaptable user interfaces, as well as interfaces and sandboxes for user-supplied analysis applications. We discuss our motivation for undertaking this endeavour and the challenges that face it. Next, we outline our software implementation plan and describe development to date.
This paper discusses information-structural aspects of multiple prefield occupation (mehrfache Vorfeldbesetzung) in German. On the basis of a collection of examples extracted largely from the IDS corpora, the discourse givenness, focus status and topic status of (primarily) the prefield material are described and related to corresponding claims in the literature. Besides information-structural factors, the final section addresses other possible factors that might favour multiple prefield occupation. In addition, figures are presented for the first time, for a limited portion of German, that illustrate the ratio of multiple prefield occupation to the similar but supposedly more "canonical" occupation of the prefield by a (possibly partial) verb phrase.
The article sketches a synopsis of the linking properties of German sentence connectives and a terminology for describing them. A selection of 24 causal and consecutive connectives serves as illustration. The first half deals with semantic and syntactic properties as well as properties of the syntax-semantics interface. The second half focuses on discourse-structural and information-structural properties. It turns out that the linking properties described do not combine freely but form characteristic property profiles, by means of which five major classes of connectives can be defined and represented as an ordered subsystem of the grammar.
This contribution examines the grammatical realization of clausal and clause-equivalent verb-phrase adverbials and sentence adverbials in German, in comparison with the Romance languages Italian and Portuguese (focusing on the Brazilian variety). Such adverbials can be realized in formally quite different ways. Finite adverbial subordinate clauses introduced by a subordinator are typical of German. Less common are finite subordinate clauses without an introducing element, participial phrases and infinitival phrases introduced by a preposition. In the Romance languages, gerundial, participial and infinitival phrases are used as adverbials considerably more often. Unlike in German, they can also have their own subjects, which makes them more similar to finite subordinate clauses.
Introduction
(2012)
Proceeding from the central ideas of the papers contained in this volume, the closing article sets out to develop a unified theory of the syntax and semantics of verum focus, illustrated for the sentence and clause types of present-day German. In German, verum focus is realized phonologically by means of pitch accents on morphosyntactic exponents of various classes: finite verb forms, complementizers and subordinators, interrogative and relative phrases, and modal particles. In the first half of the article, these constituents, most of which reside in the left periphery of the sentence or clause, are shown to share the grammatical function of distinguishing between sentence moods and other categories of clauses. This observation gives rise to the assumption that verum focus should be explicable as contrastive focus on semantically distinctive features or components of sentence mood and clause type. In the second half of the article, this assumption is spelt out for the sentence and clause types of German. We propose a universal semantic structure of sentence meaning which makes it possible to reduce the most typical cases of verum focus, and their diverse contextual interpretations, to highlighting the connection between the sentence or clause and its textual or discourse environment. This connection is syntactically implemented by an element occupying the head position of CP: either a finite verb form or a complementizer/subordinator. Realizations of verum focus on prefield constituents in wh- and relative clauses are explained as phonetic remedies deployed when a connecting element in C° is missing. Focus on modal particles in the middle field and on verb forms in the right periphery of the clause is shown to differ semantically from verum focus stricto sensu, although it has similar pragmatic effects.
The theory is built exclusively on assumptions needed for independent reasons and dispenses with the problematic verum operator assumed in most traditional accounts.
This paper presents the application of the <tiger2/> format to various linguistic scenarios, with the aim of making it the standard serialisation for the ISO 24615 [1] (SynAF) standard. After outlining the main characteristics of both the SynAF metamodel and the <tiger2/> format, as extended from the initial Tiger XML format [2], we show, with examples from a range of language families, how <tiger2/> covers a variety of constituency-based and dependency-based analyses.
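To illustrate how a single graph serialisation can carry both constituency and dependency analyses, here is a minimal Python sketch that reads a sentence graph in the style of the original TIGER-XML format (elements `s`, `graph`, `terminals`/`t`, `nonterminals`/`nt`, `edge`). It is not the <tiger2/> schema, which extends this format with elements and attributes not modelled here. The edge attached directly to a terminal, expressing a dependency relation, is an assumption about what an extended format permits, and the sample sentence is invented.

```python
import xml.etree.ElementTree as ET

# Invented sample in the style of original TIGER-XML. The <edge> inside the
# terminal s1_2 is a hypothetical dependency-style edge (head -> dependent);
# original TIGER-XML only places edges under non-terminals.
SAMPLE = """
<s id="s1">
  <graph root="s1_500">
    <terminals>
      <t id="s1_1" word="Peter" pos="NE"/>
      <t id="s1_2" word="schläft" pos="VVFIN">
        <edge label="SB" idref="s1_1"/>
      </t>
    </terminals>
    <nonterminals>
      <nt id="s1_500" cat="S">
        <edge label="SB" idref="s1_1"/>
        <edge label="HD" idref="s1_2"/>
      </nt>
    </nonterminals>
  </graph>
</s>
"""


def read_edges(xml_text):
    """Return (source_id, label, target_id) triples for every <edge>.

    Edges are collected from both terminals (<t>) and non-terminals (<nt>),
    so constituency edges (parent -> child) and dependency-style edges
    (head -> dependent) end up in one uniform list, in document order.
    """
    root = ET.fromstring(xml_text)
    triples = []
    for node in root.iter():
        if node.tag not in ("t", "nt"):
            continue
        for edge in node.findall("edge"):
            triples.append((node.get("id"), edge.get("label"), edge.get("idref")))
    return triples


def words(xml_text):
    """Map terminal ids to their word forms."""
    root = ET.fromstring(xml_text)
    return {t.get("id"): t.get("word") for t in root.iter("t")}


if __name__ == "__main__":
    tokens = words(SAMPLE)
    for source, label, target in read_edges(SAMPLE):
        print(source, label, target, tokens.get(target))
```

Treating every edge as a labelled (source, target) pair is what lets the same reader handle both analysis styles; a consumer can still tell them apart by checking whether the source is a `t` or an `nt`.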
This paper describes the status of the standardization efforts for a Component Metadata approach to describing Language Resources with metadata. Several linguistic and Language & Technology communities, such as CLARIN, META-SHARE and NaLiDa, use this component approach and regard its standardization as a matter for cooperation that could create a large interoperable domain of joint metadata. Starting with an overview of the component metadata approach together with the related semantic interoperability tools and services, such as the ISOcat data category registry and the relation registry, we explain the standardization plan and efforts for component metadata within ISO TC37/SC4. Finally, we present information about the uptake of component metadata, and plans for its use, within the three communities mentioned.
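The interoperability idea behind component metadata can be sketched as a toy data model. Reusable components group metadata elements, and each element points to a data category in a registry such as ISOcat, so differently named elements in different profiles can still be aligned by their shared concept. This is a minimal Python illustration under stated assumptions. It is not the CMDI XML schema or any real registry API, a "profile" is simplified to a top-level component, and all names and concept URIs (`hypothetical:datcat/...`) are hypothetical placeholders.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Element:
    name: str
    concept_link: str  # hypothetical URI of a data category in a registry


@dataclass
class Component:
    name: str
    elements: list[Element] = field(default_factory=list)
    components: list[Component] = field(default_factory=list)  # nested, reusable


def iter_elements(component):
    """Yield every element of a component, including those of nested components."""
    yield from component.elements
    for child in component.components:
        yield from iter_elements(child)


def align_by_concept(profiles):
    """Group (profile, element) names from different profiles by shared concept link."""
    groups = defaultdict(set)
    for profile in profiles:
        for element in iter_elements(profile):
            groups[element.concept_link].add((profile.name, element.name))
    return dict(groups)


# Hypothetical concept URIs standing in for registry data categories.
RESOURCE_NAME = "hypothetical:datcat/resourceName"
LANGUAGE = "hypothetical:datcat/languageName"

# Two invented profiles that name the same concepts differently.
profile_a = Component(
    "CorpusProfileA",
    components=[
        Component("GeneralInfo", elements=[Element("Name", RESOURCE_NAME)]),
        Component("Language", elements=[Element("LanguageName", LANGUAGE)]),
    ],
)
profile_b = Component(
    "CorpusProfileB",
    elements=[Element("ResourceTitle", RESOURCE_NAME), Element("Lang", LANGUAGE)],
)

if __name__ == "__main__":
    for concept, members in align_by_concept([profile_a, profile_b]).items():
        print(concept, sorted(members))
```

The design point is that alignment never compares element names: `Name` and `ResourceTitle` are grouped only because both carry the same concept link.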
The paper’s purpose is to give an overview of the work on the Component Metadata Infrastructure (CMDI) implemented in the CLARIN research infrastructure. It explains the underlying schema and the accompanying tools and services. It also describes the status and impact of the CMDI developments carried out within the CLARIN project, as well as past and future collaborations with other projects.
In this paper, we describe MLSA, a publicly available multi-layered reference corpus for German-language sentiment analysis. The corpus is based on the manual annotation of 270 German-language sentences at three different layers of granularity. The sentence-layer annotation, the most coarse-grained of the three, focuses on aspects of objectivity, subjectivity and the overall polarity of each sentence. Layer 2 is concerned with polarity at the word and phrase level, annotating both subjective and factual language. The annotations on Layer 3 focus on the expression level, denoting frames of private states such as objective and direct speech events. These three layers and their respective annotations are intended to be fully independent of each other. At the same time, it should also be possible to explore and discover interactions that may exist between different layers. The reliability of the respective annotations was assessed using average pairwise agreement and Fleiss’ multi-rater measures. We believe that MLSA is a beneficial resource for sentiment analysis research, algorithms and applications that focus on the German language.
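The two reliability measures named above can be sketched in a few lines of Python. This is a minimal sketch, not the MLSA evaluation code. It assumes a fully crossed design in which every item (e.g. a sentence) is labelled by the same annotators, and the example labels are invented. Fleiss' kappa corrects the observed agreement for the agreement expected by chance from the overall label distribution.

```python
from collections import Counter
from itertools import combinations


def average_pairwise_agreement(ratings):
    """ratings[i][r] is the label annotator r gave item i (same annotators for all items).

    For each pair of annotators, compute the share of items they label identically,
    then average over all pairs.
    """
    n_raters = len(ratings[0])
    per_pair = []
    for a, b in combinations(range(n_raters), 2):
        agree = sum(1 for item in ratings if item[a] == item[b])
        per_pair.append(agree / len(ratings))
    return sum(per_pair) / len(per_pair)


def fleiss_kappa(ratings):
    """Fleiss' kappa for a fixed number of annotators per item."""
    n_items = len(ratings)
    n_raters = len(ratings[0])
    counts = [Counter(item) for item in ratings]  # n_ij per item
    categories = set().union(*counts)
    # p_j: share of all label assignments that went to category j
    p = {c: sum(cnt[c] for cnt in counts) / (n_items * n_raters) for c in categories}
    # P_i: proportion of agreeing annotator pairs on item i
    P = [
        (sum(v * v for v in cnt.values()) - n_raters) / (n_raters * (n_raters - 1))
        for cnt in counts
    ]
    P_bar = sum(P) / n_items              # observed agreement
    P_e = sum(v * v for v in p.values())  # expected (chance) agreement
    if P_e == 1:
        raise ValueError("kappa is undefined when every label falls into one category")
    return (P_bar - P_e) / (1 - P_e)


if __name__ == "__main__":
    # Invented polarity labels: 4 sentences, 3 annotators.
    ratings = [
        ["pos", "pos", "pos"],
        ["neg", "neg", "pos"],
        ["neu", "neu", "neu"],
        ["pos", "neg", "neg"],
    ]
    print(average_pairwise_agreement(ratings), fleiss_kappa(ratings))
```

In a fully crossed design like this, the average pairwise agreement equals Fleiss' observed-agreement term P_bar, so kappa can be read as that same agreement discounted for chance.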