It is well known that the distribution of lexical and grammatical patterns is size- and register-sensitive (Biber 1986, and later publications). This fact alone presents a challenge to many corpus-oriented linguistic studies focusing on a single language. When it comes to cross-linguistic studies using corpora, the challenge becomes even greater due to the lack of high-quality multilingual corpora (Kupietz et al. 2020; Kupietz/Trawiński 2022) that are comparable with respect to size and register. That was the motivation for the creation of the European Reference Corpus EuReCo, an initiative started in 2013 at the Leibniz Institute for the German Language (IDS) together with several European partners (Kupietz et al. 2020). EuReCo is an emerging federated corpus, with large virtual comparable corpora across various languages and with an infrastructure supporting contrastive research. The core of the infrastructure is KorAP (Diewald et al. 2016), a scalable open-source platform supporting the analysis and visualisation of properties of texts annotated by multiple and potentially conflicting information layers, and supporting several corpus query languages. Until recently, EuReCo consisted of three monolingual subparts: the German Reference Corpus DeReKo (Kupietz et al. 2018), the Reference Corpus of Contemporary Romanian Language (Barbu Mititelu/Tufiş/Irimia 2018), and the Hungarian National Corpus (Váradi 2002). The goal of the present submission is twofold. On the one hand, it reports on the new component of EuReCo: a sample of the National Corpus of Polish (Przepiórkowski et al. 2010). On the other hand, it presents the results of a new pilot study using the newly extended EuReCo. This pilot study investigates selected Polish collocations involving light verbs and their prepositional / nominal complements (Fig. 1) and extends the collocation analyses of German, Romanian and Hungarian (Fig. 2) discussed in Kupietz/Trawiński (2022).
In many European languages, propositional arguments (PAs) can be realized as different types of structures. Cross-linguistically, complex structures with PAs show a systematic correlation between the strength of the semantic bond and the syntactic union (cf. Givón 2001; Wurmbrand/Lohninger 2023). Also, different languages show similarities with respect to the (lexical) licensing of different PAs (cf. Noonan 1985; Givón 2001; Cristofaro 2003 on different predicate types). However, on a more fine-grained level, variation across languages can be observed with respect to both the syntactic-semantic properties of PAs and their licensing and usage. This presentation takes a multi-contrastive view of different types of PAs as syntactic subjects and objects by looking at five European languages: EN, DE, IT, PL and HU. Our goal is to identify the parameters of variation in the clausal domain with PAs and thereby to contribute to a better understanding of the individual language systems on the one hand and the nature of the linguistic variation in the clausal domain on the other hand. Phenomena and Methodology: We investigate the following types of PAs: direct object (DO) clauses (1), prepositional object (PO) clauses (2), subject clauses (3), and nominalizations (4, 5). Additionally, we discuss clause union phenomena (6, 7). The analyzed parameters include, among others, finiteness, the linear position of the PA, the (non-)presence of a correlative element, the (non-)presence of a complementizer, and the lexical-semantic class of the embedding verb. The phenomena are analyzed based on corpus data (using mono- and multilingual corpora), experimental data (acceptability judgement surveys) or introspective data.
Polish żeby under negation
(2021)
The paper addresses two patterns in the distribution of complement clauses headed by the complementizer żeby in Polish related to the presence of sentential negation. It is argued that żeby-clauses with an obligatory negation in the matrix clause, licensed by epistemic verbs, can be treated in terms of negative polarity, with żeby defined as an n-word. Structures with żeby-clauses and an obligatory negation in the embedded clause, licensed by verbs of fear, are argued to be an instance of negative complementation, with żeby specified as a negative complementizer. A uniform lexicalist analysis within the framework of HPSG is provided, employing tools developed to account for Negative Concord in Polish.
This paper reports on recent developments within the European Reference Corpus EuReCo, an open initiative that aims at providing and using virtual and dynamically definable comparable corpora based on existing national, reference or other large corpora. Given the well-known shortcomings of other types of multilingual corpora such as parallel/translation corpora (shining-through effects, over-normalization, simplification, etc.) or web-based comparable corpora (covering only web material), EuReCo provides a unique linguistic resource offering new perspectives for fine-grained contrastive research on authentic cross-linguistic data, applications in translation studies and foreign language teaching and learning.
This paper deals with pragmatic aspects of negation raising (NR), which were discussed above all in Horn (1978), and with performative properties of NR constructions, which were originally discussed in Prince (1976), primarily with reference to French data. The aim is to validate the core claims of Horn (1978) and Prince (1976) with corpus data in a cross-linguistic context. German and Polish NR constructions serve as the object of investigation, and two different monolingual corpora are accordingly used as data sources.
This contribution describes the motivation and goals behind the European Reference Corpus (EuReCo) initiative. Starting from the desiderata that arise from the deficits of available research data such as monolingual corpora, parallel corpora and comparable corpora for language comparison, it presents the previous and ongoing work within EuReCo and, on the basis of comparative German-Romanian co-occurrence analyses, outlines the new perspectives for contrastive corpus linguistics that the EuReCo initiative opens up.
Negation raising and mood. A corpus-based study of Polish sądzić ‘think’ and wierzyć ‘believe’
(2021)
The paper describes the distribution of two negation raising predicates in Polish, sądzić ‘think’ and wierzyć ‘believe’, in the National Corpus of Polish, with a particular focus on their morphosyntax and the mood of their clausal complements. The aim was to examine whether there are any correlations between these two parameters, and to what extent negation raising with those verbs exhibits performative features (in terms of Prince 1976). The results of the study support the performative approach to negation raising as per Prince (1976) only for cases with subjunctive complements. The corpus findings further imply that Polish negation raising predicates encode two different degrees of (un)certainty concerning the truth of the embedded proposition depending on the mood of their complements. Structures with indicative complements express weaker uncertainty than structures with subjunctive complements.
This paper conceptually launches IDSopen, the new online series of the Leibniz Institute for the German Language (Leibniz-Institut für Deutsche Sprache). The series offers authors and readers from all areas of linguistics a modern and open platform for digital publishing. IDSopen provides a contemporary publication environment that focuses on work based on IDS resources and demonstrates their possible uses particularly well. At the same time, IDSopen is open to unconventional forms and formats of publication. Transparent review processes are as much a part of the series' profile as an open publication schedule and the addressing of different target groups. In line with the guidelines of the IDS and the Leibniz Association (cf. LeibnizOpen), IDSopen follows the open-access principle and publishes exclusively in digital form, without a print edition (online-only). These measures aim to enable short publication times for manuscripts, to offer unrestricted and free access on the Internet to quality-assured scholarly information about the IDS resources, and to support fluid publication processes.
Validating the Performativity Hypothesis to Neg-Raising using corpus data: Evidence from Polish
(2021)
This contribution presents the new multilingual resource CoMParS (Collection of Multilingual Parallel Sequences). CoMParS is conceived as a functional-semantically oriented database of parallel sequences of German and other European languages. In addition to language-specific and universal (in the sense of Universal Dependencies) morphosyntactic annotations, all data are annotated with cross-linguistic functional-semantic information on the newly defined annotation layer Functional Domains and are linked to one another on several levels (also across levels). CoMParS is encoded in TEI P5 XML and modelled both as a monolingual and as a multilingual language resource.
This contribution describes the motivation and goals of the European Reference Corpus EuReCo, an open initiative that aims to provide and use dynamically definable virtual comparable corpora based on existing national, reference or other large corpora. Given the well-known shortcomings of other types of multilingual corpora, such as parallel or translation corpora or purely web-based comparable corpora, EuReCo constitutes a unique linguistic resource that opens up new perspectives for German linguistics and for comparative as well as applied corpus linguistics, particularly in the European context.
This contribution gives an overview of CoDII, the Collection of Distributionally Idiosyncratic Items. CoDII is an electronic collection of various subgroups of lexical items that are characterized by idiosyncratic distribution. This means that the distribution of these lexemes in text cannot be predicted on the basis of their syntactic category alone. The methods applied in the development of CoDII reach beyond traditional disciplinary boundaries and include corpus linguistics, computational linguistics, phraseology and theoretical linguistics. An important focus of our discussion is to show to what extent the data collected in CoDII, which are annotated and can be queried with search tools, among other things, can support linguistic theory building in verifying its empirical basis by providing carefully prepared data collections.
In recent years, the availability of large annotated and searchable corpora, together with a new interest in the empirical foundation and validation of linguistic theory and description, has sparked a surge of novel and interesting work using corpus-based methods to study the grammar of natural languages. However, a look at relevant current research on the grammar of the Germanic, Romance, and Slavic languages reveals a variety of different theoretical approaches and empirical foci, which can be traced back to different philological and linguistic traditions. Still, this current state of affairs should not be seen as an obstacle but as an ideal basis for a fruitful exchange of ideas between different research paradigms.
This paper argues that there is a correlation between functional and purely grammatical patterning in language, yet the nature of this correlation has to be explored. This claim is based on the results of a corpus-driven study of the Slavic aspect, drawing on the so-called Distributional Hypothesis. According to the East-West Theory of the Slavic aspect, there is a broad east-west isogloss dividing the Slavic languages into an eastern group and a western group. There are also two transitional zones in the north and south, which share some properties with each group (Dickey 2000; Barentsen 1998, 2008). The East-West Theory uses concepts of cognitive grammar such as totality and temporal definiteness, and is based on various parameters of aspectual usage in discourse, including contexts such as habituals, general factuals, historical (narrative) present, performatives, sequenced events in the past etc. The purpose of the above-mentioned study is to challenge the semantic approach to the Slavic aspect by comparing the perfective and imperfective verbal aspect on the basis of purely grammatical co-occurrence patterns (see also Janda & Lyashevskaya 2011). The study focused on three Slavic languages: Russian, which, following the East-West Theory, belongs to the eastern group, Czech, which belongs to the western group, and Polish, which is considered as transitional in its aspectual patterning.
This paper argues for using authentic data not only as an empirical basis for linguistic generalizations but also for exemplification purposes in monolingual and particularly in bi- and multilingual contrastive studies. It shows that parallel data extracted from the available parallel corpora can - after enrichment with semantic-functional information while maintaining the available contextual, register-related and linguistic information - serve as a perfect data source for multilingual exemplification. Moreover, the analysis of semantic-functionally equivalent parallel sequences allows the investigation and exemplification of similarities and differences in how different languages express similar meaning from both a semasiological and an onomasiological perspective.
Here we will present a graphical software tool called Morph Moulder (MoMo) for teaching the formal foundations of a language with a denotation in a domain of relational typed feature structures as used in Head-Driven Phrase Structure Grammar. With MoMo, students learn the properties of totally well-typed, sort resolved relational feature structures, the use of formal languages to describe typed feature structures and the notions of constraint satisfaction and models of grammars written in a formal language. MoMo was realized and conceived within the context of a set of courses in the format of web-based training, that focuses on the concept of typed feature structures in a curriculum in grammar formalisms and parsing. The formal language of MoMo amends the constraint language of TRALE (an implementation platform for HPSG grammars based on ALE) to accommodate the expressive power of HPSG.
CoMParS is a resource under construction in the context of the long-term project German Grammar in European Comparison (GDE) at the IDS Mannheim. The principal goal of GDE is to create a novel contrastive grammar of German against the background of other European languages. Alongside German, which is the central focus, the core languages for comparison are English, French, Hungarian and Polish, representing different typological classes. Unlike traditional contrastive grammars available for German, which usually cover language pairs and are based on formal grammatical categories, the new GDE grammar is developed in the spirit of functionalist typology. This implies that, instead of formal criteria, cognitively motivated functional domains in terms of Givón (1984) are used as tertia comparationis. The purpose of CoMParS is to document the empirical basis of the theoretical assumptions of GDE-V and to illustrate the otherwise rather abstract content of grammar books by as many as possible naturally occurring and adequately presented multilingual examples, including information on their use in specific contexts and registers. These examples come from existing parallel corpora, and our presentation will focus on the legal aspects and consequences of this choice of language data.
We present two collections of lexical items with idiosyncratic distribution. The collections document the behavior of German and English bound words (BW, such as English “headway”), i.e., words which can only occur in one expression (“make headway”). BWs are a problem for both general and idiomatic dictionaries since it is unclear whether they have an independent lexical status and to what extent the expressions in which they occur are typical idiomatic expressions. We propose a system which allows us to document the information about BWs from dictionaries and linguistic literature, together with corpus data and example queries for major text corpora. We present our data structure and point to other phraseologically oriented collections. We will also show differences between the German and the English collection.
This paper provides a lexicalist formal description of preposition-pronoun contraction (PPC) in Polish, using the theoretical framework of HPSG. Considering the behaviour of PPC with respect to the prosodic, categorial, syntactic and semantic properties, the assumption can be made that each PPC is a morphological unit with prepositional status. The crucial difference between a PPC and a typical preposition consists, besides the phonological form, in the valence properties. While a typical preposition realizes its complement externally via general constraints on phrase structure, the realization of a PPC argument is effected internally by virtue of its lexical entry. Here, we will provide the appropriate implicational lexical constraints that license both typical Ps and PPCs.
Analogously to the verbal domain, the nominal domain also exhibits a number of semantic-syntactic regularities and restrictions with regard to its internal structure. Control, binding and passivization behaviour count as significant parallels between nominal structures and clauses. The noun phrase fragment developed in the final phase of project B8 of SFB 340 focuses on working out an analysis for complex nominal structures that is based on a particular set of empirically grounded generalizations. In addition to the treatment of agreement phenomena within the noun phrase, the description of the argument structure of German nouns is central. The aim is to develop an analysis that does without empty elements and traces within the NP and can provide a good basis for extensions. The paper is intended to give an overview of the empirical and theoretical assumptions underlying the analysis presented here and to sketch some selected phenomena from the noun phrase domain in the HPSG formalism. The first section presents a taxonomy of German nouns with respect to their valence properties. The focus is on nouns with an argument structure. Genitive NPs are treated next. In particular, the categorial status and the syntactic function of prenominal genitives are discussed. The chapter "Analyse" (Analysis) proposes an HPSG analysis of German NPs that underlies the implementation of nominal syntax in project B8.
This paper provides a treatment of Polish Plural Comitative Constructions in the paradigm of HPSG in the tradition of Pollard and Sag (1994). Plural Comitative Constructions (PCCs) have previously been treated in terms of coordination, complementation and adjunction. The objective of this paper is to show that PCCs are neither instances of typical coordinate structures nor of typical complement or adjunct structures. It thus appears difficult to properly describe them by means of the standard principles of syntax and semantics. The analysis proposed in this paper accounts for the syntactic and semantic properties of PCCs in Polish by assuming an adjunction-based syntactic structure for PCCs, and by treating the indexical information provided by PCCs not as subject to any inheritance or composition, but as a result of applying a set of principles on number, gender and person resolution that also hold for ordinary coordinate structures.
In this paper, semantic aspects of P1N1P2 word sequences will be discussed. Based on the syntactic analysis of Trawinski (2003), which assumes that prepositions heading P1N1P2NP combinations are able to raise and syntactically realize complements of their arguments, we will investigate whether the semantic representation of these expressions can be considered an instance of combinatorial semantics. We will investigate three German PPs involving the expressions under consideration with respect to two criteria of internal semantic regularity adopted from Sailer (2000), and we will observe that the discussed expressions are not uniform with regard to their semantic properties. While the logical form of some of them can be computed by means of ordinary translations and a set of standard derivational operations, the others require additional handling methods. However, there are approaches available within the HPSG paradigm that are suited to account for these data. Here, we will briefly present the external selection approach of Soehn (2003) and the phrasal lexical entries approach of Sailer (2000) and we will show how they interact with the syntactic approach of Trawinski (2003).
Many modern languages commonly use expressions that seem unpredictable regarding standard grammar regularities. Among these expressions, sequences consisting of a preposition, a noun, another preposition, and another noun are particularly frequent. The issue of these expressions, usually termed in linguistic literature as "complex prepositions", "phrasal prepositions" or "preposition-like word formations", can certainly be considered to be a cross-linguistic problem (on "complex prepositions" in German and in other languages, see Benes 1974; Buscha 1984; Lindqvist 1994; Meibauer 1995; Quirk and Mulholland 1964; Wollmann 1996). In this paper, I will focus exclusively on German data, because they provide very explicit and convincing linguistic evidence which motivates and supports my approach. However, I assert that the analysis proposed here for German can also be applied to other languages such as Polish or English.
One of the most popular techniques used in HPSG-based studies to describe linguistic phenomena is the raising mechanism. Besides ordinary raising verbs or adjectives, this tool has been applied for handling verbal complexes and discontinuous constituents, among other phenomena. In this paper, a new application for raising within the HPSG paradigm will be discussed, thereby investigating data from the prepositional domain. We will analyze linguistic properties of word combinations in German consisting of a preposition, a noun, and another preposition (such as auf Grund von (‘by virtue of’)), thus arguing that raising is the most appropriate method for satisfactorily describing the crucial syntactic features which are typical for those expressions. The objective of this paper is thus to demonstrate the efficiency of the raising mechanism as used in HPSG, and therefore, to emphasize the importance of designing a satisfactory uniform theory of raising within this grammar framework.
In this paper, we will investigate a cross-linguistic phenomenon referred to as complex prepositions (CPs), which is a frequent type of multiword expressions (MWEs) in many languages. Based on empirical data, we will point out the problems of the traditional treatment of CPs as complex lexical categories, and, thus, propose an analysis using the formal paradigm of the HPSG in the tradition of (Pollard and Sag, 1994). Our objective is to provide an approach to CPs which (1) convincingly explains empirical data, (2) is consistent with the underlying formal framework and does not require any extensions or modification of the existing description apparatus, (3) is computationally tractable.
This paper focuses on aspects of the licensing of adverbial noun phrases (AdvNPs) in the HPSG grammar framework. In the first part, empirical issues will be discussed. A number of AdvNPs will be examined with respect to various linguistic phenomena in order to find out to what extent AdvNPs share syntactic and semantic properties with non-adverbial NPs. Based on empirical generalizations, a lexical constraint for licensing both AdvNPs and non-adverbial NPs will be provided. Further on, problems of structural licensing of phrases containing AdvNPs that arise within the standard HPSG framework of Pollard and Sag (1994) will be pointed out, and a possible solution will be proposed. The objective is to provide a constraint-based treatment of NPs which describes non-redundantly both their adverbial and non-adverbial usages. The analysis proposed in this paper applies lexical and phrasal implicational constraints and does not require any radical modifications or extensions of the standard HPSG geometry of Pollard and Sag (1994).
Since adverbial NPs have particularly high frequency and a wide spectrum of uses in inflectional languages such as Polish, we will take Polish data into consideration.
The present investigation targets the phenomenon commonly called control. Many languages including German and Polish employ non-finite clauses (besides finite clauses) as propositional complements. The subject of these complement clauses is left unexpressed and must generally be interpreted co-referentially with the subject or object of the matrix clause (subject or object control). However, there are also infinitive-selecting verbs that do not allow for a co-referential interpretation of the embedded subject. Semantically, the embedded infinitives of these anti-control verbs are thus less dependent on or less unifiable with the matrix proposition. In Polish anti-control constructions, non-finite complements are overtly marked with the complementizer żeby, suggesting that they are structurally more complex (namely, containing a C-projection) than the non-finite complements in control constructions lacking żeby (modulo special contexts, viz. 'control switch'). In a comparative perspective, the paper brings corpus-linguistic and experimental evidence to bear on the question whether, surface appearances notwithstanding, the infinitival complements of anti-control verbs in German should similarly be analyzed as truly sentential, i.e., C-headed structures.
This paper presents the current results of an ongoing research project on corpus distribution of prepositions and pronouns within Polish preposition-pronoun contractions. The goal of the project is to provide a quantitative description of Polish preposition-pronoun contractions taking into consideration morphosyntactic properties of their components. It is expected that the results will provide a basis for a revision of the traditionally assumed inflectional paradigms of Polish pronouns and, thus, for a possible remodeling of these paradigms. The results of corpus-based investigations of the distribution of prepositions within preposition-pronoun contractions can be used for grammar-theoretical and lexicographic purposes.
The paper ties in with the discussion on the use of formal grammatical categories in language comparison (cf. in particular Haspelmath 2007, 2010a, b and Newmeyer 2007, 2010). The question is not whether cross-linguistic grammatical categories (or, more precisely, category values) exist, or whether language-specific grammatical categories can be usefully employed in language comparison, but rather how similar or different language-specific categories or categorizations are. The aim is thus to present a method for measuring the degree of equivalence of grammatical categories in different languages; this is illustrated using the example of the IMPERATIVE in German, English, Polish and Czech.
As the nature of negative polarity items (NPIs) and their licensing contexts is still under much debate, a broad empirical basis is an important cornerstone to support further insights in this area of research. The work discussed in this paper is intended as a contribution to realizing this objective. The authors briefly introduce the phenomenon of NPIs and outline major theories about their licensing and also various licensing contexts before discussing our major topics: Firstly, a corpus-based retrieval method for NPI candidates is described that ranks the candidates according to their distributional dependence on the licensing contexts. Our method extracts single-word candidates and is extended to also capture multi-word candidates. The basic idea for automatically collecting NPI candidates from a large corpus is that an NPI behaves like a kind of collocate to its licensing contexts. Manual inspection and interpretation of the candidate lists identify the actual NPIs. Secondly, an online repository for NPIs and other items that show distributional idiosyncrasies is presented, which offers an empirical database for further (theoretical) research on these items in a sustainable way.
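The retrieval idea described above, treating an NPI as a kind of collocate of its licensing contexts, can be illustrated with a minimal Python sketch. It is not the authors' method: the licensor list, the sentence-level notion of "licensing context", the simple proportion score, and all names (`LICENSORS`, `rank_npi_candidates`, `SENTENCES`) are assumptions made only for illustration.

```python
from collections import Counter

# Hypothetical, deliberately tiny set of licensing expressions (assumption).
LICENSORS = {"not", "never", "nobody", "no"}


def rank_npi_candidates(sentences, min_freq=2):
    """Rank words by the share of their sentences that contain a licensor.

    A score of 1.0 means the word was only ever seen in sentences with a
    licensor, i.e. it depends fully on such contexts in this sample.
    """
    total = Counter()     # number of sentences containing the word
    licensed = Counter()  # ... of which also contain a licensor
    for sentence in sentences:
        tokens = sentence.lower().split()
        has_licensor = any(tok in LICENSORS for tok in tokens)
        for tok in set(tokens) - LICENSORS:
            total[tok] += 1
            if has_licensor:
                licensed[tok] += 1
    scores = {
        word: licensed[word] / total[word]
        for word in total
        if total[word] >= min_freq  # ignore words that are too rare to judge
    }
    # Highest dependence first; ties are broken alphabetically.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Toy corpus (invented sentences, for illustration only).
SENTENCES = [
    "nobody has ever seen it",
    "she has not ever left",
    "he has seen it",
    "she has left",
]

ranked = rank_npi_candidates(SENTENCES)
for word, score in ranked:
    print(f"{word}\t{score:.2f}")
```

In this toy sample, "ever" ranks first because every sentence containing it also contains a licensor. As the abstract notes, such a ranked list only yields candidates; manual inspection is still needed to identify the actual NPIs.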
The authors present a multilingual electronic database of lexical items with idiosyncratic occurrence patterns. Currently, our database consists of: (1) a collection of 444 bound words in German; (2) a collection of 77 bound words in English; (3) a collection of 58 negative polarity items in Romanian; (4) a collection of 84 negative polarity items in German; and (5) a collection of 52 positive polarity items in German. The database is encoded in XML and is available via the Internet, offering dynamic and flexible access.
This paper presents three electronic collections of polarity items: (i) negative polarity items in Romanian, (ii) negative polarity items in German, and (iii) positive polarity items in German. The presented collections are a part of a linguistic resource on lexical units with highly idiosyncratic occurrence patterns. The motivation for collecting and documenting polarity items was to provide a solid empirical basis for linguistic investigations of these expressions. Our database provides general information about the collected items, specifies their syntactic properties, and describes the environment that licenses a given item. For each licensing context, examples from various corpora and the Internet are introduced. Finally, the type of polarity (negative or positive) and the class (superstrong, strong, weak or open) associated with a given item is specified. Our database is encoded in XML and is available via the Internet, offering dynamic and flexible access.
The authors describe two data sets submitted to the database of MWE evaluation resources: (1) cranberry expressions in English and (2) cranberry expressions in German. The first package contains a collection of 444 cranberry words in German (CWde.txt) and a collection of the corresponding cranberry expressions (CCde.txt). The second package consists of a collection of 77 cranberry words in English (CWen.txt) and a collection of the corresponding cranberry expressions (CCen.txt). The data included in these packages was extracted from the Collection of Distributionally Idiosyncratic Items (CoDII), an electronic linguistic resource of lexical items with idiosyncratic occurrence patterns. Each package contains a readme file, and can be downloaded from multiword.wiki.sourceforge.net/Resources.
This thesis deals with expressions consisting of two noun phrases connected by a comitative preposition, referred to as comitative constructions (CCs). It focuses on CCs in Polish, with some comparisons to other languages, and provides an analysis at the morphosyntax-semantics-pragmatics interface in the paradigm of Head-Driven Phrase Structure Grammar with the integrated model-theoretic semantic framework of Lexicalized Flexible Ty2. After postulating three different readings of Polish CCs (accompanitive, conjunctive, and open and closed inclusive), a number of semantic phenomena are discussed which provide evidence for this classification. Further examination of the data shows that all CC types behave uniformly with regard to their syntactic properties but exhibit differences regarding agreement and person, number and gender resolution. These differences have previously been explained by syntactic stipulations. This thesis argues that a syntactic approach to CCs lacks real empirical motivation and it demonstrates that some of the existing analyses are problematic for a number of empirical and / or theoretical reasons. It further offers an alternative analysis based on the assumption that all CC types have a uniform, adjunction-based syntactic structure, and that the crucial differences between them are semantic in nature, being triggered by the meaning of the comitative preposition. At the core of the proposed semantic analysis are three different logical representations of the comitative preposition, whose truth conditions allow us to make the right predictions about the different behavior of the three CC types. All other lexical components of CCs, including plural pronouns, bear in each type of CC their customary forms and meanings.
Implementing this idea in a constraint-based framework whose description language incorporates a formal semantic representation language, and modeling the morphosyntactic, semantic, pragmatic and referential properties of CCs within a single grammatical paradigm, we arrive at an analysis that accounts for these expressions in a very natural way.