Spotting, collecting and documenting negative polarity items
As the nature of negative polarity items (NPIs) and their licensing contexts is still under much debate, a broad empirical basis is an important cornerstone for further insights in this area of research. The work discussed in this paper is intended as a contribution to that objective. We briefly introduce the phenomenon of NPIs and outline major theories of their licensing, as well as various licensing contexts, before turning to our two main topics. First, we describe a corpus-based retrieval method for NPI candidates that ranks the candidates according to their distributional dependence on the licensing contexts. The basic idea behind automatically collecting NPI candidates from a large corpus is that an NPI behaves like a kind of collocate of its licensing contexts. The method extracts single-word candidates and is extended to capture multi-word candidates as well. The actual NPIs are then identified by manual inspection and interpretation of the candidate lists. Second, we present an online repository for NPIs and other items that show distributional idiosyncrasies, which offers a sustainable empirical database for further (theoretical) research on these items.
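The collocate idea in the abstract can be illustrated with a minimal sketch. It is not the measure used in the paper, which the abstract does not specify. It assumes a corpus that is already split into tokenized clauses, a hypothetical and deliberately tiny licenser list (`LICENSERS`), and a simple score: the share of a word's clauses that also contain a licenser. The function name `rank_npi_candidates` and the threshold `min_count` are likewise assumptions made for illustration, and only single-word candidates are covered.

```python
from collections import Counter

# Hypothetical toy licenser list; real licensing contexts (questions,
# conditionals, downward-entailing quantifiers, ...) are far richer.
LICENSERS = {"not", "never", "nobody", "no"}


def rank_npi_candidates(clauses, licensers=LICENSERS, min_count=2):
    """Rank words by the share of their clauses that contain a licenser."""
    licensers = set(licensers)
    total = Counter()      # number of clauses each word occurs in
    licensed = Counter()   # number of those clauses that contain a licenser
    for clause in clauses:
        tokens = {tok.lower() for tok in clause}
        is_licensed = bool(tokens & licensers)
        for tok in tokens - licensers:
            total[tok] += 1
            if is_licensed:
                licensed[tok] += 1
    # Drop rare words, whose ratios are unreliable.
    scores = {w: licensed[w] / n for w, n in total.items() if n >= min_count}
    # Highest ratio first; ties are broken alphabetically.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Toy corpus (invented for illustration, not taken from the paper).
clauses = [
    ["she", "has", "not", "seen", "him", "ever"],
    ["nobody", "has", "ever", "seen", "him"],
    ["she", "has", "seen", "him", "today"],
    ["he", "never", "lifted", "a", "finger"],
]
ranked = rank_npi_candidates(clauses)
for word, ratio in ranked:
    print(f"{word}\t{ratio:.2f}")
```

On this toy corpus, "ever" ranks first because both clauses it occurs in contain a licenser. As the abstract notes, such a list yields only candidates, and the actual NPIs still have to be identified by manual inspection.
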
Author: | Jan-Philipp Soehn, Beata Trawiński, Timm Lichte |
---|---|
URN: | urn:nbn:de:bsz:mh39-34434 |
DOI: | https://doi.org/10.1007/s11049-011-9125-5 |
ISSN: | 1573-0859 |
Parent Title (English): | Natural Language and Linguistic Theory |
Document Type: | Article |
Language: | English |
Year of first Publication: | 2010 |
Date of Publication (online): | 2015/01/30 |
Publication state: | Postprint |
Review state: | Peer-reviewed |
Tag: | Corpus-based retrieval; Documentation; Empirical database; Polarity items; XML |
GND Keyword: | Deutsch; Englisch; Korpus <Linguistik>; Negativer Polaritätsausdruck |
Volume: | 28 |
Issue: | 4 |
First Page: | 931 |
Last Page: | 952 |
Note: | The final publication is available at Springer via http://dx.doi.org/10.1007/s11049-011-9125-5 |
DDC classes: | 400 Language / 410 Linguistics |
Open Access?: | yes |
Linguistics-Classification: | Computational linguistics |
Licence (German): | Urheberrechtlich geschützt (protected by copyright) |