Computational Linguistics (Computerlinguistik)
A polarity-sensitive item (PSI), as traditionally defined, is an expression that is restricted to either an affirmative or a negative context. PSIs like ‘lift a finger’ and ‘all the time in the world’ serve discourse routines like understatement and emphasis. Lexical–semantic classes are increasingly invoked in descriptions of the properties of PSIs. Here, we use English corpus data and the tools of Frame Semantics (Fillmore, 1982, 1985) to explore Israel’s (2011) observation that the semantic role of a PSI determines how the expression fits into a contextually constructed scalar model. We focus on a class of exceptions implied by Israel’s model: cases in which a given PSI displays two countervailing patterns of polarity sensitivity, with attendant differences in scalar entailments. We offer a set of case studies of polarity-sensitive expressions – including verbs of attraction and aversion like ‘can live without’, monetary units like ‘a red cent’, comparative adjectives and time-span adverbials – that demonstrate that the interpretation of a given PSI in a given polar context is based on multiple factors. These factors include the speaker’s perspective on and affective stance towards the described event, available inferences about causality and, perhaps most critically, particulars of the predication, including the verb or adjective’s frame membership, the presence or absence of an ability modal like ‘can’, the grammatical construction used and the range of contingencies evoked by the utterance.
We present a quantitative approach to disambiguating flat morphological analyses and producing more deeply structured analyses. Starting from existing morphological segmentations, the possible combinations of resulting word trees for the next level are filtered first by criteria of linguistic plausibility and then by weighting procedures based on the geometric mean. The frequencies for weighting are derived from three different sources (counts of morphs in a lexicon, counts of largest constituents in a lexicon, counts of token frequencies in a corpus) and can be used to find the best analysis either at the level of morphs or at the next higher constituent level. The evaluation shows that, for this task, corpus-based frequency counts are slightly superior to counts from lexical data.
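The weighting step described in this abstract can be illustrated with a small sketch. It is not the authors' implementation: the function names, the add-one smoothing, and the toy frequency counts are all assumptions made for illustration. It only shows the general idea of ranking candidate segmentations by the geometric mean of their constituents' frequencies.

```python
import math

def geometric_mean(freqs, smoothing=1.0):
    """Geometric mean of smoothed frequencies, computed in log space.

    The additive smoothing is an assumption of this sketch; it keeps
    unseen constituents (count 0) from making the logarithm undefined.
    """
    if not freqs:
        raise ValueError("at least one frequency is required")
    logs = [math.log(f + smoothing) for f in freqs]
    return math.exp(sum(logs) / len(logs))

def rank_analyses(candidates, freq_table, smoothing=1.0):
    """Return (score, segmentation) pairs, best-scoring first."""
    scored = []
    for segments in candidates:
        freqs = [freq_table.get(s, 0) for s in segments]
        scored.append((geometric_mean(freqs, smoothing), segments))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored

# Hypothetical toy counts, not taken from any real lexicon or corpus.
toy_counts = {"Staub": 120, "Ecke": 300, "Stau": 80, "Becke": 0}

# Two competing segmentations of the German compound "Staubecke".
candidates = [("Staub", "Ecke"), ("Stau", "Becke")]

best_score, best_analysis = rank_analyses(candidates, toy_counts)[0]
print(best_analysis)
```

Because the geometric mean multiplies the frequencies before taking the root, a single very rare constituent pulls the whole score down, which is why the segmentation with the unattested part ranks lower here.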
This paper addresses the task of finding antecedents for locally uninstantiated arguments. To resolve such null instantiations, we develop a weakly supervised approach that investigates and combines a number of linguistically motivated strategies inspired by work on semantic role labeling and coreference resolution. The performance of the system is competitive with the current state-of-the-art supervised system.
Opinion Holder and Target Extraction for Verb-based Opinion Predicates – The Problem is Not Solved
(2015)
We offer a critical review of the current state of opinion role extraction involving opinion verbs. We argue that neither the currently available lexical resources nor the manually annotated text corpora are sufficient to appropriately study this task. We introduce a new corpus focusing on opinion roles of opinion verbs from the Subjectivity Lexicon and show potential benefits of this corpus. We also demonstrate that state-of-the-art classifiers perform rather poorly on this new dataset compared to the standard dataset for the task, showing that significant research remains to be done.
This paper presents C-WEP, the Collection of Writing Errors by Professional Writers of German. It currently consists of 245 sentences with grammatical errors. All sentences are taken from published texts. All authors are professional writers with high skill levels with respect to German, the genres, and the topics. The purpose of this collection is to provide seeds for more sophisticated writing support tools, as only a very small proportion of those errors can be detected by state-of-the-art checkers. C-WEP is annotated on various levels and freely available.
The CELEX database is one of the standard lexical resources for German. It yields a wealth of data, especially for phonological and morphological applications. The morphological part comprises deep-structure morphological analyses of German. However, as it was developed in the 1990s, both encoding and spelling are outdated. About one fifth of over 50,000 datasets contain umlauts and signs such as ß. Changes to a modern version cannot be obtained by simple substitution. In this paper, we briefly describe the original content and form of the orthographic and morphological database for German in CELEX. Then we present our work on modernizing the linguistic data. Lemmas and morphological analyses are transferred to a modern standard of encoding by first merging orthographic and morphological information of the lemmas and their entries and then performing a second substitution for the morphs within their morphological analyses. Changes to modern German spelling are performed by substitution rules according to orthographical standards. We show an example of the use of the data for the disambiguation of morphological structures. The discussion describes prospects of future work on this or similar lexicons. The Perl script is publicly available on our website.
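As a rough illustration of why two separate steps are needed, the sketch below first decodes a legacy character encoding and then applies word-level spelling rules. It is written in Python rather than Perl and is not the authors' script. The character mapping and the rule table are hypothetical stand-ins chosen for illustration; they should not be read as the documented CELEX encoding or as a complete set of spelling-reform rules.

```python
import re

# Hypothetical legacy-to-Unicode mapping (an assumption of this sketch,
# not the documented CELEX scheme).
LEGACY_TO_UNICODE = {
    '"a': "ä", '"o': "ö", '"u': "ü",
    '"A': "Ä", '"O': "Ö", '"U': "Ü",
    "$": "ß",
}

# Hypothetical word-level spelling rules (old spelling -> reformed spelling).
# A blanket ß -> ss replacement would be wrong, because ß is retained after
# long vowels (e.g. "Straße"), so the rules are listed per word here.
SPELLING_RULES = {"daß": "dass", "Kuß": "Kuss"}

_LEGACY_PATTERN = re.compile("|".join(re.escape(k) for k in LEGACY_TO_UNICODE))

def decode_legacy(text):
    """Step 1: replace legacy character codes with Unicode characters."""
    return _LEGACY_PATTERN.sub(lambda m: LEGACY_TO_UNICODE[m.group(0)], text)

def modernize(lemma):
    """Step 2: apply a spelling rule if one exists for the decoded lemma."""
    decoded = decode_legacy(lemma)
    return SPELLING_RULES.get(decoded, decoded)

for legacy in ['da$', 'M"adchen', 'Stra$e']:
    print(legacy, "->", modernize(legacy))
```

Keeping the decoding table and the spelling rules apart means that a lemma such as "Straße" is only re-encoded, while "daß" is both re-encoded and respelled.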
In this paper, we report on an effort to develop a gold standard for the intensity ordering of subjective adjectives. Rather than pursue a complete order as produced by paying attention to the mean scores of human ratings only, we take into account to what extent assessors consistently rate pairs of adjectives relative to each other. We show that different available automatic methods for producing polar intensity scores produce results that correlate well with our gold standard, and discuss some conceptual questions surrounding the notion of polar intensity.
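One way to picture the difference between a complete order based on mean scores and an order based on rating consistency is the sketch below. The agreement measure, the 0.75 threshold, and the toy ratings are assumptions made for illustration; the abstract does not specify the criterion actually used.

```python
from itertools import combinations

def pairwise_agreement(ratings, a, b):
    """Fraction of assessors who rated `a` strictly higher than `b`."""
    higher = sum(1 for r in ratings if r[a] > r[b])
    return higher / len(ratings)

def consistent_pairs(ratings, adjectives, threshold=0.75):
    """Return (stronger, weaker) pairs on which enough assessors agree.

    Pairs without sufficient agreement in either direction are left
    unordered, so the result is a partial rather than a complete order.
    """
    ordered = []
    for a, b in combinations(adjectives, 2):
        if pairwise_agreement(ratings, a, b) >= threshold:
            ordered.append((a, b))
        elif pairwise_agreement(ratings, b, a) >= threshold:
            ordered.append((b, a))
    return ordered

# Hypothetical intensity ratings from four assessors (one dict each).
toy_ratings = [
    {"good": 2, "great": 3, "excellent": 4},
    {"good": 2, "great": 4, "excellent": 3},
    {"good": 1, "great": 3, "excellent": 4},
    {"good": 2, "great": 3, "excellent": 3},
]

pairs = consistent_pairs(toy_ratings, ["good", "great", "excellent"])
print(pairs)
```

In this toy data the mean scores would place "excellent" above "great", but the assessors do not agree on that pair consistently, so it stays unordered while both adjectives are reliably ranked above "good".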
We study the influence of information structure on the salience of subjective expressions for human readers. Using an online survey tool, we conducted an experiment in which we asked users to rate main and relative clauses that contained a single positive, negative, or neutral adjective. The statistical analysis of the data shows that subjective expressions are more prominent in main clauses, where they are asserted, than in relative clauses, where they are presupposed. A corpus study suggests that speakers are sensitive to this differential salience in their production of subjective expressions.