400 Language, Linguistics
Constructionist approaches to grammar do not draw a clear distinction between lexicon and grammar, as generative "words and rules" accounts do. Rather, they conceptualize grammar and lexicon as a continuum of constructions of greater or lesser complexity and abstraction. In this paper, I explore the implications of this paradigm shift for the applied discipline of grammaticography. If we abandon the distinction between grammar and lexicon, should we also abandon the distinction between grammar books and dictionaries? Drawing on a case study of the treatment of verbless constructions in the "IDS-Grammatik", I argue that constructions should play a greater role in grammar books, but that grammar books still need to provide access to general principles of grammar.
In this paper we investigate the coverage of the two knowledge sources WordNet and Wikipedia for the task of bridging resolution. We report on an annotation experiment which yielded pairs of bridging anaphors and their antecedents in spoken multi-party dialog. Manual inspection of the two knowledge sources showed that, with some interesting exceptions, Wikipedia is superior to WordNet when it comes to the coverage of information necessary to resolve the bridging anaphors in our data set. We further describe a simple procedure for the automatic extraction of the required knowledge from Wikipedia by means of an API, and discuss some of the implications of the procedure’s performance.
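The extraction procedure itself is not spelled out above, but a minimal sketch of the general idea is possible: query the public MediaWiki API for the outgoing links of an article and treat a link between the anaphor's and the antecedent's articles as a crude relatedness signal. The endpoint and parameters below are the standard MediaWiki API; the link-based heuristic is only an illustrative assumption, not the authors' actual procedure.

```python
# Minimal sketch (not the authors' procedure): query the public MediaWiki API
# for the outgoing links of an article and check whether a candidate antecedent
# appears among them, as a crude relatedness signal. Result continuation for
# very link-rich pages is omitted for brevity.
import requests

API_URL = "https://en.wikipedia.org/w/api.php"

def article_links(title):
    """Return the set of page titles linked from the given article."""
    params = {
        "action": "query",
        "titles": title,
        "prop": "links",
        "pllimit": "max",
        "format": "json",
    }
    response = requests.get(API_URL, params=params, timeout=10)
    response.raise_for_status()
    pages = response.json()["query"]["pages"]
    links = set()
    for page in pages.values():
        for link in page.get("links", []):
            links.add(link["title"])
    return links

def related_via_links(anaphor, antecedent):
    """Heuristic: the pair counts as related if either article links to the other."""
    return antecedent in article_links(anaphor) or anaphor in article_links(antecedent)

if __name__ == "__main__":
    print(related_via_links("Door", "House"))
```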
Lexical resources are often represented in table form, e.g. in relational databases, or as specially marked-up text, e.g. in document-based XML models. This paper describes how lexical structures can be modeled as graphs, how this model can be used to exploit existing lexical resources, and even how different types of lexical resources can be combined.
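As a rough illustration of the graph view, assuming a very small invented vocabulary of node types and relation labels (not taken from the paper), a lexicon can be held as labelled nodes plus labelled edges, so that merging two resources amounts to adding their nodes and edges to one graph:

```python
# Hypothetical sketch of a lexicon-as-graph model: word forms, senses, and
# grammatical categories become nodes; lexical relations become labelled edges.
# Node and relation names below are illustrative, not taken from the paper.
from collections import defaultdict

class LexiconGraph:
    def __init__(self):
        self.nodes = {}                  # node id -> attribute dict
        self.edges = defaultdict(set)    # (source, label) -> set of target ids

    def add_node(self, node_id, **attrs):
        self.nodes.setdefault(node_id, {}).update(attrs)

    def add_edge(self, source, label, target):
        self.edges[(source, label)].add(target)

    def neighbours(self, source, label):
        return self.edges.get((source, label), set())

# Two resources can contribute to the same graph: merging is just adding edges.
g = LexiconGraph()
g.add_node("form:bank", type="orthographic-form")
g.add_node("sense:bank-1", type="sense", gloss="financial institution")
g.add_node("sense:bank-2", type="sense", gloss="sloping land beside a river")
g.add_edge("form:bank", "has-sense", "sense:bank-1")
g.add_edge("form:bank", "has-sense", "sense:bank-2")   # ambiguity = multiple edges

print(g.neighbours("form:bank", "has-sense"))
```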
Lexicon schemas and their use are discussed in this paper from the perspective of lexicographers and field linguists. A variety of lexicon schemas have been developed, with goals ranging from computational lexicography (DATR) through archiving (LIFT, TEI) to standardization (LMF, FSR). A number of requirements for lexicon schemas are given. The schemas are introduced and compared with each other in terms of convertibility and usability for this particular user group, using a common lexicon entry and providing an example for each schema under consideration. The formats are assessed, and the final recommendation to potential users is to request standard compliance from the developers of the tools they use. This paper should foster a discussion among standards authors, lexicographers, and field linguists.
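As an illustration of what comparing schemas over a common entry involves, the sketch below renders one schema-neutral entry in two deliberately simplified, made-up XML layouts; real schemas such as LIFT, TEI, or LMF use different element names and nesting, but conversion between them follows the same pattern:

```python
# Illustration only: one schema-neutral entry rendered in two simplified,
# invented XML layouts. Real schemas (LIFT, TEI, LMF, ...) differ in element
# names and nesting, but conversion follows the same mapping pattern.
import xml.etree.ElementTree as ET

entry = {"lemma": "Haus", "lang": "de", "pos": "noun", "gloss": "house"}

def to_form_sense_layout(e):
    """Nested layout: a form element plus a sense element with gloss."""
    root = ET.Element("entry")
    form = ET.SubElement(root, "form", lang=e["lang"])
    form.text = e["lemma"]
    sense = ET.SubElement(root, "sense")
    ET.SubElement(sense, "gramGrp").text = e["pos"]
    ET.SubElement(sense, "gloss", lang="en").text = e["gloss"]
    return root

def to_flat_feature_layout(e):
    """Flat layout: every piece of information becomes an attribute-value pair."""
    root = ET.Element("LexicalEntry")
    for key, value in e.items():
        ET.SubElement(root, "feat", att=key, val=value)
    return root

print(ET.tostring(to_form_sense_layout(entry), encoding="unicode"))
print(ET.tostring(to_flat_feature_layout(entry), encoding="unicode"))
```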
The chapter on formats and models for lexicons deals with the different available data formats of lexical resources and elaborates on their structure and possible uses. Motivated by the difficulty of merging lexical resources that are based on widespread formalisms and international standards, a formal lexicon model is developed that is related to the graph structures used in annotation. For lexicons, this model is termed the Lexicon Graph. Within this model, the concepts of lexicon entries and lexical structures frequently described in the literature are formally defined and illustrated with examples. The chapter addresses the problem of ambiguity in these formal terms. An implementation of the defined structures based on XML and XML technologies such as XQuery is given, and the relation to international standards is discussed as well.
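A minimal sketch of such an XML encoding, with invented element names, shows the kind of lookup the chapter formulates in XQuery: following has-sense edges from an ambiguous form node to all of its sense nodes (here via ElementTree rather than XQuery):

```python
# Illustrative sketch (element names are ours, not the chapter's): a Lexicon-
# Graph-style XML encoding with explicit node and edge elements, queried for
# all senses reachable from an ambiguous form.
import xml.etree.ElementTree as ET

LEXICON_XML = """
<lexiconGraph>
  <node id="form:Bank" type="form"/>
  <node id="sense:Bank-1" type="sense" gloss="bench"/>
  <node id="sense:Bank-2" type="sense" gloss="financial institution"/>
  <edge from="form:Bank" rel="has-sense" to="sense:Bank-1"/>
  <edge from="form:Bank" rel="has-sense" to="sense:Bank-2"/>
</lexiconGraph>
"""

graph = ET.fromstring(LEXICON_XML)

def senses_of(form_id):
    """Follow has-sense edges from a form node and return the target sense nodes."""
    targets = [e.get("to") for e in graph.findall("edge")
               if e.get("from") == form_id and e.get("rel") == "has-sense"]
    return [n for n in graph.findall("node") if n.get("id") in targets]

for sense in senses_of("form:Bank"):
    print(sense.get("id"), "->", sense.get("gloss"))
```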
We propose to use abusive emojis, such as the “middle finger” or “face vomiting”, as a proxy for learning a lexicon of abusive words. Since it represents extralinguistic information, a single emoji can co-occur with many different forms of explicitly abusive utterances. We show that our approach generates a lexicon that offers the same performance in cross-domain classification of abusive microposts as the most advanced lexicon induction method, which, in contrast, depends on manually annotated seed words and expensive lexical resources (e.g. WordNet) for bootstrapping. We demonstrate that the same emojis can also be used effectively in languages other than English. Finally, we show that emojis can be exploited for classifying mentions of ambiguous words, such as “fuck” and “bitch”, into generally abusive and merely profane usages.
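A hypothetical sketch of the basic idea, assuming a simple PMI-style score rather than the authors' exact induction method, ranks words by how strongly they are associated with the presence of an abusive emoji in the same micropost:

```python
# Hypothetical sketch (not necessarily the authors' exact method): score words
# by their pointwise mutual information with the presence of an abusive emoji
# in the same micropost; the top-ranked words form a candidate abuse lexicon.
import math
from collections import Counter

ABUSIVE_EMOJIS = {"\U0001F595", "\U0001F92E"}   # middle finger, face vomiting

def emoji_pmi_lexicon(posts, min_count=5, top_k=100):
    word_counts = Counter()        # posts containing the word
    word_with_emoji = Counter()    # posts containing the word AND an abusive emoji
    emoji_posts = 0
    for post in posts:
        has_emoji = any(ch in ABUSIVE_EMOJIS for ch in post)
        emoji_posts += has_emoji
        for token in set(post.lower().split()):
            word_counts[token] += 1
            word_with_emoji[token] += has_emoji
    if not posts or emoji_posts == 0:
        return []
    total = len(posts)
    p_emoji = emoji_posts / total
    scores = {}
    for word, count in word_counts.items():
        if count < min_count or word_with_emoji[word] == 0:
            continue
        p_word = count / total
        p_joint = word_with_emoji[word] / total
        scores[word] = math.log(p_joint / (p_word * p_emoji))
    return sorted(scores, key=scores.get, reverse=True)[:top_k]
```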
This contribution sketches a usage-based, integrative, socio-cognitive model of the dynamic lexicon. The model consists of three core components: actions in actual language use, cognitive processes, and social processes. The components of the model are first described individually and then brought together. It is shown, and illustrated with two examples, how the model, by systematically describing the interaction between these components, predicts both stability and structure as well as variation and change in the lexicon.
Alleviating pain is good and abandoning hope is bad. We instinctively understand how words like alleviate and abandon affect the polarity of a phrase, inverting or weakening it. When these words are content words, such as verbs, nouns, and adjectives, we refer to them as polarity shifters. Shifters are a frequent occurrence in human language and an important part of successfully modeling negation in sentiment analysis; yet research on negation modeling has focused almost exclusively on a small handful of closed-class negation words, such as not, no, and without. A major reason for this is that shifters are far more lexically diverse than negation words, but no resources exist to help identify them. We seek to remedy this lack of shifter resources by introducing a large lexicon of polarity shifters that covers English verbs, nouns, and adjectives. Creating the lexicon entirely by hand would be prohibitively expensive. Instead, we develop a bootstrapping approach that combines automatic classification with human verification to ensure the high quality of our lexicon while reducing annotation costs by over 70%. Our approach leverages a number of linguistic insights; while some features are based on textual patterns, others use semantic resources or syntactic relatedness. The created lexicon is evaluated both on a polarity shifter gold standard and on a polarity classification task.
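A hypothetical sketch of the classify-then-verify bootstrapping loop, with stand-in feature extraction and verification functions rather than the paper's actual setup, could look as follows:

```python
# Hypothetical sketch of the bootstrapping loop described above: train on
# already-verified items, rank the remaining candidates, send only the most
# promising ones to a human annotator, and repeat. The featurize() and
# verify() callables are stand-ins, not the paper's actual features or UI.
from sklearn.linear_model import LogisticRegression

def bootstrap_shifter_lexicon(candidates, featurize, verify, seeds, rounds=5, batch=50):
    """candidates: list of lemmas; featurize(lemma) -> feature vector;
    verify(lemma) -> True/False from a human annotator;
    seeds: dict lemma -> bool, containing both shifter and non-shifter examples."""
    labeled = dict(seeds)
    pool = [c for c in candidates if c not in labeled]
    for _ in range(rounds):
        X = [featurize(w) for w in labeled]
        y = [int(labeled[w]) for w in labeled]
        clf = LogisticRegression(max_iter=1000).fit(X, y)
        # Rank the unlabeled pool by predicted probability of being a shifter.
        ranked = sorted(pool,
                        key=lambda w: clf.predict_proba([featurize(w)])[0][1],
                        reverse=True)
        for word in ranked[:batch]:
            labeled[word] = verify(word)   # human verification keeps precision high
            pool.remove(word)
    return sorted(w for w, is_shifter in labeled.items() if is_shifter)
```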