Refine
Year of publication
- 2014 (3) (remove)
Document Type
Language
- English (3)
Has Fulltext
- yes (3)
Is part of the Bibliography
- no (3)
Keywords
- Computerlinguistik (3)
- Lebensmittel (3)
- Maschinelles Lernen (3)
- Information Extraction (2)
- Natürliche Sprache (2)
- Food item (1)
- Korpus <Linguistik> (1)
- Relation extraction (1)
- Sentence level (1)
- Surface pattern (1)
Publicationstate
Reviewstate
- Peer-Review (2)
Publisher
We examine the task of relation extraction in the food domain by employing distant supervision. We focus on the extraction of two relations that are not only relevant to product recommendation in the food domain, but that also have significance in other domains, such as the fashion or electronics domain. In order to select suitable training data, we investigate various degrees of freedom. We consider three processing levels being argument level, sentence level and feature level. As external resources, we employ manually created surface patterns and semantic types on all these levels. We also explore in how far rule-based methods employing the same information are competitive.
We examine the task of separating types from brands in the food domain. Framing the problem as a ranking task, we convert simple textual features extracted from a domain-specific corpus into a ranker without the need of labeled training data. Such method should rank brands (e.g. sprite) higher than types (e.g. lemonade). Apart from that, we also exploit knowledge induced by semi-supervised graph-based clustering for two different purposes. On the one hand, we produce an auxiliary categorization of food items according to the Food Guide Pyramid, and assume that a food item is a type when it belongs to a category unlikely to contain brands. On the other hand, we directly model the task of brand detection using seeds provided by the output of the textual ranking features. We also harness Wikipedia articles as an additional knowledge source.
Automatic Food Categorization from Large Unlabeled Corpora and Its Impact on Relation Extraction
(2014)
We present a weakly-supervised induction method to assign semantic information to food items. We consider two tasks of categorizations being food-type classification and the distinction of whether a food item is composite or not. The categorizations are induced by a graph-based algorithm applied on a large unlabeled domain-specific corpus. We show that the usage of a domain-specific corpus is vital. We do not only outperform a manually designed open-domain ontology but also prove the usefulness of these categorizations in relation extraction, outperforming state-of-the-art features that include syntactic information and Brown clustering.