TY  - CHAP
U1  - Book chapter
A1  - Klaus, Carsten
A1  - Fankhauser, Peter
A1  - Klakow, Dietrich
ED  - Sahle, Patrick
T1  - OCR Nachkorrektur des Royal Society Corpus
T2  - DHd 2019 Digital Humanities: multimedial & multimodal. Konferenzabstracts. 6. Tagung des Verbands Digital Humanities im deutschsprachigen Raum e.V., Frankfurt am Main, Mainz, 25.3.2019 – 29.3.2019
N2  - We present an approach for the automatic detection and correction of OCR-induced misspellings in historical texts. The main objective is the post-correction of the digitized Royal Society Corpus, a set of historical documents from 1665 to 1869. Due to the aged material, the OCR procedure has made mistakes, leading to files corrupted by thousands of misspellings. This motivates a post-processing step. The current correction technique is a pattern-based approach which, due to its lack of generalization, suffers from poor recall. To generalize from the patterns, we propose to use the noisy channel model. From the pattern-based substitutions we train a corpus-specific error model, complemented with a language model. With an F1-score of 0.61, the presented technique significantly outperforms the pattern-based approach, which has an F1-score of 0.28. Due to its more accurate error model, it also outperforms other implementations of the noisy channel model.
KW  - OCR font
KW  - Correction
KW  - Natural language processing
KW  - Digital Humanities
Y1  - 2019
UN  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-85353
UR  - https://dhd2019.org/programm/do/postersession/poster-147/
UR  - https://zenodo.org/record/2596095
U6  - https://dx.doi.org/10.5281/zenodo.2596095
DO  - https://dx.doi.org/10.5281/zenodo.2596095
SP  - 337
EP  - 339
PB  - Zenodo
CY  - Frankfurt am Main
ER  - 
TY  - CHAP
U1  - Conference publication
A1  - Klaus, Carsten
A1  - Klakow, Dietrich
A1  - Fankhauser, Peter
T1  - OCR post-correction of the Royal Society Corpus based on the noisy channel model
T2  - 41. Jahrestagung der Deutschen Gesellschaft für Sprachwissenschaft, Universität Bremen, Rahmenthema: Kontrast und Opposition, 06. – 08. März 2019
KW  - OCR font
KW  - Correction
KW  - Natural language processing
Y1  - 2019
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-85256
UN  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-85256
UR  - www.dgfs2019.uni-bremen.de/pdf/DGfS%202019%20Booklet.pdf
SP  - 301
EP  - 301
PB  - Deutsche Gesellschaft für Sprachwissenschaft
CY  - Bremen
ER  - 
TY  - CHAP
U1  - Conference publication
A1  - Dembowski, Julia
A1  - Wiegand, Michael
A1  - Klakow, Dietrich
ED  - Vetulani, Zygmunt
ED  - Paroubek, Patrick
T1  - Language Independent Named Entity Recognition using Distant Supervision
T2  - Human Language Technologies as a Challenge for Computer Science and Linguistics. Proceedings of the 8th Language & Technology Conference, November 17-19, 2017, Poznań, Poland
N2  - While good results have been achieved for named entity recognition (NER) in supervised settings, it remains a problem that little or no labelled data is available for low-resource languages and less-studied domains. As NER is a crucial preprocessing step for many natural language processing tasks, finding a way to overcome this lack of data remains of great interest. We propose a distant supervision approach to NER that is both language- and domain-independent: we automatically generate labelled training data using gazetteers that we previously extracted from Wikipedia. We test our approach on English, German and Estonian data sets and further contribute several successful methods to reduce the noise in the generated training data. The tested models beat baseline systems, and our results show that distant supervision can be a promising approach for NER when no labelled data is available. For the English model, we also show, by comparing it against a supervised model on a different test set, that the distant supervision model generalizes better within the same domain of news texts.
KW  - Machine learning
KW  - Information Extraction
KW  - Computational linguistics
KW  - Text Mining
KW  - Name
Y1  - 2017
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-86198
UN  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-86198
UR  - http://ltc.amu.edu.pl/a2017/
SN  - 978-83-64864-94-0
SB  - 978-83-64864-94-0
SP  - 68
EP  - 72
PB  - Fundacja Uniwersytetu im. Adama Mickiewicza
CY  - Poznań
ER  - 
TY  - JOUR
U1  - Journal article, scholarly, peer-reviewed
A1  - Wiegand, Michael
A1  - Klakow, Dietrich
T1  - Detecting conditional healthiness of food items from natural language text
JF  - Language Resources and Evaluation
N2  - In this article, we explore the feasibility of extracting suitable and unsuitable food items for particular health conditions from natural language text. We refer to this task as conditional healthiness classification. For that purpose, we annotate a corpus extracted from forum entries of a food-related website. We identify different relation types that hold between food items and health conditions, going beyond a binary distinction of suitability and unsuitability, and devise various supervised classifiers using different types of features. We examine the impact of different task-specific resources, such as a healthiness lexicon that lists the healthiness status of a food item, and a sentiment lexicon. Moreover, we also consider task-specific linguistic features that disambiguate a context in which mentions of a food item and a health condition co-occur, and compare them with standard features using bag of words, part-of-speech information and syntactic parses. We also investigate to what extent individual food items and health conditions correlate with specific relation types and try to harness this information for classification.
KW  - Computational linguistics
KW  - Information Extraction
KW  - Polarity
KW  - Food
KW  - Natural language
KW  - Text classification
KW  - Food domain
KW  - Social media
KW  - Linguistically informed feature engineering
KW  - Polarity classification
Y1  - 2015
UN  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-85428
SN  - 1574-0218
SS  - 1574-0218
U6  - https://dx.doi.org/10.1007/s10579-015-9314-7
DO  - https://dx.doi.org/10.1007/s10579-015-9314-7
N1  - This is a post-peer-review, pre-copyedit version of an article published in Language Resources and Evaluation. The final authenticated version is available online at: http://dx.doi.org/10.1007/s10579-015-9314-7
VL  - 49
IS  - 4
SP  - 777
EP  - 830
PB  - Springer
CY  - Dordrecht
ER  - 
TY  - CHAP
U1  - Conference publication
A1  - Wiegand, Michael
A1  - Roth, Benjamin
A1  - Klakow, Dietrich
ED  - Biemann, Chris
ED  - Handschuh, Siegfried
ED  - Freitas, André
ED  - Meziane, Farid
ED  - Métais, Elisabeth
T1  - Combining Pattern-Based and Distributional Similarity for Graph-Based Noun Categorization
T2  - Natural Language Processing and Information Systems. Proceedings of the 20th International Conference on Applications of Natural Language to Information Systems, NLDB 2015, Passau, Germany, June 17–19, 2015
N2  - We examine the combination of pattern-based and distributional similarity for the induction of semantic categories. Pattern-based methods are precise but sparse, while distributional methods have a higher recall. Given these properties, we use the predictions of distributional methods as a back-off to pattern-based similarity. Since our pattern-based approach is embedded into a semi-supervised graph clustering algorithm, we also examine how distributional information is best added to that classifier. Our experiments are carried out on five different food categorization tasks.
T3  - Lecture Notes in Computer Science - 9103
KW  - Food
KW  - Computational linguistics
KW  - Machine learning
KW  - Information Extraction
KW  - Graphical representation
KW  - Food item
KW  - Neighbour classifier
KW  - Graph cluster
KW  - Relation extraction
KW  - Unconnected node
Y1  - 2015
UN  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-87479
SN  - 978-3-319-19580-3
SB  - 978-3-319-19580-3
U6  - https://dx.doi.org/10.1007/978-3-319-19581-0_5
DO  - https://dx.doi.org/10.1007/978-3-319-19581-0_5
N1  - For copyright reasons, this contribution is not freely available online.
SP  - 64
EP  - 72
PB  - Springer
CY  - Cham
ER  - 
TY  - CHAP
U1  - Conference publication
A1  - Wiegand, Michael
A1  - Roth, Benjamin
A1  - Klakow, Dietrich
T1  - Automatic Food Categorization from Large Unlabeled Corpora and Its Impact on Relation Extraction
T2  - Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics, April 26-30, 2014, Gothenburg, Sweden
N2  - We present a weakly supervised induction method to assign semantic information to food items. We consider two categorization tasks: food-type classification and the distinction of whether a food item is composite or not. The categorizations are induced by a graph-based algorithm applied to a large unlabeled domain-specific corpus. We show that the use of a domain-specific corpus is vital. We not only outperform a manually designed open-domain ontology but also prove the usefulness of these categorizations in relation extraction, outperforming state-of-the-art features that include syntactic information and Brown clustering.
KW  - Computational linguistics
KW  - Corpus
KW  - Text Mining
KW  - Machine learning
KW  - Food
Y1  - 2014
UN  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-84696
UR  - https://aclanthology.info/papers/E14-1071/e14-1071
SN  - 978-1-937284-78-7
SB  - 978-1-937284-78-7
U6  - https://dx.doi.org/10.3115/v1/E14-1071
DO  - https://dx.doi.org/10.3115/v1/E14-1071
SP  - 673
EP  - 682
PB  - Association for Computational Linguistics
CY  - Stroudsburg, PA
ER  - 
TY  - CHAP
U1  - Conference publication
A1  - Wiegand, Michael
A1  - Klakow, Dietrich
T1  - Separating Brands from Types: an Investigation of Different Features for the Food Domain
T2  - Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics, August 23-29, 2014, Dublin, Ireland: Technical Papers
N2  - We examine the task of separating types from brands in the food domain. Framing the problem as a ranking task, we convert simple textual features extracted from a domain-specific corpus into a ranker without the need for labeled training data. Such a method should rank brands (e.g. sprite) higher than types (e.g. lemonade). Apart from that, we also exploit knowledge induced by semi-supervised graph-based clustering for two different purposes. On the one hand, we produce an auxiliary categorization of food items according to the Food Guide Pyramid, and assume that a food item is a type when it belongs to a category unlikely to contain brands. On the other hand, we directly model the task of brand detection using seeds provided by the output of the textual ranking features. We also harness Wikipedia articles as an additional knowledge source.
KW  - Computational linguistics
KW  - Natural language
KW  - Information Extraction
KW  - Machine learning
KW  - Food
Y1  - 2014
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-84874
UN  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-84874
UR  - https://aclanthology.info/papers/C14-1216/c14-1216
SN  - 978-1-941643-26-6
SB  - 978-1-941643-26-6
SP  - 2291
EP  - 2302
PB  - Dublin City University
CY  - Dublin
ER  - 
TY  - CHAP
U1  - Conference publication
A1  - Reiplinger, Melanie
A1  - Wiegand, Michael
A1  - Klakow, Dietrich
ED  - Przepiórkowski, Adam
ED  - Ogrodniczuk, Maciej
T1  - Relation Extraction for the Food Domain without Labeled Training Data – Is Distant Supervision the Best Solution?
T2  - Advances in Natural Language Processing. Proceedings of the 9th International Conference on NLP, PolTAL 2014, Warsaw, Poland, September 17-19, 2014
N2  - We examine the task of relation extraction in the food domain by employing distant supervision. We focus on the extraction of two relations that are not only relevant to product recommendation in the food domain but also significant in other domains, such as fashion or electronics. In order to select suitable training data, we investigate various degrees of freedom. We consider three processing levels: argument level, sentence level and feature level. As external resources, we employ manually created surface patterns and semantic types on all these levels. We also explore to what extent rule-based methods employing the same information are competitive.
T3  - Lecture Notes in Artificial Intelligence - 8686
KW  - Computational linguistics
KW  - Information Extraction
KW  - Food
KW  - Machine learning
KW  - Natural language
KW  - Food item
KW  - Surface pattern
KW  - Target relation
KW  - Relation extraction
KW  - Sentence level
Y1  - 2014
UN  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-87465
SN  - 978-3-319-10887-2
SB  - 978-3-319-10887-2
U6  - https://dx.doi.org/10.1007/978-3-319-10888-9_35
DO  - https://dx.doi.org/10.1007/978-3-319-10888-9_35
N1  - For copyright reasons, this contribution is not freely available online.
SP  - 345
EP  - 357
PB  - Springer
CY  - Cham
ER  - 
TY  - CHAP
U1  - Conference publication
A1  - Wiegand, Michael
A1  - Ruppenhofer, Josef
A1  - Klakow, Dietrich
T1  - Predicative Adjectives: An Unsupervised Criterion to Extract Subjective Adjectives
T2  - Proceedings of HLT-NAACL 2013
N2  - We examine predicative adjectives as an unsupervised criterion to extract subjective adjectives. We compare this criterion not only with a weakly supervised extraction method but also with gradable adjectives, i.e. another highly subjective subset of adjectives that can be extracted in an unsupervised fashion. In order to demonstrate the robustness of this extraction method, we evaluate the extraction against two different state-of-the-art sentiment lexicons used as gold standards.
KW  - predicative adjectives
KW  - gradable adjectives
KW  - separation of adjectives
KW  - Predicative adjective
KW  - Semantic analysis
KW  - Automatic language analysis
Y1  - 2013
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-52333
UN  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-52333
SN  - 978-1-937284-47-3
SB  - 978-1-937284-47-3
SP  - 534
EP  - 539
PB  - Association for Computational Linguistics
CY  - Atlanta
ER  - 
TY  - CHAP
U1  - Conference publication
A1  - Wiegand, Michael
A1  - Klakow, Dietrich
T1  - Towards the Detection of Reliable Food-Health Relationships
T2  - Proceedings of the Workshop on Language Analysis in Social Media, 13 June 2013, Atlanta, Georgia
N2  - We investigate the task of detecting reliable statements about food-health relationships in natural language texts. For that purpose, we created a specially annotated web corpus from forum entries discussing the healthiness of certain food items. We examine a set of task-specific features, (mostly) based on linguistic insights, that are instrumental in finding utterances that are commonly perceived as reliable. These features are incorporated into a supervised classifier and compared against standard features that are widely used for various tasks in natural language processing, such as bag of words, part-of-speech and syntactic parse information.
KW  - Computational linguistics
KW  - Natural language
KW  - Information Extraction
KW  - Food
KW  - Corpus
Y1  - 2013
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-84660
UN  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-84660
UR  - http://www.anthology.aclweb.org/W/W13/#1100
SN  - 978-1-937284-47-3
SB  - 978-1-937284-47-3
SP  - 69
EP  - 79
PB  - Association for Computational Linguistics
CY  - Stroudsburg, PA
ER  - 
TY  - JOUR
U1  - Journal article, scholarly, peer-reviewed
A1  - Wiegand, Michael
A1  - Klenner, Manfred
A1  - Klakow, Dietrich
T1  - Bootstrapping polarity classifiers with rule-based classification
JF  - Language Resources and Evaluation
N2  - In this article, we examine the effectiveness of bootstrapping supervised machine-learning polarity classifiers with the help of a domain-independent rule-based classifier that relies on a lexical resource, i.e., a polarity lexicon and a set of linguistic rules. The benefit of this method is that, although no labeled training data are required, it allows a classifier to capture in-domain knowledge by training a supervised classifier with in-domain features, such as bag of words, on instances labeled by a rule-based classifier. Thus, this approach can be considered a simple and effective method for domain adaptation. Among the components of this approach, we investigate how important the quality of the rule-based classifier is and what features are useful for the supervised classifier. In particular, the former addresses the question of how far linguistic modeling is relevant for this task. We not only examine how this method performs under more difficult settings, in which classes are not balanced and mixed reviews are included in the data set, but also compare how this linguistically driven method relates to state-of-the-art statistical domain adaptation.
KW  - Computational linguistics
KW  - Polarity
KW  - Text Mining
KW  - Natural language
KW  - Machine learning
KW  - Polarity classification
KW  - Sentiment analysis
KW  - Bootstrapping methods
KW  - Feature engineering
KW  - Text classification
Y1  - 2013
UN  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-84425
SN  - 1574-0218
SS  - 1574-0218
U6  - https://dx.doi.org/10.1007/s10579-013-9218-3
DO  - https://dx.doi.org/10.1007/s10579-013-9218-3
N1  - This is a post-peer-review, pre-copyedit version of an article published in Language Resources and Evaluation. The final authenticated version is available online at: http://dx.doi.org/10.1007/s10579-013-9218-3
VL  - 47
IS  - 4
SP  - 1049
EP  - 1088
PB  - Springer
CY  - Dordrecht
ER  - 
TY  - CHAP
U1  - Conference publication
A1  - Wiegand, Michael
A1  - Klakow, Dietrich
T1  - Towards Contextual Healthiness Classification of Food Items - A Linguistic Approach
T2  - Proceedings of the Sixth International Joint Conference on Natural Language Processing, October 14-18, 2013, Nagoya, Japan
N2  - We explore the feasibility of contextual healthiness classification of food items. Based on a specially annotated corpus extracted from web forum entries, we present a detailed analysis of the linguistic phenomena that need to be taken into consideration for this task. For automatic classification, we compare a supervised classifier and rule-based classification. Beyond linguistically motivated features that include sentiment information, we also consider the prior healthiness of food items.
KW  - Computational linguistics
KW  - Information Extraction
KW  - Machine learning
KW  - Food
KW  - Natural language
Y1  - 2013
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-85012
UN  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-85012
UR  - https://aclanthology.info/papers/I13-1003/i13-1003
SN  - 978-4-9907348-0-0
SB  - 978-4-9907348-0-0
SP  - 19
EP  - 27
PB  - Asian Federation of Natural Language Processing
CY  - Nagoya
ER  - 
TY  - CHAP
U1  - Conference publication
A1  - Wiegand, Michael
A1  - Roth, Benjamin
A1  - Klakow, Dietrich
ED  - Cordier, Amélie
ED  - Nauer, Emmanuel
T1  - Knowledge Acquisition with Natural Language Processing in the Food Domain: Potential and Challenges
T2  - Proceedings of the Cooking with Computers workshop (CwC), August 28, 2012, Montpellier, France
N2  - In this paper, we present an outlook on the effectiveness of natural language processing (NLP) in extracting knowledge for the food domain. We identify potential scenarios that we think are particularly suitable for NLP techniques. As a source for extracting knowledge, we highlight the benefits of textual content from social media. We discuss typical methods that we think would be suitable. We also address potential problems and limitations in the application of NLP methods.
KW  - Food
KW  - Natural language
KW  - Information Extraction
KW  - Text Mining
Y1  - 2012
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-86207
UN  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-86207
UR  - https://projet.liris.cnrs.fr/cwc/cwc2012/submissions.html
SP  - 46
EP  - 51
PB  - LIRMM
CY  - Montpellier
ER  - 
TY  - CHAP
U1  - Conference publication
A1  - Wiegand, Michael
A1  - Roth, Benjamin
A1  - Klakow, Dietrich
ED  - Bouma, Gosse
ED  - Ittoo, Ashwin
ED  - Métais, Elisabeth
ED  - Wortmann, Hans
T1  - Web-Based Relation Extraction for the Food Domain
T2  - Natural Language Processing and Information Systems. Proceedings of the 17th International Conference on Applications of Natural Language to Information Systems, NLDB 2012, Groningen, The Netherlands, June 26-28, 2012
N2  - In this paper, we examine methods to extract different domain-specific relations from the food domain. We employ different extraction methods ranging from surface patterns to co-occurrence measures applied to different parts of a document. We show that the effectiveness of a particular method depends very much on the relation type considered and that there is no single method that works equally well for every relation type. As we need to process a large amount of unlabeled data, our methods only require a low level of linguistic processing. This also has the advantage that these methods can provide responses in real time.
T3  - Lecture Notes in Computer Science - 7337
KW  - Computational linguistics
KW  - Food
KW  - Information Extraction
KW  - Natural language
KW  - Food item
KW  - Relation type
KW  - Linguistic processing
KW  - Sparkling wine
KW  - Mean reciprocal rank
Y1  - 2012
UN  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-87454
SN  - 978-3-642-31177-2
SB  - 978-3-642-31177-2
U6  - https://dx.doi.org/10.1007/978-3-642-31178-9_25
DO  - https://dx.doi.org/10.1007/978-3-642-31178-9_25
N1  - For copyright reasons, this contribution is not freely available online.
SP  - 222
EP  - 227
PB  - Springer
CY  - Berlin [et al.]
ER  - 
TY  - CHAP
U1  - Conference publication
A1  - Wiegand, Michael
A1  - Klakow, Dietrich
T1  - Generalization Methods for In-Domain and Cross-Domain Opinion Holder Extraction
T2  - Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics, April 23-27, 2012, Avignon, France
N2  - In this paper, we compare three different generalization methods for in-domain and cross-domain opinion holder extraction: simple unsupervised word clustering, an induction method inspired by distant supervision, and the use of lexical resources. The generalization methods are incorporated into diverse classifiers. We show that generalization causes significant improvements and that the size of the improvement depends on the type of classifier and on how much training and test data differ from each other. We also address the less common case of opinion holders realized in patient position and, since standard datasets contain too few instances of this type, suggest approaches for detecting those opinion holders without labeled training data, including a novel linguistically informed extraction method.
KW  - Computational linguistics
KW  - Information Extraction
KW  - Natural language
KW  - Machine learning
KW  - Opinion
KW  - Sentiment analysis
Y1  - 2012
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-84378
UN  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-84378
UR  - https://dl.acm.org/citation.cfm?id=2380857
SN  - 978-1-937284-19-0
SB  - 978-1-937284-19-0
SP  - 325
EP  - 335
PB  - Association for Computational Linguistics
CY  - Stroudsburg, PA
ER  - 
TY  - CHAP
U1  - Conference publication
A1  - Wiegand, Michael
A1  - Roth, Benjamin
A1  - Lasarcyk, Eva
A1  - Köser, Stephanie
A1  - Klakow, Dietrich
ED  - Calzolari, Nicoletta
ED  - Choukri, Khalid
ED  - Declerck, Thierry
ED  - Uğur Doğan, Mehmet
ED  - Maegaard, Bente
ED  - Mariani, Joseph
ED  - Moreno, Asuncion
ED  - Odijk, Jan
ED  - Piperidis, Stelios
T1  - A Gold Standard for Relation Extraction in the Food Domain
T2  - Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12), May 21-27, 2012, Istanbul, Turkey
N2  - We present a gold standard for semantic relation extraction in the food domain for German. The relation types that we address are motivated by scenarios for which IT applications present a commercial potential, such as virtual customer advice, in which a virtual agent assists a customer in a supermarket in finding the products that best satisfy their needs. Moreover, we focus on those relation types that can be extracted from natural language text corpora, ideally content from the internet, such as web forums, that is easy to retrieve. A typical relation type that meets these requirements is that of pairs of food items that are usually consumed together. Such a relation type could be used by a virtual agent to suggest additional products available in a shop that would potentially complement the items a customer already has in their shopping cart. Our gold standard comprises structural data, i.e. relation tables, which encode relation instances. These tables are vital in order to evaluate natural language processing systems that extract those relations.
KW  - Information Extraction
KW  - Computational linguistics
KW  - Corpus
KW  - Natural language
KW  - Food
KW  - Food Domain
KW  - Domain-specific Relation Extraction
Y1  - 2012
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-84454
UN  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-84454
UR  - https://aclanthology.info/papers/L12-1018/l12-1018
SN  - 978-2-9517408-7-7
SB  - 978-2-9517408-7-7
SP  - 507
EP  - 514
PB  - European Language Resources Association
CY  - Paris
ER  - 
TY  - CHAP
U1  - Conference publication
A1  - Wiegand, Michael
A1  - Roth, Benjamin
A1  - Klakow, Dietrich
ED  - Jancsary, Jeremy
T1  - Data-driven Knowledge Extraction for the Food Domain
T2  - Proceedings of the 11th Conference on Natural Language Processing (KONVENS 2012). Empirical Methods in Natural Language Processing, September 19-21, 2012, Vienna, Austria
N2  - In this paper, we examine methods to automatically extract domain-specific knowledge about the food domain from unlabeled natural language text. We employ different extraction methods ranging from surface patterns to co-occurrence measures applied to different parts of a document. We show that the effectiveness of a particular method depends very much on the relation type considered and that there is no single method that works equally well for every relation type. We also examine a combination of extraction methods and consider relationships between different relation types. The extraction methods are applied both to a domain-specific corpus and to the domain-independent factual knowledge base Wikipedia. Moreover, we examine the suitability of an open-domain lexical ontology.
T3  - Schriftenreihe der Österreichischen Gesellschaft für Artificial Intelligence (ÖGAI) - Band 5
KW  - Information Extraction
KW  - Computational linguistics
KW  - Corpus
KW  - Empirical linguistics
KW  - Food
Y1  - 2012
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-84529
UN  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-84529
UR  - http://www.oegai.at/konvens2012/proceedings.shtml
SN  - 3-85027-005-X
SB  - 3-85027-005-X
SP  - 21
EP  - 29
PB  - Österreichische Gesellschaft für Artificial Intelligence
CY  - Wien
ER  - 
TY  - CHAP
U1  - Conference publication
A1  - Wiegand, Michael
A1  - Klakow, Dietrich
ED  - Angelova, Galia
ED  - Bontcheva, Kalina
ED  - Mitkov, Ruslan
ED  - Nikolov, Nikolai
T1  - Prototypical Opinion Holders: What We can Learn from Experts and Analysts
T2  - Proceedings of the International Conference on Recent Advances in Natural Language Processing 2011, Hissar, Bulgaria, 12-14 September, 2011
N2  - In order to automatically extract opinion holders, we propose to harness the contexts of prototypical opinion holders, i.e. common nouns, such as experts or analysts, that describe particular groups of people whose profession or occupation is to form and express opinions towards specific items. We assess their effectiveness in supervised learning, where these contexts are regarded as labelled training data, and in rule-based classification, which uses predicates that frequently co-occur with mentions of the prototypical opinion holders. Finally, we also examine to what extent knowledge gained from these contexts can compensate for the lack of large amounts of labeled training data in supervised learning by considering actually labeled training sets of various sizes.
KW  - Computational linguistics
KW  - Machine learning
KW  - Text Mining
KW  - Information Extraction
KW  - Sentiment analysis
KW  - Expert opinion
Y1  - 2011
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-84674
UN  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-84674
UR  - https://aclanthology.info/papers/R11-1039/r11-1039
SN  - 1313-8502
SS  - 1313-8502
SP  - 282
EP  - 288
PB  - Incoma Ltd.
CY  - Shoumen
ER  - 
TY  - CHAP
U1  - Conference publication
A1  - Wiegand, Michael
A1  - Klakow, Dietrich
T1  - The Role of Predicates in Opinion Holder Extraction
T2  - Proceedings of the RANLP 2011 Workshop on Information Extraction and Knowledge Acquisition, 16 September, 2011, Hissar, Bulgaria
N2  - In this paper, we investigate the role of predicates in opinion holder extraction. We examine the shape of these predicates, investigate what relationship they bear to opinion holders, determine what resources are potentially useful for acquiring them, and point out limitations of an opinion holder extraction system based on these predicates. For this study, we carry out an evaluation on a corpus annotated with opinion holders. Our insights are particularly important for situations in which no labelled training data are available and only rule-based methods can be applied.
KW  - Information Extraction
KW  - Computational linguistics
KW  - Predicate
KW  - Machine learning
KW  - Natural language
KW  - Sentiment analysis
Y1  - 2011
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-84564
UN  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-84564
UR  - https://aclanthology.info/papers/W11-4004/w11-4004
SN  - 978-954-452-018-2
SB  - 978-954-452-018-2
SP  - 13
EP  - 20
PB  - Incoma Ltd.
CY  - Shoumen
ER  - 
TY  - CHAP
U1  - Conference publication
A1  - Wiegand, Michael
A1  - Klakow, Dietrich
ED  - Sandford Pedersen, Bolette
ED  - Nešpore, Gunta
ED  - Skadiņa, Inguna
T1  - Convolution Kernels for Subjectivity Detection
T2  - Proceedings of the 18th Nordic Conference of Computational Linguistics (NODALIDA 2011), May 11-13, 2011, Riga, Latvia
N2  - In this paper, we explore different linguistic structures encoded as convolution kernels for the detection of subjective expressions. The advantage of convolution kernels is that complex structures can be provided directly to a classifier without deriving explicit features. Feature design for the detection of subjective expressions is fairly difficult, and there is currently no commonly accepted feature set. We consider various structures, such as constituency parse structures, dependency parse structures, and predicate-argument structures. In order to generalize from lexical information, we additionally augment these structures with clustering information and the task-specific knowledge of subjective words. The convolution kernels are compared with a standard vector kernel.
T3  - NEALT Proceedings Series - 11
KW  - Computational linguistics
KW  - Natural language
KW  - Subjectivity
KW  - Machine learning
KW  - Text Mining
KW  - Sentiment analysis
Y1  - 2011
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-85032
UN  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:mh39-85032
SN  - 1736-6305
SS  - 1736-6305
SP  - 254
EP  - 261
PB  - Northern European Association for Language Technology
CY  - Uppsala
ER  - 