Refine
Year of publication
- 2014 (64) (remove)
Document Type
- Article (43)
- Conference Proceeding (13)
- Part of a Book (5)
- Book (2)
- Part of Periodical (1)
Keywords
- Deutsch (24)
- Computerlinguistik (7)
- Korpus <Linguistik> (6)
- Syntax (6)
- Wörterbuch (5)
- Englisch (4)
- Gesprochene Sprache (4)
- Information Extraction (4)
- Natürliche Sprache (4)
- Soziale Wahrnehmung (4)
Publicationstate
- Veröffentlichungsversion (34)
- Zweitveröffentlichung (11)
- Postprint (7)
Reviewstate
- Peer-Review (64) (remove)
Publisher
- De Gruyter (4)
- Erich Schmidt Verlag (3)
- Sage (3)
- Schmidt (3)
- Universitätsverlag Hildesheim (3)
- EURAC Research (2)
- Nodus (2)
- de Gruyter (2)
- Association for Computational Linguistics (1)
- Benjamins (1)
Automatic Food Categorization from Large Unlabeled Corpora and Its Impact on Relation Extraction
(2014)
We present a weakly-supervised induction method to assign semantic information to food items. We consider two tasks of categorizations being food-type classification and the distinction of whether a food item is composite or not. The categorizations are induced by a graph-based algorithm applied on a large unlabeled domain-specific corpus. We show that the usage of a domain-specific corpus is vital. We do not only outperform a manually designed open-domain ontology but also prove the usefulness of these categorizations in relation extraction, outperforming state-of-the-art features that include syntactic information and Brown clustering.
We examine the task of separating types from brands in the food domain. Framing the problem as a ranking task, we convert simple textual features extracted from a domain-specific corpus into a ranker without the need of labeled training data. Such method should rank brands (e.g. sprite) higher than types (e.g. lemonade). Apart from that, we also exploit knowledge induced by semi-supervised graph-based clustering for two different purposes. On the one hand, we produce an auxiliary categorization of food items according to the Food Guide Pyramid, and assume that a food item is a type when it belongs to a category unlikely to contain brands. On the other hand, we directly model the task of brand detection using seeds provided by the output of the textual ranking features. We also harness Wikipedia articles as an additional knowledge source.
We report on the two systems we built for Task 1 of the German Sentiment Analysis Shared Task, the task on Source, Subjective Expression and Target Extraction from Political Speeches (STEPS). The first system is a rule-based system relying on a predicate lexicon specifying extraction rules for verbs, nouns and adjectives, while the second is a translation-based system that has been obtained with the help of the (English) MPQA corpus.
Eine syntaktische Besonderheit der kontinentalwestgermanischen Sprachen ist die Bildung satzfinaler Verbalkomplexe (" ... dass sie das Buch gelesen haben muss"), für die ein hohes Maß an sprach- bzw. dialektübergreifender und idiolektaler Verbstellungsvariation charakteristisch ist. Der niederdeutsche Verbalkomplex gilt in Überblicksdarstellungen als streng kopffinal, wobei bisher – anders als für niederländische und hochdeutsche (besonders: oberdeutsche) Mundarten – kaum empirische Studien vorliegen. Der Aufsatz präsentiert eine deskriptive Analyse des zweigliedrigen Verbalkomplexes im Märkisch-Brandenburgischen, dem südöstlichsten der niederdeutschen Dialektverbände.
Im Gegensatz zum Standarddeutschen und anderen niederdeutschen Mundarten wie dem Nordniederdeutschen, weist das Brandenburgische selbst bei nur zwei verbalen Elementen in der rechten Satzklammer Variation auf ("dass sie lesen kann/kann lesen"). Anhand von Tonaufnahmen aus dem bisher kaum erschlossenen DDR-Korpus wird folgenden Fragen nachgegangen: Welche Verbstellungsvarianten sind in welchen Syntagmen möglich bzw. werden präferiert? Welche Unterschiede bestehen zwischen Haupt- und Nebensatzkomplexen? Wie verhält sich der brandenburgische Verbalkomplex in Bezug auf nicht-verbale Intervenierer (sog. Verb Projection Raising)? Wie verhalten sich Modal- und andere infinitivregierende Verben unter Perfekteinbettung (d.h. in stddt. Ersatzinfinitivkontexten)?
Am Ende steht eine erste typologische Einordnung des brandenburgischen Verbalkomplexes im Vergleich mit anderen kontinentalwestgermanischen Varietäten, wobei sich areallinguistisch interessante Ähnlichkeiten mit dem südlich angrenzenden Ostmitteldeutschen zeigen.
Growing globalisation of the world draws attention to cultural differences between people from different countries or from different cultures within the countries. Notwithstanding the diversity of people’s worldviews, current cross-cultural research still faces the challenge of how to avoid ethnocentrism; comparing Western-driven phenomena with like variables across countries without checking their conceptual equivalence clearly is highly problematic. In the present article we argue that simple comparison of measurements (in the quantitative domain) or of semantic interpretations (in the qualitative domain) across cultures easily leads to inadequate results. Questionnaire items or text produced in interviews or via open-ended questions have culturally laden meanings and cannot be mapped onto the same semantic metric. We call the culture-specific space and relationship between variables or meanings a ’cultural metric’, that is a set of notions that are inter-related and that mutually specify each other’s meaning. We illustrate the problems and their possible solutions with examples from quantitative and qualitative research. The suggested methods allow to respect the semantic space of notions in cultures and language groups and the resulting similarities or differences between cultures can be better understood and interpreted.
Large classes at universities(> 1600 students) create their own challenges for teaching and learning. Audience feedback is lacking and fine tuning of lectures, courses and exam preparation to address individual needs is very difficult to achieve. At RWTH Aachen University, a course concept and a knowledge map learning tool aimed to support individual students to prepare for exams in information science through theme-based exercises were developed and evaluated. The tool was grounded in the notion of self-regul ated learning with the goal of enabling students to learn
independently.
Once a new word or a new meaning is added to a monolingual dictionary, the lexicographer is to provide a definition of this item. This paper focuses on the methodological challenges in writing such definitions. After a short discussion of the central terminology (method and definition), the article describes factors which inform this process: linguistic theories, linguistic and lexicographical methods, and types of definitions. Using the example of elexiko, a dictionary project of the Institute for the German language (IDS) in Mannheim, Germany, the paper finally showcases the compilation of definitions in a monolingual online dictionary of contemporary German.
Seit Jahrzehnten fordern zahlreiche Metalexikografen und Lexikografen immer wieder eine umfangreichere Beschäftigung mit Wörterbüchern im muttersprachlichen Deutschunterricht, auch in der gymnasialen Oberstufe. Trotzdem spielen die Wortschatzarbeit und der Umgang mit Wörterbüchern in Lehrplänen, Didaktiken und Lehrwerken in den meisten Fällen allenfalls eine marginale Rolle. Im Anschluss an eine überblicksartige Bestandsaufnahme dazu untersucht der vorliegende Beitrag, inwieweit elexiko, ein Onlinewörterbuch zur deutschen Gegenwartssprache, sinnvoll in den muttersprachlichen Deutschunterricht der Sekundarstufe II integriert werden könnte. Am Beispiel des Angabebereichs der Bedeutungserläuterung wird überprüft, ob Schüler der gymnasialen Oberstufe als Zielgruppe für elexiko infrage kommen und für welche linguistischen Themen sich die Wortschatzarbeit mit den semantischen Paraphrasen für elexiko anbietet.
Measuring the quality of metadata is only possible by assessing the quality of the underlying schema and the metadata instance. We propose some factors that are measurable automatically for metadata according to the CMD framework, taking into account the variability of schemas that can be defined in this framework. The factors include among others the number of elements, the (re-)use of reusable components, the number of filled in elements. The resulting score can serve as an indicator of the overall quality of the CMD instance, used for feedback to metadata providers or to provide an overview of the overall quality of metadata within a repository. The score is independent of specific schemas and generalizable. An overall assessment of harvested metadata is provided in form of statistical summaries and the distribution, based on a corpus of harvested metadata. The score is implemented in XQuery and can be used in tools, editors and repositories.