We present a gold standard for semantic relation extraction in the food domain for German. The relation types that we address are motivated by scenarios in which IT applications have commercial potential, such as virtual customer advice, in which a virtual agent assists a customer in a supermarket in finding the products that best satisfy their needs. Moreover, we focus on relation types that can be extracted from natural language text corpora, ideally from internet content that is easy to retrieve, such as web forums. A typical relation type that meets these requirements is that of food items that are usually consumed together. Such a relation type could be used by a virtual agent to suggest additional products available in a shop that would potentially complement the items a customer already has in their shopping cart. Our gold standard comprises structural data, i.e. relation tables, which encode relation instances. These tables are vital for evaluating natural language processing systems that extract those relations.
Creating and maintaining metadata for various kinds of resources requires appropriate tools to assist the user. The paper presents the metadata editor ProFormA for the creation and editing of CMDI (Component Metadata Infrastructure) metadata in web forms. This editor supports a number of CMDI profiles currently being provided for different types of resources. Since the editor is based on XForms and server-side processing, users can create and modify CMDI files in their standard browser without the need for further processing. Large parts of ProFormA are implemented as web services in order to reuse them in other contexts and programs.
This paper presents the system architecture and the underlying workflow of the Extensible Repository System of Digital Objects (ERDO), which has been developed for the sustainable archiving of language resources within the Tübingen CLARIN-D project. In contrast to other approaches, which focus on archiving experts, the described workflow can be used by researchers without prior knowledge of long-term storage to transfer data from their local file systems into a persistent repository.
This paper presents Release 2.0 of the SALSA corpus, a German resource for lexical semantics. The new corpus release provides new annotations for German nouns, complementing the existing annotations of German verbs in Release 1.0. The corpus now includes around 24,000 sentences with more than 36,000 annotated instances. It was designed with an eye towards NLP applications such as semantic role labeling but will also be a useful resource for linguistic studies in lexical semantics.
The paper’s purpose is to give an overview of the work on the Component Metadata Infrastructure (CMDI) that was implemented in the CLARIN research infrastructure. It explains the underlying schema and the accompanying tools and services. It also describes the status and impact of the CMDI developments within the CLARIN project, as well as past and future collaborations with other projects.
In this paper, we examine methods to automatically extract domain-specific knowledge about the food domain from unlabeled natural language text. We employ different extraction methods, ranging from surface patterns to co-occurrence measures applied to different parts of a document. We show that the effectiveness of a particular method depends very much on the relation type considered and that there is no single method that works equally well for every relation type. We also examine a combination of extraction methods and consider relationships between different relation types. The extraction methods are applied both to a domain-specific corpus and to the domain-independent factual knowledge base Wikipedia. Moreover, we examine an open-domain lexical ontology for suitability.
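As an illustration of what a document-level co-occurrence measure can look like, the following is a minimal sketch that scores pairs of food items by pointwise mutual information (PMI). The abstract does not say which co-occurrence measures the authors used, so PMI, the function name `pmi_scores`, and the toy documents are all assumptions made for illustration only.

```python
import math
from collections import Counter
from itertools import combinations
from typing import Dict, List, Set, Tuple


def pmi_scores(documents: List[List[str]],
               vocabulary: Set[str]) -> Dict[Tuple[str, str], float]:
    """Score item pairs by document-level PMI (hypothetical helper).

    documents  -- tokenized documents, e.g. forum posts
    vocabulary -- the food items of interest
    """
    n_docs = len(documents)
    item_counts: Counter = Counter()
    pair_counts: Counter = Counter()
    for tokens in documents:
        # Count each item at most once per document; sort so that
        # every pair has a canonical (alphabetical) order.
        present = sorted(set(tokens) & vocabulary)
        item_counts.update(present)
        pair_counts.update(combinations(present, 2))

    scores = {}
    for (a, b), n_ab in pair_counts.items():
        p_ab = n_ab / n_docs
        p_a = item_counts[a] / n_docs
        p_b = item_counts[b] / n_docs
        # PMI = log2( P(a, b) / (P(a) * P(b)) )
        scores[(a, b)] = math.log2(p_ab / (p_a * p_b))
    return scores


# Toy data, invented for this example.
docs = [
    ["bread", "butter"],
    ["bread", "butter", "jam"],
    ["jam", "tea"],
    ["tea"],
]
scores = pmi_scores(docs, {"bread", "butter", "jam", "tea"})
```

Pairs that co-occur more often than chance would predict receive a positive score, which is one simple way to rank candidates for a "consumed together" relation. Pairs that never co-occur are simply absent from the result.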
Electronic dictionaries should support dictionary users by giving them guidance in text production and text reception, alongside a user-definable offer of lexicographic data for cognitive purposes. In this article, we sketch the principles of an interactive and dynamic electronic dictionary aimed at text production and text reception guiding users in innovative ways, especially with respect to difficult, complicated or confusing issues. The lexicographer has to do a very careful analysis of the nature of the possible problems to suggest an optimal solution for a specific problem. We are of the opinion that there are numerous complex situations where users need more detailed support than currently available in e-dictionaries, enabling them to make valid and correct choices. For highly complex situations, we suggest guidance through a decision tree-like device. We assume that the solutions proposed here are not specific to one language only but can, after careful analysis, be applied to e-dictionaries in different languages across the world.
Before the rise of digitalization, we found ourselves in the “Gutenberg Galaxy”. The development of book printing by Johannes Gutenberg, the developments based on it and the subsequent industrialization of printing were decisive for the spread of this cultural revolution. It has since been transformed, upgraded and replaced by what has been called the “Turing Galaxy”. One of the most important changes is the automatic processing of data: the program-controlled production or manipulation of texts, images, sounds, formulas, tables and videos. The internet has opened up new distribution channels. The paper shows which developmental trends in the cultural skills of writing and reading have emerged so far as a result of digitalization. Three aspects of this development are discussed: how writing has changed up to the present through automation, multimodality and networking.
This paper deals with the constructional variation of emotion predicates in Estonian. It gives an overview of the constructional types, including information on their quantitative distribution. It is shown that one characteristic of Estonian is the formation of pairs of converses, i.e. pairs of emotion verbs that have the same emotion semantics but different argument realisation patterns. These converses are based on derivational morphology such as the causative morpheme –ta ‘CAUS’. Causative derivation has been adduced in the theoretical literature as support for the assumption that the cross-linguistically widespread constructional variation in emotion predicates has its origin in a difference in the causal structure of the verbal semantics. This paper shows that the Estonian data contradict this assumption.
In multimodal scholarly presentations supported by presentation software, spoken and written language, the various visualizations on the projected slides and the presenters’ gestures and facial expressions form a meaningful whole. On the one hand, communication scientists and linguists have long neglected the presentation as a complex form of communication. On the other hand, since Tufte (2003), columnists of major German newspapers have been dealing with the question of the value, the quality and the place of PowerPoint in science; they have even tried to answer the question of whether PowerPoint is evil or not.
There is perceived to be a fundamental lack of systematic empirical research on presentation practice. Grabowski also called attention to this desideratum in two critical articles (Grabowski 2003, 2008). Various still unanswered questions motivated a number of experiments (in the summer of 2010) that analyzed the knowledge and learning effects and the communicative impact of scientific presentations. The general aim of these experiments was to conduct empirical research on selected presentations in order to find out what kind of presentation is successful. The main interest is to find out which model of scholarly presentation produces the best results with regard to learning effect and communicative impact.
This article deals with three interrelated phenomena in the information structure of German sentences: the focusing of negative markers, of finite verb forms and of the particles ja, doch, wohl and schon. Focusing of the finite verb is the most important marker of verum focus, as described by Höhle (1988). Focusing of particles can be an alternative means for similar purposes, while focusing of negation seems to be the contradictory opposite of verum focus. It is shown that negation, independently of its information-structural status, can be interpreted on three distinct levels of sentence meaning: as an indicator of the non-facticity of a state of affairs, the non-truth of a proposition, or the non-desirability of a speech act. Focusing of the negative marker puts contrastive emphasis on the negative value assigned to sentence meaning on one of these levels. Verum focus can be interpreted on the same three levels: as a marker of contrastive emphasis on a positive value of facticity, truth or desirability. The particles ja, doch, wohl and schon refer to sufficient epistemic or interactional conditions for the assignment of a positive or negative value. By focusing such a particle, the speaker indicates that (s)he believes the assigned value to be well justified and insists on establishing it as common ground for further interaction.
The article is an empirical and theoretical contribution to the further development of a multimodal, interaction-analytic methodology. On the basis of a minimal contrast, it analyzes in detail how two female and two male confirmands each carry out their simultaneous “candle walk” (“Kerzengang”) in the preliminary phase of a church service. While the female confirmands coordinate their walk into the chancel, the lighting of their candles and the way back to the pew as “walking together”, the two male confirmands carry out their walk as “walking behind someone”. The analysis is framed theoretically by the concept of “walking as a situated practice”, which is subsequently refined further.
In this paper, we compare three different generalization methods for in-domain and cross-domain opinion holder extraction: simple unsupervised word clustering, an induction method inspired by distant supervision, and the use of lexical resources. The generalization methods are incorporated into diverse classifiers. We show that generalization yields significant improvements and that the size of the improvement depends on the type of classifier and on how much training and test data differ from each other. We also address the less common case of opinion holders realized in patient position. Since standard datasets contain too few instances of this type, we suggest approaches for detecting those opinion holders without labeled training data, including a novel, linguistically informed extraction method.
This paper describes the ongoing work to integrate WebLicht into the CLARIN infrastructure. It introduces the CLARIN infrastructure for scholars in the humanities and social sciences as well as WebLicht - an orchestration and execution environment that is built upon Service Oriented Architecture principles. The integration of WebLicht into the CLARIN infrastructure involves adapting it to the standards and practices used within CLARIN, including distributed repositories, CMDI metadata, and persistent identifiers.
Knowledge Acquisition with Natural Language Processing in the Food Domain: Potential and Challenges
(2012)
In this paper, we present an outlook on the effectiveness of natural language processing (NLP) in extracting knowledge for the food domain. We identify potential scenarios that we think are particularly suitable for NLP techniques. As a source for extracting knowledge we will highlight the benefits of textual content from social media. Typical methods that we think would be suitable will be discussed. We will also address potential problems and limits that the application of NLP methods may yield.
This article discusses the situation of the Latgalian language in Latvia today. It first provides an overview of languages in Latvia, followed by a historical and contemporary sketch of the societal position of Latgalian and by an account of current Latgalian language activism. On this basis, the article then applies schemes of language functions and of evaluations of the societal position of minority languages to Latgalian. Given the range of functions that Latgalian fulfils today and the wishes and attempts by activists to expand these functions, the article argues that it is surprising that so little attention is given to Latgalian in mainstream Latvian and international sociolinguistic publications. In this light, the fate of the language is difficult to predict, but a lot depends on whether the Latvian state will clarify its own unclear perception of policies towards Latgalian and on how much attention the language will receive in the future.
This paper discusses information-structural aspects of multiple fronting (mehrfache Vorfeldbesetzung) in German. On the basis of a collection of examples extracted largely from the IDS corpora, it describes the discourse givenness and the focus and topic status of (primarily) the prefield material and relates them to corresponding claims in the literature. In addition to information-structural factors, the final section addresses further possible factors that might favor multiple fronting. Moreover, for a limited segment of German, figures are presented for the first time that illustrate the ratio of multiple fronting to the similar, but reputedly more “canonical”, filling of the prefield with a (possibly partial) verb phrase.
In this paper, we describe MLSA, a publicly available multi-layered reference corpus for German-language sentiment analysis. The construction of the corpus is based on the manual annotation of 270 German-language sentences at three different layers of granularity. The sentence-layer annotation, as the most coarse-grained annotation, focuses on aspects of objectivity, subjectivity and the overall polarity of the respective sentences. Layer 2 is concerned with polarity on the word and phrase level, annotating both subjective and factual language. The annotations on Layer 3 focus on the expression level, denoting frames of private states such as objective and direct speech events. These three layers and their respective annotations are intended to be fully independent of each other. At the same time, it should also be possible to explore and discover interactions that may exist between different layers. The reliability of the respective annotations was assessed using the average pairwise agreement and Fleiss’ multi-rater measures. We believe that MLSA is a beneficial resource for sentiment analysis research, algorithms and applications that focus on the German language.
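For readers unfamiliar with the two reliability measures named in this abstract, the following is a minimal sketch of how average pairwise agreement and Fleiss’ kappa are commonly computed. It is not the authors’ code. The function names and the toy labels are assumptions for illustration, and the abstract does not specify which variant of the Fleiss measure was used.

```python
from itertools import combinations
from typing import Sequence


def average_pairwise_agreement(labels: Sequence[Sequence[str]]) -> float:
    """labels[i][r] is the label that rater r gave to item i."""
    n_raters = len(labels[0])
    pair_scores = []
    for a, b in combinations(range(n_raters), 2):
        # Fraction of items on which raters a and b agree.
        agree = sum(1 for row in labels if row[a] == row[b])
        pair_scores.append(agree / len(labels))
    return sum(pair_scores) / len(pair_scores)


def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """counts[i][j] is the number of raters who put item i in category j.

    Every item must be rated by the same number of raters (at least 2).
    """
    n_items = len(counts)
    n_raters = sum(counts[0])
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("every item needs the same number of ratings")
    n_categories = len(counts[0])

    # Observed agreement, averaged over items.
    p_items = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(p_items) / n_items

    # Chance agreement from the overall category proportions.
    p_cat = [
        sum(row[j] for row in counts) / (n_items * n_raters)
        for j in range(n_categories)
    ]
    p_e = sum(p * p for p in p_cat)
    if p_e == 1.0:
        return 1.0  # all ratings fall into one category; treated as full agreement
    return (p_bar - p_e) / (1 - p_e)


# Toy data, invented for this example: 2 items, 3 raters.
toy_labels = [["pos", "pos", "neg"], ["neg", "neg", "neg"]]
pairwise = average_pairwise_agreement(toy_labels)
kappa_perfect = fleiss_kappa([[3, 0], [0, 3]])
kappa_mixed = fleiss_kappa([[2, 1], [1, 2]])
```

Pairwise agreement is easy to interpret but does not correct for chance. Fleiss’ kappa subtracts the agreement expected from the overall label distribution, which is why the two are often reported together.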