Refine
Year of publication
- 2015 (141) (remove)
Document Type
- Article (49)
- Part of a Book (49)
- Conference Proceeding (29)
- Book (9)
- Other (2)
- Working Paper (2)
- Master's Thesis (1)
Keywords
- Deutsch (56)
- Korpus <Linguistik> (25)
- Annotation (12)
- Verb (12)
- Computerunterstützte Lexikographie (9)
- Corpus linguistics (7)
- Englisch (7)
- Wörterbuch (7)
- Corpus annotation (6)
- Corpus technology (6)
Publicationstate
- Veröffentlichungsversion (141) (remove)
Reviewstate
Publisher
- Institut für Deutsche Sprache (35)
- de Gruyter (16)
- De Gruyter (4)
- Lang (4)
- Springer (3)
- Association for Computational Linguistics (2)
- German Society for Computational Linguistics & Language Technology (GSCL) (2)
- Gesellschaft für Sprachtechnologie and Computerlinguistik (2)
- International Phonetic Association (2)
- International Speech Communication Association (2)
In dem Beitrag werden Argumentstrukturmuster mit inneren Objekten genauer untersucht. Als innere Objekte werden Akkusativobjekte bezeichnet, die gelegentlich von normalerweise intransitiven Verben zu sich genommen werden und deren Objekts-Nomen mit dem Verb etymologisch, morphologisch und/oder semantisch verwandt ist. Das heißt, es handelt sich um Sätze wie Maria lachte ihr fröhliches Lachen, Alles geht seinen geordneten Gang oder Er kämpft einen aussichtslosen Kampf. Wie man an diesen wenigen Beispielsätzen bereits sehen kann, wird mit dem inneren Objekt etwas explizit zum Ausdruck gebracht, was bereits in der Verbbedeutung implizit enthalten bzw. angelegt ist, denn lachen bedeutet ja ‘Freude zum Ausdruck bringen, indem man ein Lachen von sich gibt’ und kämpfen heißt ‘einen Kampf führen, Kampfhandlungen vollziehen, sich mit jmdm. oder etw. auseinandersetzen’.
Opinion Holder and Target Extraction for Verb-based Opinion Predicates – The Problem is Not Solved
(2015)
We offer a critical review of the current state of opinion role extraction involving opinion verbs. We argue that neither the currently available lexical resources nor the manually annotated text corpora are sufficient to appropriately study this task. We introduce a new corpus focusing on opinion roles of opinion verbs from the Subjectivity Lexicon and show potential benefits of this corpus. We also demonstrate that state-of-the-art classifiers perform rather poorly on this new dataset compared to the standard dataset for the task showing that there still remains significant research to be done.
We present an approach for opinion role induction for verbal predicates. Our model rests on the assumption that opinion verbs can be divided into three different types where each type is associated with a characteristic mapping between semantic roles and opinion holders and targets. In several experiments, we demonstrate the relevance of those three categories for the task. We show that verbs can easily be categorized with semi-supervised graphbased clustering and some appropriate similarity metric. The seeds are obtained through linguistic diagnostics. We evaluate our approach against a new manually-compiled opinion role lexicon and perform in-context classification.
Formal learning in higher education creates its own challenges for didactics, teaching, technology, and organization. The growing need for well-educated employees requires new ideas and tools in education. Within the ROLE project, three personal learning environments based on ROLE technology were used to accompany “traditional” teaching and learning activities at universities. The test beds at the RWTH Aachen University in Germany, the School of Continuing Education of Shanghai Jiao Tong University in China, and the Uppsala University in Sweden differ in learning culture, the number of students and their individual background, synchronous versus distant learning, etc. The big range of test beds underlines the flexibility of ROLE technology. For each test bed, the learning scenario is presented and analyzed as well as the particular ROLE learning environment. The evaluation methods are described and the research results discussed in detail. The learned lessons provide an easy way to benefit from the ROLE research work which demonstrates the potential for new ideas based on flexible e-learning concepts and tools in “traditional” education.
This article reports on the on-going CoRoLa project, aiming at creating a reference corpus of contemporary Romanian (from 1945 onwards), opened for online free exploitation by researchers in linguistics and language processing, teachers of Romanian, students. We invest serious efforts in persuading large publishing houses and other owners of IPR on relevant language data to join us and contribute the project with selections of their text and speech repositories. The CoRoLa project is coordinated by two Computer Science institutes of the Romanian Academy, but enjoys cooperation of and consulting from professional linguists from other institutes of the Romanian Academy. We foresee a written component of the corpus of more than 500 million word forms, and a speech component of about 300 hours of recordings. The entire collection of texts (covering all functional styles of the language) will be pre-processed and annotated at several levels, and also documented with standardized metadata. The pre-processing includes cleaning the data and harmonising the diacritics, sentence splitting and tokenization. Annotation will include morpho-lexical tagging and lemmatization in the first stage, followed by syntactic, semantic and discourse annotation in a later stage.
To optimize the sharing and reuse of existing data, many funding organizations now require researchers to specify a management plan for research data. In such a plan, researchers are supposed to describe the entire life cycle of the research data they are going to produce, from data creation to formatting, interpretation, documentation, short-term storage, long-term archiving and data re-use. To support researchers with this task, we built DMPTY, a wizard that guides researchers through the essential aspects of managing data, elicits information from them, and finally, generates a document that can be further edited and linked to the original research proposal.
Wir können auch Hochdeutsch – Das Institut für Deutsche Sprache in Mannheim – ein Ort der Ideen
(2015)
In a project called "A Library of a Billion Words" we needed an implementation of the CTS protocol that is capable of handling a text collection containing at least 1 billion words. Because the existing solutions did not work for this scale or were still in development I started an implementation of the CTS protocol using methods that MySQL provides. Last year we published a paper that introduced a prototype with the core functionalities without being compliant with the specifications of CTS (Tiepmar et al., 2013). The purpose of this paper is to describe and evaluate the MySQL based implementation now that it is fulfilling the specifications version 5.0 rc.1 and mark it as finished and ready to use. Further information, online instances of CTS for all described datasets and binaries can be accessed via the projects website.