Refine
Year of publication
- 2016 (169) (remove)
Document Type
- Article (59)
- Conference Proceeding (44)
- Part of a Book (43)
- Book (15)
- Working Paper (5)
- Doctoral Thesis (3)
Keywords
- Deutsch (64)
- Korpus <Linguistik> (31)
- Gesprochene Sprache (20)
- Konversationsanalyse (11)
- Wörterbuch (11)
- Computerunterstützte Lexikographie (10)
- Französisch (8)
- German (7)
- Computerlinguistik (6)
- Linguistik (6)
Publicationstate
- Veröffentlichungsversion (169) (remove)
Reviewstate
Publisher
The Component MetaData Infrastructure (CMDI) is a framework for the creation and usage of metadata formats to describe all kinds of resources in the CLARIN world. To better connect to the library world, and to allow librarians to enter metadata for linguistic resources into their catalogues, a crosswalk from CMDI-based formats to bibliographic standards is required. The general and rather fluid nature of CMDI, however, makes it hard to map arbitrary CMDI schemas to metadata standards such as Dublin Core (DC) or MARC 21, which have a mature, well-defined and fixed set of field descriptors. In this paper, we address the issue and propose crosswalks between CMDI-based profiles originating from the NaLiDa project and DC and MARC 21, respectively.
Co-development of action, conceptualization and social interaction mutually scaffold and support each other within a virtuous feedback cycle in the development of human language in children. Within this framework, the purpose of this article is to bring together diverse but complementary accounts of research methods that jointly contribute to our understanding of cognitive development and in particular, language acquisition in robots. Thus, we include research pertaining to developmental robotics, cognitive science, psychology, linguistics and neuroscience, as well as practical computer science and engineering. The different studies are not at this stage all connected into a cohesive whole; rather, they are presented to illuminate the need for multiple different approaches that complement each other in the pursuit of understanding cognitive development in robots. Extensive experiments involving the humanoid robot iCub are reported, while human learning relevant to developmental robotics has also contributed useful results.
Disparate approaches are brought together via common underlying design principles. Without claiming to model human language acquisition directly, we are nonetheless inspired by analogous development in humans and consequently, our investigations include the parallel co-development of action, conceptualization and social interaction. Though these different approaches need to ultimately be integrated into a coherent, unified body of knowledge, progress is currently also being made by pursuing individual methods.
Wiktionary is increasingly gaining influence in a wide variety of linguistic fields such as NLP and lexicography, and has great potential to become a serious competitor for publisher-based and academic dictionaries. However, little is known about the "crowd" that is responsible for the content of Wiktionary. In this article, we want to shed some light on selected questions concerning large-scale cooperative work in online dictionaries. To this end, we use quantitative analyses of the complete edit history files of the English and German Wiktionary language editions. Concerning the distribution of revisions over users, we show that — compared to the overall user base — only very few authors are responsible for the vast majority of revisions in the two Wiktionary editions. In the next step, we compare this distribution to the distribution of revisions over all the articles. The articles are subsequently analysed in terms of rigour and diversity, typical revision patterns through time, and novelty (the time since the last revision). We close with an examination of the relationship between corpus frequencies of headwords in articles, the number of article visits, and the number of revisions made to articles.
The author presents a study using eye-tracking-while-reading data from participants reading German jurisdictional texts. I am particularly interested in nominalisations. It can be shown that nominalisations are read significantly longer than other nouns and that this effect is quite strong. Furthermore, the results suggest that nouns are read faster in reformulated texts. In the reformulations, nominalisations were transformed into verbal structures. Reformulations did not lead to increased processing times of verbal constructions but reformulated texts were read faster overall. Where appropriate, results are compared to a previous study of Hansen et al. (2006) using the same texts but other methodology and statistical analysis.
We examine different features and classifiers for the categorization of opinion words into actor and speaker view. To our knowledge, this is the first comprehensive work to address sentiment views on the word level taking into consideration opinion verbs, nouns and adjectives. We consider many high-level features requiring only few labeled training data. A detailed feature analysis produces linguistic insights into the nature of sentiment views. We also examine how far global constraints between different opinion words help to increase classification performance. Finally, we show that our (prior) word-level annotation correlates with contextual sentiment views.
We present an approach to the new task of opinion holder and target extraction on opinion compounds. Opinion compounds (e.g. user rating or victim support) are noun compounds whose head is an opinion noun. We do not only examine features known to be effective for noun compound analysis, such as paraphrases and semantic classes of heads and modifiers, but also propose novel features tailored to this new task. Among them, we examine paraphrases that jointly consider holders and targets, a verb detour in which noun heads are replaced by related verbs, a global head constraint allowing inferencing between different compounds, and the categorization of the sentiment view that the head conveys.
In this paper, we present a GOLD standard of part-of-speech tagged transcripts of spoken German. The GOLD standard data consists of four annotation layers – transcription (modified orthography), normalization (standard orthography), lemmatization and POS tags – all of which have undergone careful manual quality control. It comes with guidelines for the manual POS annotation of transcripts of German spoken data and an extended version of the STTS (Stuttgart Tübingen Tagset) which accounts for phenomena typically found in spontaneous spoken German. The GOLD standard was developed on the basis of the Research and Teaching Corpus of Spoken German, FOLK, and is, to our knowledge, the first such dataset based on a wide variety of spontaneous and authentic interaction types. It can be used as a basis for further development of language technology and corpus linguistic applications for German spoken language.