Refine
Year of publication
- 2016 (40) (remove)
Document Type
- Article (24)
- Conference Proceeding (10)
- Part of a Book (5)
- Book (1)
Has Fulltext
- yes (40)
Keywords
- Deutsch (15)
- Korpus <Linguistik> (8)
- Gesprochene Sprache (5)
- Französisch (4)
- Konversationsanalyse (4)
- Chatten <Kommunikation> (3)
- Component MetaData Infrastructure (CMDI) (3)
- Kontrastive Grammatik (3)
- Metadaten (3)
- Ukrainisch (3)
Publicationstate
- Veröffentlichungsversion (40) (remove)
Reviewstate
- Peer-Review (40) (remove)
Publisher
- Association for Computational Linguistics (3)
- European Language Resources Association (ELRA) (3)
- CLARIN (2)
- De Gruyter (2)
- Frontiers Media (2)
- Gesellschaft für Sprachtechnologie und Computerlinguistik (2)
- Verlag für Gesprächsforschung (2)
- Academic Publishing Division of the Faculty of Arts of the University of Ljubljana (1)
- Austrian Centre for Digital Humanities, Austrian Academy of Sciences (1)
- Buro van die Wat (1)
The Component MetaData Infrastructure (CMDI) is a framework for the creation and usage of metadata formats to describe all kinds of resources in the CLARIN world. To better connect to the library world, and to allow librarians to enter metadata for linguistic resources into their catalogues, a crosswalk from CMDI-based formats to bibliographic standards is required. The general and rather fluid nature of CMDI, however, makes it hard to map arbitrary CMDI schemas to metadata standards such as Dublin Core (DC) or MARC 21, which have a mature, well-defined and fixed set of field descriptors. In this paper, we address the issue and propose crosswalks between CMDI-based profiles originating from the NaLiDa project and DC and MARC 21, respectively.
Co-development of action, conceptualization and social interaction mutually scaffold and support each other within a virtuous feedback cycle in the development of human language in children. Within this framework, the purpose of this article is to bring together diverse but complementary accounts of research methods that jointly contribute to our understanding of cognitive development and in particular, language acquisition in robots. Thus, we include research pertaining to developmental robotics, cognitive science, psychology, linguistics and neuroscience, as well as practical computer science and engineering. The different studies are not at this stage all connected into a cohesive whole; rather, they are presented to illuminate the need for multiple different approaches that complement each other in the pursuit of understanding cognitive development in robots. Extensive experiments involving the humanoid robot iCub are reported, while human learning relevant to developmental robotics has also contributed useful results.
Disparate approaches are brought together via common underlying design principles. Without claiming to model human language acquisition directly, we are nonetheless inspired by analogous development in humans and consequently, our investigations include the parallel co-development of action, conceptualization and social interaction. Though these different approaches need to ultimately be integrated into a coherent, unified body of knowledge, progress is currently also being made by pursuing individual methods.
Wiktionary is increasingly gaining influence in a wide variety of linguistic fields such as NLP and lexicography, and has great potential to become a serious competitor for publisher-based and academic dictionaries. However, little is known about the "crowd" that is responsible for the content of Wiktionary. In this article, we want to shed some light on selected questions concerning large-scale cooperative work in online dictionaries. To this end, we use quantitative analyses of the complete edit history files of the English and German Wiktionary language editions. Concerning the distribution of revisions over users, we show that — compared to the overall user base — only very few authors are responsible for the vast majority of revisions in the two Wiktionary editions. In the next step, we compare this distribution to the distribution of revisions over all the articles. The articles are subsequently analysed in terms of rigour and diversity, typical revision patterns through time, and novelty (the time since the last revision). We close with an examination of the relationship between corpus frequencies of headwords in articles, the number of article visits, and the number of revisions made to articles.
We present an approach to the new task of opinion holder and target extraction on opinion compounds. Opinion compounds (e.g. user rating or victim support) are noun compounds whose head is an opinion noun. We do not only examine features known to be effective for noun compound analysis, such as paraphrases and semantic classes of heads and modifiers, but also propose novel features tailored to this new task. Among them, we examine paraphrases that jointly consider holders and targets, a verb detour in which noun heads are replaced by related verbs, a global head constraint allowing inferencing between different compounds, and the categorization of the sentiment view that the head conveys.
The Component MetaData Infrastructure (CMDI) is the dominant framework for describing language resources according to ISO 24622 (ISO/TC 37/SC 4, 2015). Within the CLARIN world, CMDI has become a huge success. The Virtual Language Observatory (VLO) now holds over 800.000 resources, all described with CMDI-based metadata. With the metadata being harvested from about thirty centres, there is a considerable amount of heterogeneity in the data. In part, there is some use of controlled vocabularies to keep data heterogeneity in check, say when describing the type of a resource, or the country the resource is originating from. However, when CMDI data refers to the names of persons or organisations, strings are used in a rather uncontrolled manner. Here, the CMDI community can learn from libraries and archives who maintain standardised lists for all kinds of names. In this paper, we advocate the use of freely available authority files that support the unique identification of persons, organisations, and more. The systematic use of authority records enhances the quality of the metadata, hence improves the faceted browsing experience in the VLO, and also prepares the sharing of CMDI-based metadata with the data in library catalogues.
The Component MetaData Infrastructure (CMDI) provides a lego-brick framework for the creation, use and re-use of self-defined metadata formats. The design of CMDI can be a force forgood, but history shows that it has often been misunderstood or badly executed. Consequently,it has led the community towards the dark ages of metadata clutter rather than the bright side of semantic interoperability. In this abstract, we report on the condition of CMDI but also outlinean agenda to make the CMDI world a better place to use, share and profit from metadata.
Der Begriff der „Gattung“ wird in der Soziologie und der Sprachwissenschaft als Sammelbegriff für verfestigte, (sprachlich) ähnliche Muster mit repetitiver Frequenz zur Lösung verwandter kommunikativer Probleme gefasst (z.B. unterschiedliche moralische Gattungen, vgl. Bergmann/Luckmann (Hg.) 1999). Wenig Aufmerksamkeit wurde bislang den Gemeinsamkeiten und Unterschieden – also den Abgrenzungsmöglichkeiten – von prototypischen zu weniger prototypischen Vertretern einzelner Gattungsfamilien zuteil. Im vorliegenden Beitrag beschreiben wir anhand von authentischen Daten die sogenannten „Gassigespräche“ als spontane Kommunikation des Alltags von Hundebesitzer/innen. Außerhalb der Sprachwissenschaft werden diese primär als Hyponym des Hyperonyms „Small Talk“ subsumiert. Wir versuchen zunächst unter gattungsanalytischen Gesichtspunkten die obligatorischen und fakultativen Einheiten um ein – sofern es denn überhaupt existiert – prototypisches Zentrum von Small-Talk zu gruppieren. Anhand eines paradigmatischen Falls beschreiben wir Gemeinsamkeiten und Unterschiede in Bezug auf andere Gattungen, die sich im Spektrum der Alltagsgespräche – oder auch darüber hinaus – ansiedeln. Wir plädieren in der Diskussion dafür, Gattungsfamilien als mehr oder weniger verfestigte Muster mit teils wiederkehrenden Merkmalen zu sehen, die ihre Eigenschaften in Form und Funktion teilen können.
This paper is about the workflow for construction and dissemination of FOLK (Forschungs - und Lehrkorpus Gesprochenes Deutsch – Research and Teaching Corpus of Spoken German), a large corpus of authentic spoken interaction data, recorded on audio and video. Section 2 describes in detail the tools used in the individual steps of transcription, anonymization, orthographic normalization, lemmatization and POS tagging of the data, as well as some utilities used for corpus management. Section 3 deals with the DGD (Datenbank für Gesprochenes Deutsch - Database of Spoken German) as a tool for distributing completed data sets and making them available for qualitative and quantitative analysis. In section 4, some plans for further development are sketched.
Sentiment analysis has so far focused on the detection of explicit opinions. However, of late implicit opinions have received broader attention, the key idea being that the evaluation of an event type by a speaker depends on how the participants in the event are valued and how the event itself affects the participants. We present an annotation scheme for adding relevant information, couched in terms of so-called effect functors, to German lexical items. Our scheme synthesizes and extends previous proposals. We report on an inter-annotator agreement study. We also present results of a crowdsourcing experiment to test the utility of some known and some new functors for opinion inference where, unlike in previous work, subjects are asked to reason from event evaluation to participant evaluation.
Der vorliegende Aufsatz untersucht die Syntax und Semantik sogenannter Postponierer, d.h. konjunktionaler Konnektoren, die den von ihnen eingeleiteten Nebensatz dem Hauptsatz stets nachstellen. Anhand von sodass und zumal werden die Kerneigenschaften solcher Konnektoren im Deutschen vorgestellt. Am Beispiel der italienischen Konjunktionen cosicché, tanto più che und perché wird diskutiert, ob der Begriff des Postponierers für den Sprachvergleich genutzt werden kann. In einem nächsten Schritt werden die Postponierer des Deutschen unter Beiziehung sprachgeschichtlicher Argumente präziser beschrieben und im Übergangsfeld zwischen Adverbkonnektoren und Subjunktoren verortet. Es zeigt sich, dass die untersuchten Konnektoren sich letztlich sehr unterschiedlich verhalten, sodass es fraglich erscheint, ob ihre Zusammenfassung zu einer gemeinsamen Klasse gerechtfertigt ist.