Refine
Year of publication
- 2016 (67) (remove)
Document Type
- Conference Proceeding (37)
- Article (13)
- Part of a Book (12)
- Doctoral Thesis (3)
- Book (2)
Language
- English (67) (remove)
Has Fulltext
- yes (67)
Keywords
- Korpus <Linguistik> (20)
- Deutsch (15)
- Gesprochene Sprache (7)
- Computerlinguistik (5)
- Computerunterstützte Lexikographie (5)
- Automatische Sprachanalyse (4)
- Formale Semantik (4)
- Französisch (4)
- German (4)
- Textlinguistik (4)
Publicationstate
- Veröffentlichungsversion (67) (remove)
Reviewstate
Publisher
- European Language Resources Association (ELRA) (13)
- Association for Computational Linguistics (5)
- Ruhr-Universität Bochum (5)
- Universität Potsdam (4)
- CLARIN (2)
- European Language Resources Association (2)
- Frontiers Media (2)
- International Speech Communication Association (2)
- Ivane Javakhishvili Tbilisi State University (2)
- Academic Publishing Division of the Faculty of Arts of the University of Ljubljana (1)
In conversation, interlocutors rarely leave long gaps between turns, suggesting that next speakers begin to plan their turns while listening to the previous speaker. The present experiment used analyses of speech onset latencies and eye-movements in a task-oriented dialogue paradigm to investigate when speakers start planning their responses. German speakers heard a confederate describe sets of objects in utterances that either ended in a noun [e.g., Ich habe eine Tür und ein Fahrrad (“I have a door and a bicycle”)] or a verb form [e.g., Ich habe eine Tür und ein Fahrrad besorgt (“I have gotten a door and a bicycle”)], while the presence or absence of the final verb either was or was not predictable from the preceding sentence structure. In response, participants had to name any unnamed objects they could see in their own displays with utterances such as Ich habe ein Ei (“I have an egg”). The results show that speakers begin to plan their turns as soon as sufficient information is available to do so, irrespective of further incoming words.
Having found their way onto the computer screens, comics soon branched into webcomics. These kept a lot of the characteristics of print comic books, but gradually adapted new unexplored modes of representation. Three relatively new ‘enhancements’ to the medium of comics are presented in this article: webcomics enhanced through the use of the infinite canvas, as proposed by Scott McCloud, those enhanced with videos and/or sound, and lastly those enhanced with interactive and ludic elements. All of the mentioned push the medium of comics into new waters, and by doing so they add new layers of meaning and modify their structure based on the make-up of the implemented features. Infinite canvas manages to lift some limitations of print comics without changing the overall feel too drastically, while animated and voiced webcomics, as well as interactive or game comics, have a much higher inclination to transgress into domains of other media and transform themselves in order to accommodate and integrate these novel foreign features.
In this paper, we present first results of training a classifier for discriminating Russian texts into different levels of difficulty. For the classification we considered both surface-oriented features adopted from readability assessments and more linguistically informed, positional features to classify texts into two levels of difficulty. This text classification is the main focus of our Levelled Study Corpus of Russian (LeStCoR), in which we aim to build a corpus adapted for language learning purposes – selecting simpler texts for beginner second language learners and more complex texts for advanced learners. The most discriminative feature in our pilot study was a lexical feature that approximates accessibility of the vocabulary by the second language learner in terms of the proportion of familiar words in the texts. The best feature setting achieved an accuracy of 0.91 on a pilot corpus of 209 texts.
The present paper describes Corpus Query Lingua Franca (ISO CQLF), a specification designed at ISO Technical Committee 37 Subcommittee 4 “Language resource management” for the purpose of facilitating the comparison of properties of corpus query languages. We overview the motivation for this endeavour and present its aims and its general architecture. CQLF is intended as a multi-part specification; here, we concentrate on the basic metamodel that provides a frame that the other parts fit in.
The paper presents best practices and results from projects in four countries dedicated to the creation of corpora of computer-mediated communication and social media interactions (CMC). Even though there are still many open issues related to building and annotating corpora of that type, there already exists a range of accessible solutions which have been tested in projects and which may serve as a starting point for a more precise discussion of how future standards for CMC corpora may (and should) be shaped like.
Converting and Representing Social Media Corpora into TEI: Schema and best practices from CLARIN-D
(2016)
The paper presents results from a curation project within CLARIN-D, in which an existing lMWord corpus of German chat communication has been integrated into the DEREKO and DWDS corpus infrastructures of the CLARIN-D centres at the Institute for the German Language (IDS, Mannheim) and at the Berlin-Brandenburg Academy of Sciences (BBAW, Berlin). The focus is on the solutions developed for converting and representing the corpus in a TEI format.
The paper reports the results of the curation project ChatCorpus2CLARIN. The goal of the project was to develop a workflow and resources for the integration of an existing chat corpus into the CLARIN-D research infrastructure for language resources and tools in the Humanities and the Social Sciences (http://clarin-d.de). The paper presents an overview of the resources and practices developed in the project, describes the added value of the resource after its integration and discusses, as an outlook, to what extent these practices can be considered best practices which may be useful for the annotation and representation of other CMC and social media corpora.
TripleA is a workshop series founded by linguists from the University of Tübingen and the University of Potsdam. Its aim is to provide a forum for semanticists doing fieldwork on understudied languages, and its focus is on languages from Africa, Asia, Australia and Oceania. The second TripleA workshop was held at the University of Potsdam, June 3-5, 2015.
This paper introduces the recently started DRuKoLA-project that aims at providing mechanisms to flexibly draw virtual comparable corpora from the German Reference Corpus DeReKo and the Reference Corpus of Contemporary Romanian Language CoRoLa in order to use these virtual corpora as empirical basis for contrastive linguistic research.
Brown clustering has been used to help increase parsing performance for morphologically rich languages. However, much of the work has focused on using clustering techniques to replace terminal nodes or as a feature for parsing. Instead, we choose to examine how effectively Brown clustering is for unlexicalized parsing by creating data-driven POS tagsets which are then used with the Berkeley parser. We investigate cluster sizes as well as on what information (e.g. words vs. lemmas) clustering will yield the best parser performance. Our results approach the current state of the art results for the German T¨uBa-D/Z treebank when using parser internal tagging.