In a previous article (Faaß et al., 2012), a first attempt was made at documenting and encoding morphemic units of two South African Bantu languages, i.e. Northern Sotho and Zulu, with the aim of describing and storing the morphemic units of these two languages in a single relational database, structured as a hierarchical ontology. As a follow-up, the current article describes the implementation of our part-of-speech ontology. We give a detailed description of the morphemes and categories contained in the database, highlighting the need and reasons for a flexible ontology which will provide for both language specific and general linguistic information. By giving a detailed account of the methodology for the population of the database, we provide linguists from other Bantu languages with a road map for extending the database to also include their languages of specialization.
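As a rough illustration of how morphemic units of several languages might be stored in one relational database with a hierarchical category structure, the following sketch uses SQLite. The table names, column names, category labels, and example rows are hypothetical and are not taken from the database described in the abstract.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical two-table schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- Each category points to its parent, which yields a hierarchy.
        CREATE TABLE category (
            id        INTEGER PRIMARY KEY,
            name      TEXT NOT NULL,
            parent_id INTEGER REFERENCES category(id)
        );
        -- Each morpheme belongs to one language and one category.
        CREATE TABLE morpheme (
            id          INTEGER PRIMARY KEY,
            form        TEXT NOT NULL,
            language    TEXT NOT NULL,
            category_id INTEGER NOT NULL REFERENCES category(id)
        );
        """
    )
    # Illustrative category hierarchy (assumed, not from the source).
    conn.executemany(
        "INSERT INTO category VALUES (?, ?, ?)",
        [
            (1, "morpheme", None),
            (2, "concord", 1),
            (3, "subject concord", 2),
        ],
    )
    # Illustrative rows only; a real database would hold curated entries.
    conn.executemany(
        "INSERT INTO morpheme (form, language, category_id) VALUES (?, ?, ?)",
        [
            ("ke", "Northern Sotho", 3),
            ("ngi", "Zulu", 3),
        ],
    )
    return conn


def morphemes_under(conn, category_name):
    """Return (form, language) pairs filed under a category or any subcategory."""
    return conn.execute(
        """
        WITH RECURSIVE subtree(id) AS (
            SELECT id FROM category WHERE name = ?
            UNION ALL
            SELECT c.id FROM category c JOIN subtree s ON c.parent_id = s.id
        )
        SELECT m.form, m.language
        FROM morpheme m JOIN subtree s ON m.category_id = s.id
        ORDER BY m.language, m.form
        """,
        (category_name,),
    ).fetchall()


if __name__ == "__main__":
    db = build_demo_db()
    print(morphemes_under(db, "concord"))
```

Under this assumed design, adding a further language only requires new rows in `morpheme`, while language-specific categories can be attached as new children in `category` without changing the schema.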
Preface
(2015)
Russia, its languages and its ethnic groups are for many readers of English surprisingly unknown territory. Even among academics and researchers familiar with many ethnolinguistic situations around the globe, there prevails rather unsystematic and fragmented knowledge about Russia. This relates to both the micro level such as the individual situations of specific ethnic or linguistic groups, and to the macro level with regard to the entire interplay of linguistic practices, ideologies, laws, and other policies in Russia. In total, this lack of information about Russia stands in sharp contrast to the abundance of literature on ethnolinguistic situations, minority languages, language revitalization, and ideologies toward languages and multilingualism which has been published throughout the past decades.
To optimize the sharing and reuse of existing data, many funding organizations now require researchers to specify a management plan for research data. In such a plan, researchers are supposed to describe the entire life cycle of the research data they are going to produce, from data creation to formatting, interpretation, documentation, short-term storage, long-term archiving and data re-use. To support researchers with this task, we built DMPTY, a wizard that guides researchers through the essential aspects of managing data, elicits information from them, and finally, generates a document that can be further edited and linked to the original research proposal.
Hierarchical predictive coding has been identified as a possible unifying principle of brain function, and recent work in cognitive neuroscience has examined how it may be affected by age-related changes. Using language comprehension as a test case, the present study aimed to dissociate age-related changes in prediction generation versus internal model adaptation following a prediction error. Event-related brain potentials (ERPs) were measured in a group of older adults (60–81 years; n = 40) as they read sentences of the form “The opposite of black is white/yellow/nice.” Replicating previous work in young adults, results showed a target-related P300 for the expected antonym (“white”; an effect assumed to reflect a prediction match), and a graded N400 effect for the two incongruous conditions (i.e. a larger N400 amplitude for the incongruous continuation not related to the expected antonym, “nice,” versus the incongruous associated condition, “yellow”). These effects were followed by a late positivity, again with a larger amplitude in the incongruous non-associated versus incongruous associated condition. Analyses using linear mixed-effects models showed that the target-related P300 effect and the N400 effect for the incongruous non-associated condition were both modulated by age, thus suggesting that age-related changes affect both prediction generation and model adaptation. However, effects of age were outweighed by the interindividual variability of ERP responses, as reflected in the high proportion of variance captured by the inclusion of by-condition random slopes for participants and items. We thus argue that, at both a neurophysiological and a functional level, the notion of general differences between language processing in young and older adults may only be of limited use, and that future research should seek to better understand the causes of interindividual variability in the ERP responses of older adults and its relation to cognitive performance.
Preface
(2015)
In this contribution, we report on an effort to annotate German data with information relevant to opinion inference. Such information has previously been referred to as effect or couched in terms of event evaluation functors. We extend the theory and present an extensive scheme that combines both approaches and thus extends the set of inference-relevant predicates. Using these guidelines to annotate 726 German synsets, we achieve good inter-annotator agreement.
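The abstract does not say which agreement measure was used. As a minimal sketch, assuming two annotators and categorical labels, Cohen's kappa is one common way to quantify inter-annotator agreement; the function name and the example labels below are hypothetical.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: share of items with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: derived from each annotator's label distribution.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if expected == 1.0:
        # Both annotators used a single identical label throughout.
        return 1.0
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    # Hypothetical labels for four items, not data from the study.
    annotator_1 = ["pos", "pos", "neg", "neg"]
    annotator_2 = ["pos", "neg", "neg", "neg"]
    print(cohens_kappa(annotator_1, annotator_2))
```

Kappa corrects raw agreement for the agreement expected by chance, which matters when one label dominates the data.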
Feedback utterances are among the most frequent in dialogue. Feedback is also a crucial aspect of all linguistic theories that take social interaction involving language into account. However, determining communicative functions is a notoriously difficult task both for human interpreters and systems. It involves an interpretative process that integrates various sources of information. Existing work on communicative function classification either comes from dialogue act tagging, where it is generally coarse-grained concerning feedback phenomena, or is token-based and does not address the variety of forms that feedback utterances can take. This paper introduces an annotation framework, the dataset and the related annotation campaign (involving 7 raters to annotate nearly 6000 utterances). We present its evaluation not merely in terms of inter-rater agreement but also in terms of usability of the resulting reference dataset both from a linguistic research perspective and from a more applicative viewpoint.
We investigate whether non-configurational languages, which display more word order variation than configurational ones, require more training data for a phenomenon to be parsed successfully. We perform a tightly controlled study comparing the dative alternation for English (a configurational language), German, and Russian (both non-configurational). More specifically, we compare the performance of a dependency parser when only canonical word order is present with its performance on data sets when all word orders are present. Our results show that, for all languages, canonical data is easier to parse, and that there is no direct correspondence between the size of training sets containing free(er) word order variation and performance.