Refine
Year of publication
- 2015 (141) (remove)
Document Type
- Article (49)
- Part of a Book (49)
- Conference Proceeding (29)
- Book (9)
- Other (2)
- Working Paper (2)
- Master's Thesis (1)
Keywords
- Deutsch (56)
- Korpus <Linguistik> (25)
- Annotation (12)
- Verb (12)
- Computerunterstützte Lexikographie (9)
- Corpus linguistics (7)
- Englisch (7)
- Wörterbuch (7)
- Corpus annotation (6)
- Corpus technology (6)
Publicationstate
- Veröffentlichungsversion (141) (remove)
Reviewstate
Publisher
- Institut für Deutsche Sprache (35)
- de Gruyter (16)
- De Gruyter (4)
- Lang (4)
- Springer (3)
- Association for Computational Linguistics (2)
- German Society for Computational Linguistics & Language Technology (GSCL) (2)
- Gesellschaft für Sprachtechnologie and Computerlinguistik (2)
- International Phonetic Association (2)
- International Speech Communication Association (2)
We investigate whether non-configurational languages, which display more word order variation than configurational ones, require more training data for a phenomenon to be parsed successfully. We perform a tightly controlled study comparing the dative alternation for English (a configurational language), German, and Russian (both non-configurational). More specifically, we compare the performance of a dependency parser when only canonical word order is present with its performance on data sets when all word orders are present. Our results show that for all languages, canonical data not only is easier to parse, but there exists no direct correspondence between the size of training sets containing free(er) word order variation and performance.
Some 25 years ago, a large-scale repatriation of Russian Germans began. As a result, more than 2,5 million people that grew up in the USSR, Russia, or other post-Soviet states, became German citizens who had native or near-native command of the Russian language. The uncomfortable differences they exhibited in comparison to those who were supposed to accept them as equals, yet failed to do so, compelled them to search for self-designations that would accommodate their new identity and to bond together to form a new minority. The authors examine the attempts of Soviet/Russian Germans to redefine their ethnic identity in terms of not just blood but also language and culture, focusing on two particular cases: the use of the name Rusak in the internet forums of the repatriated immigrants; and the linguistic-cultural practices of the older generation of immigrants.
Feedback utterances are among the most frequent in dialogue. Feedback is also a crucial aspect of all linguistic theories that take social interaction involving language into account. However, determining communicative functions is a notoriously difficult task both for human interpreters and systems. It involves an interpretative process that integrates various sources of information. Existing work on communicative function classification comes from either dialogue act tagging where it is generally coarse grained concerning the feed- back phenomena or it is token-based and does not address the variety of forms that feed- back utterances can take. This paper introduces an annotation framework, the dataset and the related annotation campaign (involving 7 raters to annotate nearly 6000 utterances). We present its evaluation not merely in terms of inter-rater agreement but also in terms of usability of the resulting reference dataset both from a linguistic research perspective and from a more applicative viewpoint.
Feedback utterances are among the most frequent in dialogue. Feedback is also a crucial aspect of linguistic theories that take social interaction, involving language, into account. This paper introduces the corpora and datasets of a project scrutinizing this kind of feedback utterances in French. We present the genesis of the corpora (for a total of about 16 hours of transcribed and phone force-aligned speech) involved in the project. We introduce the resulting datasets and discuss how they are being used in on-going work with focus on the form-function relationship of conversational feedback. All the corpora created and the datasets produced in the framework of this project will be made available for research purposes.
Centering on German self-motion verbs, this paper demonstrates the advantages of free-sorting over creating and delineating word fields with more traditional methods. In particular, I draw a comparison to Snell-Hornby’s (1983) work on German descriptive verbs, which produces lexical fields with the help of dictionary entries, a thesaurus, a small corpus of written text and limited speaker feedback. While these methods have benefits, they are limited in their ability to represent the average organization of semantic fields in the mind of everyday speakers. Freesorting, by contrast, does not rely on academic resources, corpora or singular speaker judgments. In sorting, a group of informants creates visible sets of items according to perceived similarity. Psycholinguists have used the method to quantitatively explore the perception of color terms across cultures (c.f. Roberson et al. 2005). With a sufficiently large number of informants, one can generate lexical sorting data that is apt for cluster analysis, the results of which are represented by dendrograms. The experiment I conducted involved 33 school children from a middle class neighborhood in Braunschweig, Northern Germany. My experiment shows that Snell-Hornby’s (1983) representation of the self-motion field can be improved by integrating further dimensions of meaning, such as body-space relations and sound, that young speakers find salient in the grouping procedure.
In my article I argue the need for an existence of grammar in spoken language. It would have the same functions as the grammar of written language: describing and explaining the fundamental units of spoken language and their features, describing the composition of those units and their conjunction. The basic units in the grammar of spoken language can be named as: the sound, the word, the functional unit, the conversational turn and the conversation itself. Further the central characteristics of spoken language and their impact on grammar have to be taken into account. They are: the interactivity, the multimodality, the processabihty and the great variability. After displaying my concepts I discuss three alternative concepts of a grammar in spoken language: online-syntax, construction grammar and multimodal grammar. The article concludes by discussing the role of spoken language grammar in language and foreign language teaching.
Valenz im Fokus. Grammatische und lexikografische Studien. Festschrift für Jacqueline Kubczak
(2015)
Die Festschrift Valenz im Fokus: Grammatische und lexikografische Studien enthält zum einen die Beiträge des internationalen Kolloquiums „Valenz im Fokus“, das am 12. Juli 2013 im Institut für Deutsche Sprache in Mannheim zu Ehren von Jacqueline Kubczak veranstaltet wurde, zum anderen weitere Beiträge von Kollegen aus der ganzen Welt, die zum einen als elektronische Publikation während des Kolloquiums präsentiert wurden, zum anderen speziell für die Festschrift hinzukamen.