This study uses ethnographic, conversation-analytic and conversation-rhetorical methods to examine the communicative social style of the "emancipatory migrants". A key characteristic of this milieu of second-generation migrants is that its members deal with racism assertively and provocatively and do not define themselves ethnically (as "Turks", "Italians", "Greeks", etc.). Furthermore, alongside their predominant use of German as the language of in-group communication, they regard (German-Turkish) code-switching and code-mixing as an important expression of their migrant identity.
Because the potential and contours of styles only emerge clearly in contrast, these findings are compared with the communicative practice of another social world of second-generation migrants, that of the "academic European Turks". The comparison shows that this milieu, which defines itself ethnically and as the "elite" of Turkish migrants, reacts moderately to discrimination and rejects German-Turkish language variation as an expression of "semilingualism".
This paper describes general requirements for evaluating and documenting NLP tools, with a focus on morphological analysers and the design of a Gold Standard. It is argued that any evaluation must be measurable and that its documentation must be accessible to every user of the tool. The documentation must enable the user to compare different tools offering the same service; hence the descriptions must contain measurable values. A Gold Standard is a vital part of any measurable evaluation process; therefore, the corpus-based design of a Gold Standard, its creation and the problems that occur are reported here. Our project concentrates on SMOR, a morphological analyser for German that is to be offered as a web service. We not only use this analyser for designing the Gold Standard, but also evaluate the tool itself at the same time. As the project is ongoing, we cannot yet present final results.
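To make "measurable values" concrete, the following minimal sketch scores an analyser's output against a Gold Standard using precision, recall and F1 over individual analyses. It is an illustration only: the function name `evaluate`, the toy word forms and the analysis strings are hypothetical and do not reproduce SMOR's actual output format or the metrics chosen in the project.

```python
from typing import Dict, Set


def evaluate(gold: Dict[str, Set[str]],
             predicted: Dict[str, Set[str]]) -> Dict[str, float]:
    """Compare predicted analyses with gold analyses, word form by word form."""
    tp = fp = fn = 0
    for word, gold_analyses in gold.items():
        pred_analyses = predicted.get(word, set())
        tp += len(gold_analyses & pred_analyses)   # analyses found and correct
        fp += len(pred_analyses - gold_analyses)   # analyses found but wrong
        fn += len(gold_analyses - pred_analyses)   # gold analyses that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical toy data; the analysis strings are invented for illustration.
gold = {
    "Häuser": {"Haus+N+Pl"},
    "lese": {"lesen+V+1Sg", "lesen+V+Imp"},
}
predicted = {
    "Häuser": {"Haus+N+Pl"},
    "lese": {"lesen+V+1Sg", "Lese+N+Sg"},
}

scores = evaluate(gold, predicted)
print(scores)
```

Reporting all three values, rather than a single accuracy figure, lets a user see whether a tool tends to over-generate analyses or to miss them, which supports the tool comparison the abstract calls for.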
Corpus-based identification and disambiguation of reading indicators for German nominalizations
(2010)
Corpus data is often structurally and lexically ambiguous; corpus extraction methodologies thus must be made aware of ambiguities. Therefore, given an extraction task, all relevant ambiguities must be identified. To resolve these ambiguities, contextual data responsible for one or another reading is to be considered. In the context of our present work, German -ung-nominalizations and their sortal readings are under examination. A number of these nominalizations may be read as an event or a result, depending on the semantic group they belong to. Here, we concentrate on nominalizations of verbs of saying (henceforth: "verba dicendi"), identify their context partners and their influence on the sortal reading of the nominalizations in question. We present a tool which calculates the sortal reading of such nominalizations and thus may improve not only corpus extraction, but also e.g. machine translation. Lastly, we describe successful attempts to identify the correct sortal reading, conclusions and future work.
So far, comprehensive grammar descriptions of Northern Sotho have only been available in the form of prescriptive books aiming at teaching the language. This paper describes parts of the first morpho-syntactic description of Northern Sotho from a computational perspective (Faaß, 2010a). Such a description is necessary for implementing rule-based, operational grammars. It is also essential for the annotation of training data to be utilised by statistical parsers. The work that we partially present here may hence provide a resource for computational processing of the language in order to proceed with producing linguistic representations beyond tagging, be it chunking or parsing. The paper begins by describing significant Northern Sotho verbal morpho-syntactics (section 2). It is shown that the topology of the verb can be depicted as a slot system which may form the basis for computational processing (section 3). Note that the implementation of the described rules (section 4) and the coverage tests are ongoing processes on which we will report in more detail at a later stage.
This chapter will present results of a linguistic landscape (LL) project in the regional centre of Rēzekne in the region of Latgale in Eastern Latvia. Latvia was de facto a part of the Soviet Union until 1991, and this has given it a highly multilingual society. In the essentially post-colonial situation since 1991, strict language policies have been in place, which aim to reverse the language shift from Russian, the dominant language of Soviet times, back to Latvian. Thus, the main interests of the research were how the complex pattern of multilingualism in Latvia is reflected in the LL; how people relate to current language legislation; and what motivations, attitudes and emotions inform their behaviour.
This paper describes the application of probabilistic part-of-speech taggers to the Dzongkha language. A tag set containing 66 tags, based on the Penn Treebank, is designed. A training corpus of 40,247 tokens is used to train the model. Using the lexicon extracted from the training corpus together with a lexicon from the available word list, we compared two statistical taggers. The best result achieved was 93.1% accuracy in a 10-fold cross-validation on the training set. The winning tagger was thereafter applied to annotate a 570,247-token corpus.
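The 10-fold cross-validation mentioned above can be sketched as follows. This is only an illustration of the evaluation procedure: the tagger is a simple most-frequent-tag baseline, not either of the statistical taggers used in the paper, and the toy corpus, the function names and the interleaved fold assignment are assumptions made for this example.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

Sentence = List[Tuple[str, str]]  # list of (token, tag) pairs


def train_baseline(sentences: List[Sentence]) -> Tuple[Dict[str, str], str]:
    """Learn the most frequent tag per word plus a fallback tag for unknown words."""
    word_tags: Dict[str, Counter] = defaultdict(Counter)
    tag_totals: Counter = Counter()
    for sentence in sentences:
        for word, tag in sentence:
            word_tags[word][tag] += 1
            tag_totals[tag] += 1
    lexicon = {w: c.most_common(1)[0][0] for w, c in word_tags.items()}
    default_tag = tag_totals.most_common(1)[0][0]
    return lexicon, default_tag


def accuracy(lexicon: Dict[str, str], default_tag: str,
             sentences: List[Sentence]) -> float:
    """Token-level accuracy of the baseline on held-out sentences."""
    correct = total = 0
    for sentence in sentences:
        for word, tag in sentence:
            correct += lexicon.get(word, default_tag) == tag
            total += 1
    return correct / total if total else 0.0


def cross_validate(sentences: List[Sentence], k: int = 10) -> float:
    """Hold out every k-th sentence in turn, train on the rest, average the scores."""
    scores = []
    for fold in range(k):
        held_out = sentences[fold::k]
        training = [s for i, s in enumerate(sentences) if i % k != fold]
        lexicon, default_tag = train_baseline(training)
        scores.append(accuracy(lexicon, default_tag, held_out))
    return sum(scores) / len(scores)


# Hypothetical toy corpus (English words with Penn Treebank tags), 20 sentences.
corpus: List[Sentence] = [
    [("the", "DT"), ("dog", "NN"), ("barks", "VBZ")],
    [("the", "DT"), ("cat", "NN"), ("sleeps", "VBZ")],
] * 10

mean_accuracy = cross_validate(corpus, k=10)
print(f"mean accuracy over 10 folds: {mean_accuracy:.3f}")
```

Because each sentence is held out exactly once, the averaged score estimates how a tagger behaves on unseen text even when, as here, no separate test set is available.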
After a brief overview of the current language-ecological situation in Latvia, I discuss the role of the German language in various domains of Latvian society. The overview of German in present-day Latvia is rounded off with some reflections on measures that could change the situation in favour of German.
The Lyon team studies how plurilingual resources are mobilised in collaborative activities within teamwork. The analytical approach is inspired by ethnomethodologically oriented Conversation Analysis and treats as central the relationship between plurilingual resources and the situated organisation of linguistic usage and social practices. These two aspects are reflexively linked: plurilingual resources are shaped by their context of use, and activities are in turn constrained and structured by the available resources.
Using the example of a raclette dinner among friends, this article analyses how waiting for the meal to begin is structured within a long sequence. During the almost 50 minutes that pass between the arrival of the first guests and the start of the meal, the participants orient in different ways to waiting as an activity. The successive arrival of the guests leads in each case to opening sequences within this waiting period. Drawing on excerpts from this time span, the analysis traces how the participants orient to this temporality of waiting and (not yet) beginning, and how they jointly construct the start of the meal.
On the basis of a single case analysis of the emergence of an ethnic joke, this paper explores issues related to laughter in international business meetings. More particularly, it deals with ways in which a person's name is correctly pronounced. Speakers and co-participants seem to orient towards ‘proper’ ways of vocalizing names and to consequent ‘variations’ or ‘deviations’ from them, making different ways of pronunciation available as a laughable. In making such pronunciation variations available, accountable and recognizable, participants reflexively establish as relevant the multilingual character of the activity, of the participants’ competences and of the setting; conversely, they exploit these multilingual features within specific social practices, leading to laughter.
Our analysis focuses on the contexts of action, the sequential environments and the interactional practices by which the uttering of a name becomes a ‘laughable’ and then a resource for an ethnic joke. Moreover, it explores the implications of transforming the pronunciation into a laughable in terms of the organization of the ongoing activity, changing participation frameworks and membership categorizations. In this sense, it highlights the flexible structure of groups and the way in which laughter reconfigures them through local affiliating and disaffiliating moves, and by making various national categories available and relevant.
The chapter on formats and models for lexicons deals with different available data formats of lexical resources. It elaborates on their structure and possible uses. Motivated by the restrictions in merging different lexical resources based on widely spread formalisms and international standards, a formal lexicon model for lexical resources is developed which is related to graph structures in annotations. For lexicons this model is termed the Lexicon Graph. Within this model the concepts of lexicon entries and lexical structures frequently described in the literature are formally defined and examples are given. The article addresses the problem of ambiguity in those formal terms. An implementation based on XML and XML technology such as XQuery for the defined structures is given. The relation to international standards is included as well.
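As a rough illustration of why a graph view eases the merging of lexical resources, the sketch below stores lexical information as nodes of the form (data category, value) and links them with undirected edges. It is one possible reading of the idea, not the formal Lexicon Graph definition or its XML/XQuery implementation; the class name, the method names and the sample data are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple

Node = Tuple[str, str]  # (data category, value), e.g. ("orth", "Bank")


class LexiconGraph:
    """Toy lexicon stored as an undirected graph of typed lexical items."""

    def __init__(self) -> None:
        self.edges: Dict[Node, Set[Node]] = defaultdict(set)

    def add_relation(self, a: Node, b: Node) -> None:
        """Link two lexical items in both directions."""
        self.edges[a].add(b)
        self.edges[b].add(a)

    def merge(self, other: "LexiconGraph") -> "LexiconGraph":
        """Merging is graph union: identical nodes collapse, edges accumulate."""
        merged = LexiconGraph()
        for graph in (self, other):
            for node, neighbours in graph.edges.items():
                merged.edges[node] |= neighbours
        return merged

    def entry(self, headword: Node) -> Set[Node]:
        """View an 'entry' as all items directly linked to the headword node."""
        return set(self.edges.get(headword, set()))


# Two hypothetical source lexicons describing the same German word form.
morph_lexicon = LexiconGraph()
morph_lexicon.add_relation(("orth", "Bank"), ("pos", "noun"))

sense_lexicon = LexiconGraph()
sense_lexicon.add_relation(("orth", "Bank"), ("gloss", "bench"))
sense_lexicon.add_relation(("orth", "Bank"), ("gloss", "financial institution"))

combined = morph_lexicon.merge(sense_lexicon)
bank_entry = combined.entry(("orth", "Bank"))
print(sorted(bank_entry))
```

In this sketch ambiguity is not resolved by the merge; it simply appears as several gloss nodes attached to the same orthographic node, and deciding how to partition them into entries is left to whatever view is defined over the graph.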
This paper reports on an e-tandem project conducted at Freiburg University (Germany) since the winter term 2009/2010. It started with a German-Italian pilot course organised in cooperation with Pavia University (Italy). In order to promote autonomous language learning, the authors used several web-based applications, relying on Skype to enable full (i.e. visual and auditory) interaction between learning partners and on e-mail to let participants practise writing and reading in the respective foreign language. Additionally, participants were asked to compile a weekly electronic portfolio (EPOS) to record their improvements as well as their difficulties. The paper describes the structure of the pilot course and draws initial conclusions.
The contributions in this issue originate from a workshop of the Hypermedia working group of the Gesellschaft für Computerlinguistik und Sprachtechnologie (GSCL). The workshop took place as part of the GSCL conference 2009 in Potsdam and aimed to shed light on the current state of thinking about the usability of hypermedia systems in the e-humanities.
Ever since Jakob Nielsen set out the criteria for user-friendly hypermedia systems in the mid-1990s (easy to learn, efficient to use, easy to remember, few errors, pleasant to use), usability research has been concerned with empirically verifiable evaluation criteria and data collection methods. The aim is to improve the quality of use of hypermedia offerings, often with a focus on the Internet/WWW or Web 2.0 and, more recently, increasingly taking multimodal interfaces into account.
The contributions collected in this issue examine a range of very different aspects of quality of use, both in concrete applications and from a theoretical perspective.
This contribution gives an overview of CoDII, the Collection of Distributionally Idiosyncratic Items. CoDII is an electronic collection of various subgroups of lexical items that are characterised by idiosyncratic distribution. This means that the distribution of these lexemes in text cannot be predicted from their syntactic category alone. The methods applied in the development of CoDII reach across traditional disciplinary boundaries and include corpus linguistics, computational linguistics, phraseology and theoretical linguistics. An important focus of our discussion is on showing how the data collected in CoDII, which are annotated and can be queried with search tools, among other things, can help linguistic theory building to verify its empirical basis by providing carefully prepared data collections.
A central question in psycholinguistics is how the human brain processes language in real time. To answer this question, the differences between auditory and visual processing have to be considered. The present dissertation examines the extent to which event-related potentials (ERPs) in the human electroencephalogram (EEG) interact with different modes of presentation during sentence comprehension. Besides the two classical modalities, auditory and rapid serial visual presentation (RSVP), the monitoring of readers’ eye movements was chosen as a new mode of presentation. Here, the temporal paradox between neuronal ERP effects and behavioral effects in the eye movement record was of particular interest. Specifically, by concurrently measuring ERPs and eye movements in natural reading, the dissertation aimed to shed light on the counterintuitive fact that difficulties in sentence comprehension arise earlier in eye movement measures than in the corresponding neuronal ERP effects. In contrast to RSVP and the auditory modality, reading offers a parafoveal preview of upcoming words (Rayner 1998), which enables the brain to process information about words before these are fixated for the first time (in foveal vision). When the word Gegenteil in example (1) below is fixated and processed, the brain concurrently processes some information about the upcoming parafoveal words von and weiß. (1) Schwarz ist das Gegenteil von weiß. (2) Schwarz […] blau. (3) Schwarz […] nett. The parafoveal preview mostly provides orthographic (word form) information, while semantic information is not conveyed (Inhoff & Starr 2004; White 2008). Whereas word form and lexical meaning are processed simultaneously with RSVP and auditory presentation, the parafoveal preview in natural reading allows for a temporal decoupling such that word forms are processed before meaning. This is one reason for the faster information uptake in reading.
The present dissertation is the first to systematically investigate the influence of the parafoveal preview in sentence processing. Participants read sentences such as in (1)-(3), in which two adjectives were either antonyms (1), semantically related non-antonyms (2), or semantically unrelated non-antonyms (3). ERPs were computed for the last fixation before the target word (the sentence-final word in 1-3), which was assumed to capture parafoveal processing, and for the first fixation on the target, which should reflect foveal processing. The results were compared to two experiments using identical stimuli with auditory and RSVP presentation, and the parafoveal preview clearly led to different ERP results. While the RSVP and auditory presentations replicated the finding of a P300 to the second antonym in (1) (Kutas & Iragui 1998; Roehm et al. 2007), there was no P300 in response to antonyms at any fixation position in natural reading. However, the dissociation of parafoveal and foveal processing in reading also made it possible to disentangle different processes underlying the N400. There was a reduced parafoveal N400 for (1,2) compared with (3), which could be attributed to the preactivation of the word forms of the expected antonyms and of semantically related non-antonyms. In foveal vision, all non-antonyms (2,3) showed an enhanced N400 compared with (1) because they were unexpected and implausible in the sentence context. This dissociation between the preactivation of a word form and the contextual fit of a word’s meaning is impossible with the other two modes of presentation, because orthographic and semantic information become available almost at the same time and are thus processed simultaneously. Furthermore, the parafoveal N400 effect was not accompanied by changes in the duration of the corresponding fixation, whereas the foveal N400 was.
Similarly, with the concurrent measurement of ERPs and eye movements, the temporal paradox described above remained, as effects in the eye movement record preceded the neuronal ERP effects. Further support for these central findings came from two additional experiments that investigated different stimuli with concurrent ERP-eye tracking measures. Altogether, the experiments revealed that the previous findings on the language-related N400 can be replicated with natural reading, but they can also be differentiated qualitatively by virtue of the characteristics of natural reading. Although the behavioral and neuronal effects mirrored one another, not every neuronal effect necessarily translates into a behavioral output. Finally, even concurrent ERP-eye tracking measures cannot resolve the temporal paradox.
Authors like Fillmore 1986 and Goldberg 2006 have made a strong case for regarding argument omission in English as a lexical and construction-based affordance rather than one based on general semantico-pragmatic constraints. They do not, however, address the question of how grammatical restrictions on null complementation might interact with broader narrative conventions, in particular those of genre. In this paper, we attempt to remedy this oversight by presenting a comprehensive overview of genre-based argument omissions and offering a construction-based analysis of genre-based omission conventions. We consider five genre-based omission types: instructional imperatives (Culy 1996, Bender 1999), labelese, diary style (Haegeman 1990), match reports (Ruppenhofer 2004) and quotative clauses. We show that these omission types share important traits; all, for example, have anaphoric rather than indefinite construals. We also show, however, that the omission types differ from each other in idiosyncratic ways. We then address several interrelated representational problems posed by the grammatical treatment of genre-based omissions. For example, the constructions that represent genre-based omission conventions must interact with the lexical entries of verbs, many of which do not generally permit omitted arguments. Accordingly, we offer constructional analyses of genre-based omissions that allow constructions to override lexical valence constraints.