410 Linguistics
Semantic role labeling is traditionally viewed as a sentence-level task concerned with identifying semantic arguments that are overtly realized in a fairly local context (i.e., a clause or sentence). However, this local view potentially misses important information that can only be recovered if local argument structures are linked across sentence boundaries. One important link concerns semantic arguments that remain locally unrealized (null instantiations) but can be inferred from the context. In this paper, we report on SemEval-2010 Task 10, “Linking Events and Their Participants in Discourse”, which addressed this problem. We discuss the corpus that was created for this task, which contains annotations on multiple levels: predicate argument structure (FrameNet and PropBank), null instantiations, and coreference. We also provide an analysis of the task and its difficulties.
This contribution gives an overview of CoDII, the Collection of Distributionally Idiosyncratic Items. CoDII is an electronic collection of various subgroups of lexical items that are characterized by idiosyncratic distribution. This means that the distribution of these lexemes in text cannot be predicted from their syntactic category alone. The methods applied in the development of CoDII cut across traditional disciplinary boundaries and encompass corpus linguistics, computational linguistics, phraseology and theoretical linguistics. An important focus of our discussion is to show how the data collected in CoDII, which are annotated and can be queried with search tools among other things, can support linguistic theory building by providing carefully prepared data collections against which its empirical basis can be checked.
Authors like Fillmore 1986 and Goldberg 2006 have made a strong case for regarding argument omission in English as a lexical and construction-based affordance rather than one based on general semantico-pragmatic constraints. They do not, however, address the question of how grammatical restrictions on null complementation might interact with broader narrative conventions, in particular those of genre. In this paper, we attempt to remedy this oversight by presenting a comprehensive overview of genre-based argument omissions and offering a construction-based analysis of genre-based omission conventions. We consider five genre-based omission types: instructional imperatives (Culy 1996, Bender 1999), labelese, diary style (Haegeman 1990), match reports (Ruppenhofer 2004) and quotative clauses. We show that these omission types share important traits; all, for example, have anaphoric rather than indefinite construals. We also show, however, that the omission types differ from each other in idiosyncratic ways. We then address several interrelated representational problems posed by the grammatical treatment of genre-based omissions. For example, the constructions that represent genre-based omission conventions must interact with the lexical entries of verbs, many of which do not generally permit omitted arguments. Accordingly, we offer constructional analyses of genre-based omissions that allow constructions to override lexical valence constraints.
Preface
(2010)
This paper shows that the phenomenon of plesionymy deserves greater attention and needs to be approached outside its traditional framework, which considered it to be a subtype of synonymy (Cruse, 1986, 2002; Croft and Cruse, 2004). This view suggested that pairs of terms such as foggy–misty, fearless–brave exhibit significant shared semantic traits that are more salient than their differences. Differing properties were considered to be subordinate. These are sometimes contextually foregrounded, resulting in occasional oppositeness. Corpus studies show that this view is a broad generalization. This study sheds new light on German plesionyms by employing a corpus-linguistic approach. In particular, terms designating gradable properties (e.g. kritisch–ernst ‘critical–serious’, sauber–rein ‘clean–unsoiled/immaculate’) at neighboring positions of gradable scales show variable behavior and do not show a stronger affinity for synonymy. The position taken is that relations of synonymy and contrast are equally a matter of construal. Both types of semantic relations are part of the conceptual and lexical knowledge and subject to a cognitive principle. This work also examines how plesionym relations are realized in discourse. This article demonstrates that plesionyms co-occur within typical lexico-syntactic sequences. Following Jones’ (2002) and Murphy’s (2006) observations, these patterns (e.g. nicht X, eher Y; mehr X als Y; etc.) have specific discourse functions and provide evidence for a construction-based view.
Conventional descriptions of synonymous items often concentrate on common semantic traits and the degree of semantic overlap they exhibit. Their aim is to offer classifications of synonymy rather than elucidating ways of establishing contextual meaning equivalence and the cognitive prerequisites for this. Generally, they lack explanations as to how synonymy is construed in actual language use. This paper investigates principles and cognitive devices of synonymy construction as they appear in corpus data, and focuses on questions of how meaning equivalence might be conceptualised by speakers.
Corpora with high-quality linguistic annotations are an essential component in many NLP applications and a valuable resource for linguistic research. For obtaining these annotations, a large amount of manual effort is needed, making the creation of these resources time-consuming and costly. One attempt to speed up the annotation process is to use supervised machine-learning systems to automatically assign (possibly erroneous) labels to the data and ask human annotators to correct them where necessary. However, it is not clear to what extent these automatic pre-annotations are successful in reducing human annotation effort, and what impact they have on the quality of the resulting resource. In this article, we present the results of an experiment in which we assess the usefulness of partial semi-automatic annotation for frame labeling. We investigate the impact of automatic pre-annotation of differing quality on annotation time, consistency and accuracy. While we found no conclusive evidence that it can speed up human annotation, we found that automatic pre-annotation does increase its overall quality.
This paper provides a general overview of the treatment of lexico-semantic relations in different fields of research including theoretical and application-oriented disciplines. At the same time, it sketches the development of the descriptions and explanations of sense relations in various approaches as well as some methodologies which have been used to retrieve and analyse paradigmatic patterns.
Die Ordnung des öffentlichen Diskurses der Wirtschaftskrise und die (Un-)Ordnung des Ausgeblendeten
(2011)
Perhaps the biggest challenge in derivational morphology is to reconcile morphological idiosyncrasy with semantic regularity. How can it be explained that words with dead affixes and irregular allomorphy can nonetheless exhibit straightforward and stable semantic relations to their etymological bases (cf. strength ‘property of being strong’, obedience ‘act of obeying’, ‘property of being obedient’)? Theories based on the idea of capturing regularity in terms of synthetic rules for building up complex words out of morphemes along with rules for interpreting such structures in a compositional fashion have not made - and arguably cannot make - sense of this phenomenon. Taking the perspective of the learner in acquisition, I propose an alternative approach to meaning assignment based, not on syntagmatic relations among their constituent morphemes, but on paradigmatic relations between whole words. This approach not only explains the conditions under which meaning relations between words are expected to be stable but also accounts for another notorious mystery in derivational morphology, the frequent occurrence of total synonymy among affixes, as opposed to words.
Introduction
(2010)
Reframing FrameNet Data
(2004)
The Berkeley FrameNet Project (http://www.icsi.berkeley.edu/~framenet) is building an on-line lexical resource for contemporary English. The database provides information about the semantic and syntactic combinatorial possibilities (valences) of each item analyzed. This paper describes the conceptual basis for what has been called reframing of data in the FrameNet database and exemplifies two new frame-to-frame relations, Causative_of and Inchoative_of, the implementation of which came about as a result of reanalysis of certain frames and lexical units. The new relations are characterized with respect to a triple of frames involving the notion of attaching, and entering them into the database is demonstrated using the Frame Relations Editor. The two relations allow FrameNet to make frame-wise distinctions that capture fairly systematic semantic relationships across sets of lexical units. While the Inheritance and Subframe relations are of particular interest to the NLP research community, Causative_of and Inchoative_of may be more relevant to lexicography.
We present MaJo, a toolkit for supervised Word Sense Disambiguation (WSD), with an interface for Active Learning. Our toolkit combines a flexible plugin architecture which can easily be extended, with a graphical user interface which guides the user through the learning process. MaJo integrates off-the-shelf NLP tools like POS taggers, treebank-trained statistical parsers, as well as linguistic resources like WordNet and GermaNet. It enables the user to systematically explore the benefit gained from different feature types for WSD. In addition, MaJo provides an Active Learning environment, where the system presents carefully selected instances to a human oracle. The toolkit supports manual annotation of the selected instances and re-trains the system on the extended data set. MaJo also provides the means to evaluate the performance of the system against a gold standard. We illustrate the usefulness of our system by learning the frames (word senses) for three verbs from the SALSA corpus, a version of the TiGer treebank with an additional layer of frame-semantic annotation. We show how MaJo can be used to tune the feature set for specific target words and so improve performance for these targets. We also show that syntactic features, when carefully tuned to the target word, can lead to a substantial increase in performance.
In this paper, we describe MLSA, a publicly available multi-layered reference corpus for German-language sentiment analysis. The construction of the corpus is based on the manual annotation of 270 German-language sentences considering three different layers of granularity. The sentence-layer annotation, as the most coarse-grained annotation, focuses on aspects of objectivity, subjectivity and the overall polarity of the respective sentences. Layer 2 is concerned with polarity on the word- and phrase-level, annotating both subjective and factual language. The annotations on Layer 3 focus on the expression-level, denoting frames of private states such as objective and direct speech events. These three layers and their respective annotations are intended to be fully independent of each other. At the same time, exploring for and discovering interactions that may exist between different layers should also be possible. The reliability of the respective annotations was assessed using the average pairwise agreement and Fleiss’ multi-rater measures. We believe that MLSA is a beneficial resource for sentiment analysis research, algorithms and applications that focus on the German language.
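The two reliability measures named in this abstract can be illustrated with a minimal Python sketch. It assumes that every item was labelled by the same number of annotators; the function names and the toy labels are hypothetical and are not taken from the MLSA release.

```python
from itertools import combinations


def average_pairwise_agreement(ratings):
    """Share of annotator pairs that agree, averaged over all items.

    ratings: list of items, each a list with one label per annotator.
    """
    n_annotators = len(ratings[0])
    pairs = list(combinations(range(n_annotators), 2))
    agreeing = sum(
        sum(1 for a, b in pairs if item[a] == item[b]) for item in ratings
    )
    return agreeing / (len(pairs) * len(ratings))


def fleiss_kappa(ratings):
    """Fleiss' kappa for a fixed number of raters per item.

    Undefined (division by zero) if all raters use a single category.
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    categories = sorted({label for item in ratings for label in item})
    # counts[i][j]: how many raters assigned category j to item i
    counts = [[item.count(c) for c in categories] for item in ratings]
    # Observed agreement per item, then averaged
    p_items = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_observed = sum(p_items) / n_items
    # Chance agreement from the overall category proportions
    p_categories = [
        sum(row[j] for row in counts) / (n_items * n_raters)
        for j in range(len(categories))
    ]
    p_expected = sum(p * p for p in p_categories)
    return (p_observed - p_expected) / (1 - p_expected)
```

Average pairwise agreement reports raw agreement only, whereas Fleiss' kappa additionally corrects for the agreement expected by chance given the label distribution.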
We introduce a system that learns the participants of arbitrary given scripts. This system processes data from web experiments, in which each participant can be realized with different expressions. It computes participants by encoding semantic similarity and global structural information into an Integer Linear Program. An evaluation against a gold standard shows that we significantly outperform two informed baselines.
Semantic argument structures are often incomplete in that core arguments are not locally instantiated. However, many of these implicit arguments can be linked to referents in the wider context. In this paper we explore a number of linguistically motivated strategies for identifying and resolving such null instantiations (NIs). We show that a more sophisticated model for identifying definite NIs can lead to noticeable performance gains over the state-of-the-art for NI resolution.
This paper presents Release 2.0 of the SALSA corpus, a German resource for lexical semantics. The new corpus release provides new annotations for German nouns, complementing the existing annotations of German verbs in Release 1.0. The corpus now includes around 24,000 sentences with more than 36,000 annotated instances. It was designed with an eye towards NLP applications such as semantic role labeling but will also be a useful resource for linguistic studies in lexical semantics.
Consistency of reference structures is an important issue in lexicography and dictionary research, especially with respect to information on sense-related items. In this paper, the systematic challenges of this area (e.g. ‘non-reversed reference’, bidirectional linking being realised as unidirectional structures) will be outlined, and the problems which can be caused by these challenges for both lexicographers and dictionary users will be discussed. The paper also discusses how text-technological solutions may help to provide support for the consistency of sense-related pairings during the process of compiling a dictionary.
Klatsch und Tratsch als lustvolles Gruppenerlebnis. Eine ethnographisch-soziolinguistische Studie
(2001)
In this paper, we analyze a dramatically escalated conflict interaction that takes place during an association’s meeting in an urban community center. The interaction can be seen as the culmination of a social conflict that developed and intensified over a period of years. This conflict exemplifies one of the crucial points of sociocultural development in the city under study. Our analysis started from the question of why this conflict cannot be resolved even though the diverging interests of the opposing parties are not irreconcilable. It shows that the protagonists practice different communicative social styles. These stylistic differences, however, are not the cause of misunderstandings; rather, the protagonists use stylistic differences and different cultural orientations as a resource for political action. As a result, perspective divergence hardens progressively, together with an interaction modality of drama and of the fundamental grounding of divergent views. Theoretically, we are concerned with the explication of a sociolinguistic theory which includes as constitutive components the concepts of communicative social style, perspectivation and interaction modality. We want to show that the analyzed type of sociocultural conflict can be explained by considering the interplay of features on these three levels.
Linguistische Analyse
(1982)
This dissertation investigates discourse-pragmatic differences between variably linked arguments appearing in alternating argument structure constructions in the sense of Goldberg (1995) and Kay (manuscript). The properties that are studied include givenness, pragmatic relation (topic/focus), salience of referents, animacy, and others. They derive from the literature on sentence-type constructions such as topicalization and from research on the referential properties of NP form types.
The research carried out here has multiple uses. At the most basic level, it serves as an empirical check on existing characterizations of the pragmatic properties of the relevant arguments that are the result of syntactic and semantic analysis based on introspection alone. For instance, for the epistemic raising alternation involving verbs like seem, the predicted topicality difference between the subjects of the raised and unraised constructions (Langacker 1995) could not be confirmed.
This dissertation also addresses the question of what kinds of pragmatic factors, if any, are relevant to argument structure constructions. Based on the evidence of the dative alternation, it does not seem to be the case that the kinds of pragmatic influences on argument structure constructions are different or limited compared to the ones found to be relevant to sentence-type constructions.
The kind of research undertaken here can also inform the syntactic and semantic analysis of constructions. In the case of the dative alternation, the discourse-pragmatic characteristics of the variably linked arguments provide evidence that Basilico’s (1998) analysis of the difference between the alternates in terms of VP-shells and a difference between thetic and categorical ‘inner’ predication, on the one hand does not account for all the data and on the other can be re-stated in pragmatic terms other than the thetic-categorical distinction.
In addition to studies of valence alternations, this dissertation also discusses various null instantiation phenomena, which provide further evidence for the need to specify discourse-pragmatic properties as part of argument structure constructions and lexical entries.
Finally, it is suggested that the use of randomly sampled corpus data and statistical modelling throughout this dissertation improves both empirical and analytical coverage.
E-VALBU: Advanced SQL/XML processing of dictionary data using an object-relational XML database
(2008)
Contemporary practical lexicography uses a wide range of advanced technological aids, most prominently database systems for the administration of dictionary content. Since XML has become a de facto standard for the coding of lexicographic articles, integrated markup functionality (such as query, update, or transformation of instances) is of particular importance. Even the multi-channel distribution of dictionary data benefits from powerful XML database services. Exemplified by E-VALBU, the most comprehensive electronic dictionary on German verb valency, we outline an integrated approach for advanced XML storing and processing within an object-relational database, and for a public retrieval frontend using Web Services and AJAX technology.
Current work on sentiment analysis is characterized by approaches with a pragmatic focus, which use shallow techniques in the interest of robustness but often rely on ad-hoc creation of data sets and methods. We argue that progress towards deep analysis depends on a) enriching shallow representations with linguistically motivated, rich information, and b) focussing different branches of research and combining resources to create synergies with related work in NLP. In the paper, we propose SentiFrameNet, an extension to FrameNet, as a novel representation for sentiment analysis that is tailored to these aims.
Deutsch-türkische Kontaktvarietäten. Am Beispiel der Sprache von deutsch-türkischen Jugendlichen
(2004)
The material basis of this contribution is a conversation with a young Polish woman of German-Polish descent about her biographical experiences in Poland. These experiences are shaped by suffering under a multiculturality in which the cultures involved share a history determined by war, expulsion and extermination and, owing to the crimes of the Nazi era and the persecution of Germans in post-war Poland, have developed a relationship to one another marked by hatred and hostility. In presenting her biographical development, the informant shows in exemplary fashion the problems that are associated with forming an ethnic-cultural identity under such conditions and that prevent an unambiguous cultural self-definition.
For long stretches, the informant speaks about her problem-laden experience, and about the ambivalent attitude towards Germans that developed from it, not directly and explicitly but allusively and in a ‘veiling’ manner. The aim of the analysis is to reconstruct the informant’s complex self-positioning and to describe the formulation procedures she uses, on the one hand, to make the significance of the ‘hidden’ background for her biographical development plausible and, on the other hand, to protect both interlocutors from a face-threatening activation of the problematic intercultural potential. The case is a good example of how one can talk about distressing experiences under conversational conditions for which one aspect of these experiences is constitutive.
We present a method and a software tool, the FrameNet Transformer, for deriving customized versions of the FrameNet database based on frame and frame element relations. The FrameNet Transformer allows users to iteratively coarsen the FrameNet sense inventory in two ways. First, the tool can merge entire frames that are related by user-specified relations. Second, it can merge word senses that belong to frames related by specified relations. Both methods can be interleaved. The Transformer automatically outputs format-compliant FrameNet versions, including modified corpus annotation files that can be used for automatic processing. The customized FrameNet versions can be used to determine which granularity is suitable for particular applications. In our evaluation of the tool, we show that our method increases accuracy of statistical semantic parsers by reducing the number of word-senses (frames) per lemma, and increasing the number of annotated sentences per lexical unit and frame. We further show in an experiment on the FATE corpus that by coarsening FrameNet we do not incur a significant loss of information that is relevant to the Recognizing Textual Entailment task.
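The first coarsening step described in this abstract, merging frames that are linked by user-specified relations, can be sketched in Python with a union-find structure. This is only an illustration of the idea under stated assumptions, not the FrameNet Transformer's implementation: the function names are hypothetical and the frame names in the usage note are toy data. Only the relation names Causative_of and Inchoative_of are taken from the FrameNet abstract in this listing.

```python
def merge_frames(frames, relations, merge_on):
    """Map every frame to the representative of its merged group.

    frames:    iterable of frame names
    relations: iterable of (relation_type, frame_a, frame_b) triples
    merge_on:  set of relation types along which frames are merged
    """
    parent = {f: f for f in frames}

    def find(f):
        # Follow parent links to the group's root, shortening the path
        while parent[f] != f:
            parent[f] = parent[parent[f]]
            f = parent[f]
        return f

    for rel_type, a, b in relations:
        if rel_type in merge_on:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                # Deterministic choice: the alphabetically smaller root wins
                parent[max(root_a, root_b)] = min(root_a, root_b)
    return {f: find(f) for f in parent}


def senses_per_lemma(lexical_units, frame_map):
    """Count distinct (possibly merged) frames per lemma.

    lexical_units: iterable of (lemma, frame) pairs
    """
    senses = {}
    for lemma, frame in lexical_units:
        senses.setdefault(lemma, set()).add(frame_map[frame])
    return {lemma: len(found) for lemma, found in senses.items()}
```

With toy frames `A_cause`, `A_become`, `A_state` and `B`, merging along `Causative_of` and `Inchoative_of` collapses the three `A_*` frames into one group, so a lemma listed in `A_cause`, `A_state` and `B` drops from three senses to two.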
We describe the SemEval-2010 shared task on “Linking Events and Their Participants in Discourse”. This task is an extension to the classical semantic role labeling task. While semantic role labeling is traditionally viewed as a sentence-internal task, local semantic argument structures clearly interact with each other in a larger context, e.g., by sharing references to specific discourse entities or events. In the shared task we looked at one particular aspect of cross-sentence links between argument structures, namely linking locally uninstantiated roles to their co-referents in the wider discourse context (if such co-referents exist). This task is potentially beneficial for a number of NLP applications, such as information extraction, question answering or text summarization.
Die Beziehung zwischen Eltern und Jugendlichen und das Argumentieren in konfliktären Interaktionen
(1993)
Drawing on the theory of individuation, it is hypothesized that the conversational behaviour of mothers and adolescent daughters in conflict interactions is determined by control tendencies on the part of the mothers and individualization tendencies on the part of the daughters. The database consisted of 140 conflict conversations between 110 mothers and their adolescent daughters, collected in two studies. The transcribed conversations were segmented into units and classified according to a system of argumentation categories. The results are consistent with the developmental-psychological assumptions about the partner-related intentions of mothers and adolescent daughters in their relationship. Daughters responded more often to their mothers’ arguments and tried to weaken them; they also frequently referred to themselves, their preferences and their dislikes (individualization). Mothers justified their own position more strongly than daughters did and steered the conversation through verbal initiatives and by referring to the person of their partner (control).
In recent years, ethnolectal forms of German have developed in many European cities among adolescents of the second and third migrant generations. They are characteristic of multilingual contexts in which speakers of different heritage languages use the regional colloquial language of the country in which they live as a lingua franca. The new forms overlap considerably with the regional varieties but differ from them prosodically and phonetically, lexically and morphosyntactically. Mostly they are used only in certain contexts, and the speakers switch with virtuosity between regional varieties, heritage varieties, language mixtures and ethnolectal forms.
On the basis of three ethnographic case studies in Mannheim, it is shown what the ethnolectal forms developed by the migrant adolescents look like and for what purposes the adolescents use them. The adolescents have a broad linguistic repertoire, command both ethnolectal and near-standard forms, and use the difference between the two as a communicative resource.
In the paper we investigate the impact of data size on a Word Sense Disambiguation task (WSD). We question the assumption that the knowledge acquisition bottleneck, which is known as one of the major challenges for WSD, can be solved by simply obtaining more and more training data. Our case study on 1,000 manually annotated instances of the German verb drohen (threaten) shows that the best performance is not obtained when training on the full data set, but by carefully selecting new training instances with regard to their informativeness for the learning process (Active Learning). We present a thorough evaluation of the impact of different sampling methods on the data sets and propose an improved method for uncertainty sampling which dynamically adapts the selection of new instances to the learning progress of the classifier, resulting in more robust results during the initial stages of learning. A qualitative error analysis identifies problems for automatic WSD and discusses the reasons for the great gap in performance between human annotators and our automatic WSD system.
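The selection step of uncertainty sampling, on which this abstract builds, can be sketched as follows. The sketch shows plain entropy-based uncertainty sampling only; the abstract does not specify the dynamically adapting variant the authors propose, so that variant is not reproduced here. The function names and instance identifiers are hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy of a class-probability distribution (natural log)."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_most_uncertain(pool_probs, batch_size):
    """Return the ids of the unlabelled instances the classifier is least sure about.

    pool_probs: dict mapping instance id -> list of predicted class probabilities
    batch_size: number of instances to hand to the human annotator
    """
    ranked = sorted(
        pool_probs, key=lambda inst: entropy(pool_probs[inst]), reverse=True
    )
    return ranked[:batch_size]
```

In an Active Learning loop, the selected instances would be annotated by a human, added to the training set, and the classifier retrained before the next selection round.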
Reformulating place
(2013)
This report examines what can be accomplished in conversation by reformulating a reference to a place using the practices of repair. It is based on an analysis of a collection of place references situated in second pair parts of adjacency pairs taken from a wide range of field recordings of talk-in-interaction. Not surprisingly, place references are sometimes reformulated so as to indicate a misspeaking or in pursuit of recipient recognition. At other times, however, we show that place references can be reformulated to more adequately implement the action of a turn in prosecuting the course of action of which it is a part. In these cases repairing a place reference can target a source of trouble associated with implementing the action of a turn at talk, and thus reformulating place can serve as a practical resource for accomplishing a range of interactional tasks. We conclude with a more complex case in which two reformulations are deployed in responding to a so-called ‘double-barrelled’ initiating action.