410 Linguistics
Semantic role labeling is traditionally viewed as a sentence-level task concerned with identifying semantic arguments that are overtly realized in a fairly local context (i.e., a clause or sentence). However, this local view potentially misses important information that can only be recovered if local argument structures are linked across sentence boundaries. One important link concerns semantic arguments that remain locally unrealized (null instantiations) but can be inferred from the context. In this paper, we report on the SemEval 2010 Task-10 on “Linking Events and Their Participants in Discourse”, which addressed this problem. We discuss the corpus that was created for this task, which contains annotations on multiple levels: predicate argument structure (FrameNet and PropBank), null instantiations, and coreference. We also provide an analysis of the task and its difficulties.
This contribution gives an overview of CoDII, the Collection of Distributionally Idiosyncratic Items. CoDII is an electronic collection of various subgroups of lexical items that are characterized by idiosyncratic distribution. This means that the distribution of these lexemes in text cannot be predicted from their syntactic category alone. The methods applied in the development of CoDII cut across traditional disciplinary boundaries and include corpus linguistics, computational linguistics, phraseology and theoretical linguistics. A major focus of our discussion is on showing how the data collected in CoDII, which are annotated and can be queried with search tools, among other means, can support linguistic theory building by providing carefully prepared data collections against which a theory's empirical basis can be checked.
Authors like Fillmore 1986 and Goldberg 2006 have made a strong case for regarding argument omission in English as a lexical and construction-based affordance rather than one based on general semantico-pragmatic constraints. They do not, however, address the question of how grammatical restrictions on null complementation might interact with broader narrative conventions, in particular those of genre. In this paper, we attempt to remedy this oversight by presenting a comprehensive overview of genre-based argument omissions and offering a construction-based analysis of genre-based omission conventions. We consider five genre-based omission types: instructional imperatives (Culy 1996, Bender 1999), labelese, diary style (Haegeman 1990), match reports (Ruppenhofer 2004) and quotative clauses. We show that these omission types share important traits; all, for example, have anaphoric rather than indefinite construals. We also show, however, that the omission types differ from each other in idiosyncratic ways. We then address several interrelated representational problems posed by the grammatical treatment of genre-based omissions. For example, the constructions that represent genre-based omission conventions must interact with the lexical entries of verbs, many of which do not generally permit omitted arguments. Accordingly, we offer constructional analyses of genre-based omissions that allow constructions to override lexical valence constraints.
Preface
(2010)
This paper shows that the phenomenon of plesionymy deserves greater attention and needs to be approached outside its traditional framework, which considered it to be a subtype of synonymy (Cruse, 1986, 2002; Croft and Cruse, 2004). This view suggested that pairs of terms such as foggy–misty, fearless–brave exhibit significant shared semantic traits that are more salient than their differences. Differing properties were considered to be subordinate. These are sometimes contextually foregrounded resulting in occasional oppositeness. Corpus studies show that this view is a broad generalization. This study sheds new light on German plesionyms by employing a corpus-linguistic approach. In particular, terms designating gradable properties (e.g. kritisch–ernst ‘critical–serious’, sauber–rein ‘clean–unsoiled/immaculate’) at neighboring positions of gradable scales show variable behavior and do not show a stronger affinity for synonymy. The position taken is that a relation of synonymy and contrast are equally a matter of construal. Both types of semantic relations are part of the conceptual and lexical knowledge and subject to a cognitive principle. This work also examines how plesionym relations are realized in discourse. This article demonstrates that plesionyms are co-occurrences within typical lexico-syntactic sequences. Following Jones’ (2002) and Murphy’s (2006) observations, these patterns (e.g. nicht X, eher Y; mehr X als Y; etc.) have specific discourse functions and are evidence to account for a construction-based view.
Conventional descriptions of synonymous items often concentrate on common semantic traits and the degree of semantic overlap they exhibit. Their aim is to offer classifications of synonymy rather than elucidating ways of establishing contextual meaning equivalence and the cognitive prerequisites for this. Generally, they lack explanations as to how synonymy is construed in actual language use. This paper investigates principles and cognitive devices of synonymy construction as they appear in corpus data, and focuses on questions of how meaning equivalence might be conceptualised by speakers.
Corpora with high-quality linguistic annotations are an essential component in many NLP applications and a valuable resource for linguistic research. For obtaining these annotations, a large amount of manual effort is needed, making the creation of these resources time-consuming and costly. One attempt to speed up the annotation process is to use supervised machine-learning systems to automatically assign (possibly erroneous) labels to the data and ask human annotators to correct them where necessary. However, it is not clear to what extent these automatic pre-annotations are successful in reducing human annotation effort, and what impact they have on the quality of the resulting resource. In this article, we present the results of an experiment in which we assess the usefulness of partial semi-automatic annotation for frame labeling. We investigate the impact of automatic pre-annotation of differing quality on annotation time, consistency and accuracy. While we found no conclusive evidence that it can speed up human annotation, we found that automatic pre-annotation does increase its overall quality.
This paper provides a general overview of the treatment of lexico-semantic relations in different fields of research including theoretical and application-oriented disciplines. At the same time, it sketches the development of the descriptions and explanations of sense relations in various approaches as well as some methodologies which have been used to retrieve and analyse paradigmatic patterns.
Die Ordnung des öffentlichen Diskurses der Wirtschaftskrise und die (Un-)Ordnung des Ausgeblendeten
(2011)
Perhaps the biggest challenge in derivational morphology is to reconcile morphological idiosyncrasy with semantic regularity. How can it be explained that words with dead affixes and irregular allomorphy can nonetheless exhibit straightforward and stable semantic relations to their etymological bases (cf. strength ‘property of being strong’, obedience ‘act of obeying’, ‘property of being obedient’)? Theories based on the idea of capturing regularity in terms of synthetic rules for building up complex words out of morphemes along with rules for interpreting such structures in a compositional fashion have not made - and arguably cannot make - sense of this phenomenon. Taking the perspective of the learner in acquisition, I propose an alternative approach to meaning assignment based, not on syntagmatic relations among their constituent morphemes, but on paradigmatic relations between whole words. This approach not only explains the conditions under which meaning relations between words are expected to be stable but also accounts for another notorious mystery in derivational morphology, the frequent occurrence of total synonymy among affixes, as opposed to words.
Introduction
(2010)
Reframing FrameNet Data
(2004)
The Berkeley FrameNet Project (http://www.icsi.berkeley.edu/~framenet) is building an on-line lexical resource for contemporary English. The database provides information about the semantic and syntactic combinatorial possibilities (valences) of each item analyzed. This paper describes the conceptual basis for what has been called reframing of data in the FrameNet database and exemplifies two new frame-to-frame relations, Causative_of and Inchoative_of, the implementation of which came about as a result of reanalysis of certain frames and lexical units. The new relations are characterized with respect to a triple of frames involving the notion of attaching, and entering them into the database is demonstrated using the Frame Relations Editor. The two relations allow FrameNet to make frame-wise distinctions that capture fairly systematic semantic relationships across sets of lexical units. While the Inheritance and Subframe relations are of particular interest to the NLP research community, Causative_of and Inchoative_of may be more relevant to lexicography.
We present MaJo, a toolkit for supervised Word Sense Disambiguation (WSD), with an interface for Active Learning. Our toolkit combines a flexible plugin architecture which can easily be extended, with a graphical user interface which guides the user through the learning process. MaJo integrates off-the-shelf NLP tools like POS taggers, treebank-trained statistical parsers, as well as linguistic resources like WordNet and GermaNet. It enables the user to systematically explore the benefit gained from different feature types for WSD. In addition, MaJo provides an Active Learning environment, where the system presents carefully selected instances to a human oracle. The toolkit supports manual annotation of the selected instances and re-trains the system on the extended data set. MaJo also provides the means to evaluate the performance of the system against a gold standard. We illustrate the usefulness of our system by learning the frames (word senses) for three verbs from the SALSA corpus, a version of the TiGer treebank with an additional layer of frame-semantic annotation. We show how MaJo can be used to tune the feature set for specific target words and so improve performance for these targets. We also show that syntactic features, when carefully tuned to the target word, can lead to a substantial increase in performance.
In this paper, we describe MLSA, a publicly available multi-layered reference corpus for German-language sentiment analysis. The construction of the corpus is based on the manual annotation of 270 German-language sentences considering three different layers of granularity. The sentence-layer annotation, as the most coarse-grained annotation, focuses on aspects of objectivity, subjectivity and the overall polarity of the respective sentences. Layer 2 is concerned with polarity on the word- and phrase-level, annotating both subjective and factual language. The annotations on Layer 3 focus on the expression-level, denoting frames of private states such as objective and direct speech events. These three layers and their respective annotations are intended to be fully independent of each other. At the same time, exploring for and discovering interactions that may exist between different layers should also be possible. The reliability of the respective annotations was assessed using the average pairwise agreement and Fleiss’ multi-rater measures. We believe that MLSA is a beneficial resource for sentiment analysis research, algorithms and applications that focus on the German language.
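For readers unfamiliar with the two reliability measures named in the MLSA abstract, the following is a minimal sketch of average pairwise agreement and Fleiss' kappa. It is a generic textbook formulation, not the authors' code; the function names, the label values and the toy ratings are assumptions made for illustration only.

```python
from collections import Counter
from itertools import combinations


def average_pairwise_agreement(ratings):
    """ratings: list of items; each item is a list with one label per rater."""
    n_raters = len(ratings[0])
    pairs = list(combinations(range(n_raters), 2))
    # Count agreeing rater pairs over all items, then normalise.
    agreeing = sum(item[a] == item[b] for item in ratings for a, b in pairs)
    return agreeing / (len(pairs) * len(ratings))


def fleiss_kappa(ratings, categories):
    """Fleiss' kappa for a fixed number of raters per item.

    Undefined (division by zero) if every rating falls into one category.
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    counts = [Counter(item) for item in ratings]
    # Observed agreement: mean of the per-item agreement values.
    p_items = [
        (sum(c[cat] ** 2 for cat in categories) - n_raters)
        / (n_raters * (n_raters - 1))
        for c in counts
    ]
    p_observed = sum(p_items) / n_items
    # Expected agreement from the overall category proportions.
    p_cat = [sum(c[cat] for c in counts) / (n_items * n_raters) for cat in categories]
    p_expected = sum(p ** 2 for p in p_cat)
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical toy data: two sentences, three raters, polarity labels.
toy = [["pos", "pos", "neg"], ["neg", "neg", "neg"]]
print(round(average_pairwise_agreement(toy), 3))
print(round(fleiss_kappa(toy, ["pos", "neg"]), 3))
```

Kappa corrects the raw agreement for the agreement expected by chance, which is why it can be much lower than the pairwise figure when one label dominates.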
We introduce a system that learns the participants of arbitrary given scripts. This system processes data from web experiments, in which each participant can be realized with different expressions. It computes participants by encoding semantic similarity and global structural information into an Integer Linear Program. An evaluation against a gold standard shows that we significantly outperform two informed baselines.
Semantic argument structures are often incomplete in that core arguments are not locally instantiated. However, many of these implicit arguments can be linked to referents in the wider context. In this paper we explore a number of linguistically motivated strategies for identifying and resolving such null instantiations (NIs). We show that a more sophisticated model for identifying definite NIs can lead to noticeable performance gains over the state-of-the-art for NI resolution.
This paper presents Release 2.0 of the SALSA corpus, a German resource for lexical semantics. The new corpus release provides new annotations for German nouns, complementing the existing annotations of German verbs in Release 1.0. The corpus now includes around 24,000 sentences with more than 36,000 annotated instances. It was designed with an eye towards NLP applications such as semantic role labeling but will also be a useful resource for linguistic studies in lexical semantics.
Consistency of reference structures is an important issue in lexicography and dictionary research, especially with respect to information on sense-related items. In this paper, the systematic challenges of this area (e.g. ‘non-reversed reference’, bidirectional linking being realised as unidirectional structures) will be outlined, and the problems which can be caused by these challenges for both lexicographers and dictionary users will be discussed. The paper also discusses how text-technological solutions may help to provide support for the consistency of sense-related pairings during the process of compiling a dictionary.
Klatsch und Tratsch als lustvolles Gruppenerlebnis. Eine ethnographisch-soziolinguistische Studie
(2001)
In this paper, we analyze a dramatically aggravated conflict interaction taking place in the course of an association’s meeting in an urban community center. The interaction can be seen as the culmination point of a social conflict developing and increasing over a period of years. This conflict exemplifies one of the crucial points of the sociocultural development in the city under study. Our analysis started with the question of why this conflict is unsolvable although the interest divergences of the opposing parties are not irreconcilable. Our analysis shows that the protagonists practice different communicative social styles. These stylistic differences, however, are not the cause of misunderstandings; rather, the protagonists use stylistic differences and different cultural orientations as a resource for political action. Thereby a process of increasing hardening of perspective divergence emerges together with an interaction modality of drama and of the fundamental grounding of divergent views. Theoretically, we are concerned with the explication of a sociolinguistic theory which includes as constitutive components the concepts of communicative social style, of perspectivation and of interaction modality. We want to show that the analyzed type of sociocultural conflict can be explained by considering the interplay of features on these three levels.
Linguistische Analyse
(1982)
This dissertation investigates discourse-pragmatic differences between variably linked arguments appearing in alternating argument structure constructions in the sense of Goldberg (1995) and Kay (manuscript). The properties that are studied include givenness, pragmatic relation (topic/focus), salience of referents, animacy, and others. They derive from the literature on sentence-type constructions such as topicalization and from research on the referential properties of NP form types.
The research carried out here has multiple uses. At the most basic level, it serves as an empirical check on existing characterizations of the pragmatic properties of the relevant arguments that are the result of syntactic and semantic analysis based on introspection alone. For instance, for the epistemic raising alternation involving verbs like seem, the predicted topicality difference between the subjects of the raised and unraised constructions (Langacker 1995) could not be confirmed.
This dissertation also addresses the question of what kinds of pragmatic factors, if any, are relevant to argument structure constructions. Based on the evidence of the dative alternation, it does not seem to be the case that the pragmatic influences on argument structure constructions are different or limited compared to the ones found to be relevant to sentence-type constructions.
The kind of research undertaken here can also inform the syntactic and semantic analysis of constructions. In the case of the dative alternation, the discourse-pragmatic characteristics of the variably linked arguments provide evidence that Basilico’s (1998) analysis of the difference between the alternates in terms of VP-shells and a difference between thetic and categorical ‘inner’ predication, on the one hand does not account for all the data and on the other can be re-stated in pragmatic terms other than the thetic-categorical distinction.
In addition to studies of valence alternations, this dissertation also discusses various null instantiation phenomena, which provide further evidence for the need to specify discourse-pragmatic properties as part of argument structure constructions and lexical entries.
Finally, it is suggested that the use of randomly sampled corpus data and statistical modelling throughout this dissertation improves both empirical and analytical coverage.
E-VALBU: Advanced SQL/XML processing of dictionary data using an object-relational XML database
(2008)
Contemporary practical lexicography uses a wide range of advanced technological aids, most prominently database systems for the administration of dictionary content. Since XML has become a de facto standard for the coding of lexicographic articles, integrated markup functionality – such as query, update, or transformation of instances – is of particular importance. Even the multi-channel distribution of dictionary data benefits from powerful XML database services. Exemplified by E-VALBU, the most comprehensive electronic dictionary on German verb valency, we outline an integrated approach for advanced XML storing and processing within an object-relational database, and for a public retrieval frontend using Web Services and AJAX technology.
Current work on sentiment analysis is characterized by approaches with a pragmatic focus, which use shallow techniques in the interest of robustness but often rely on ad-hoc creation of data sets and methods. We argue that progress towards deep analysis depends on a) enriching shallow representations with linguistically motivated, rich information, and b) focussing different branches of research and combining resources to create synergies with related work in NLP. In the paper, we propose SentiFrameNet, an extension to FrameNet, as a novel representation for sentiment analysis that is tailored to these aims.
Deutsch-türkische Kontaktvarietäten. Am Beispiel der Sprache von deutsch-türkischen Jugendlichen
(2004)
The material for this contribution is a conversation with a young Polish woman of German-Polish descent about her biographical experiences in Poland. These experiences are shaped by suffering under a multiculturality in which the cultures involved share a history marked by war, expulsion and extermination and, owing to the crimes of the Nazi era and the persecution of Germans in post-war Poland, have developed a relationship to each other marked by hatred and hostility. In presenting her biographical development, the informant shows in exemplary fashion the problems that are associated with forming an ethnic-cultural identity under such conditions and that prevent an unambiguous cultural self-definition.
For long stretches, the informant speaks about her troubled experience, and about the ambivalent attitude towards Germans that grew out of it, not directly and explicitly but allusively and in a ‘veiled’ manner. The aim of the analysis is to reconstruct the informant's complex self-positioning and to describe the formulation procedures she uses, on the one hand, to make plausible the significance of the ‘hidden’ background for her biographical development and, on the other hand, to protect both interlocutors from a face-threatening activation of the problematic intercultural potential. The case is a good example of how one can talk about distressing experiences under conversational conditions for which one aspect of those experiences is constitutive.
We present a method and a software tool, the FrameNet Transformer, for deriving customized versions of the FrameNet database based on frame and frame element relations. The FrameNet Transformer allows users to iteratively coarsen the FrameNet sense inventory in two ways. First, the tool can merge entire frames that are related by user-specified relations. Second, it can merge word senses that belong to frames related by specified relations. Both methods can be interleaved. The Transformer automatically outputs format-compliant FrameNet versions, including modified corpus annotation files that can be used for automatic processing. The customized FrameNet versions can be used to determine which granularity is suitable for particular applications. In our evaluation of the tool, we show that our method increases accuracy of statistical semantic parsers by reducing the number of word-senses (frames) per lemma, and increasing the number of annotated sentences per lexical unit and frame. We further show in an experiment on the FATE corpus that by coarsening FrameNet we do not incur a significant loss of information that is relevant to the Recognizing Textual Entailment task.
We describe the SemEval-2010 shared task on “Linking Events and Their Participants in Discourse”. This task is an extension to the classical semantic role labeling task. While semantic role labeling is traditionally viewed as a sentence-internal task, local semantic argument structures clearly interact with each other in a larger context, e.g., by sharing references to specific discourse entities or events. In the shared task we looked at one particular aspect of cross-sentence links between argument structures, namely linking locally uninstantiated roles to their co-referents in the wider discourse context (if such co-referents exist). This task is potentially beneficial for a number of NLP applications, such as information extraction, question answering or text summarization.
Die Beziehung zwischen Eltern und Jugendlichen und das Argumentieren in konfliktären Interaktionen
(1993)
Following the theory of individuation, it is hypothesized that the conversational behavior of mothers and adolescent daughters in conflict interactions is determined by tendencies towards control on the part of the mothers and tendencies towards individualization on the part of the daughters. The data base consisted of 140 conflict conversations between 110 mothers and their adolescent daughters, collected in two studies. The transcribed conversations were segmented into units and classified according to a system of argumentation categories. The results are consistent with the developmental-psychological assumptions about the relationship and the partner-related intentions of mothers and adolescent daughters. Daughters more often responded to their mothers' arguments and tried to weaken them; they also frequently referred to themselves, their preferences and their dislikes (individualization). Mothers justified their own position more strongly than daughters did and steered the conversation through verbal initiatives and by referring to the person of their partner (control).
In recent years, ethnolectal forms of German have developed among adolescents of the second and third migrant generations in many large European cities. They are characteristic of multilingual contexts in which speakers of different heritage languages use the regional colloquial language of the country in which they live as a lingua franca. The new forms overlap substantially with the regional varieties but differ from them prosodically and phonetically, lexically and morphosyntactically. They are mostly used only in certain contexts, and speakers switch with virtuosity between regional varieties, heritage varieties, language mixing and ethnolectal forms.
On the basis of three ethnographic case studies in Mannheim, it is shown what the ethnolectal forms developed by the young migrants look like and for what purposes the young people use them. The young people have a broad linguistic repertoire, command both ethnolectal and near-standard forms, and use the difference between the two as a communicative resource.
In the paper we investigate the impact of data size on a Word Sense Disambiguation task (WSD). We question the assumption that the knowledge acquisition bottleneck, which is known as one of the major challenges for WSD, can be solved by simply obtaining more and more training data. Our case study on 1,000 manually annotated instances of the German verb drohen (threaten) shows that the best performance is not obtained when training on the full data set, but by carefully selecting new training instances with regard to their informativeness for the learning process (Active Learning). We present a thorough evaluation of the impact of different sampling methods on the data sets and propose an improved method for uncertainty sampling which dynamically adapts the selection of new instances to the learning progress of the classifier, resulting in more robust results during the initial stages of learning. A qualitative error analysis identifies problems for automatic WSD and discusses the reasons for the great gap in performance between human annotators and our automatic WSD system.
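As a rough illustration of the selection step that Active Learning for WSD relies on, the sketch below ranks unlabeled instances by the entropy of a classifier's predicted sense distribution and returns the most uncertain ones. This is plain entropy-based uncertainty sampling, not the improved, dynamically adapting method proposed in the paper; the function names, instance ids and probabilities are hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy (natural log) of a predicted sense distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_most_uncertain(pool_probs, k=1):
    """Return the ids of the k instances the classifier is least sure about.

    pool_probs: dict mapping an instance id to the classifier's predicted
    probabilities over word senses (assumed to sum to 1).
    """
    ranked = sorted(pool_probs, key=lambda i: entropy(pool_probs[i]), reverse=True)
    return ranked[:k]


# Hypothetical predictions for three unlabeled instances of an ambiguous verb.
pool = {
    "inst_a": [0.9, 0.1],  # confident prediction
    "inst_b": [0.5, 0.5],  # maximally uncertain
    "inst_c": [0.7, 0.3],
}
print(select_most_uncertain(pool, k=2))  # ['inst_b', 'inst_c']
```

In a full loop, the selected instances would be labeled by a human annotator, added to the training set, and the classifier retrained before the next selection round.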
Reformulating place
(2013)
This report examines what can be accomplished in conversation by reformulating a reference to a place using the practices of repair. It is based on an analysis of a collection of place references situated in second pair parts of adjacency pairs taken from a wide range of field recordings of talk-in-interaction. Not surprisingly, place references are sometimes reformulated so as to indicate a misspeaking or in pursuit of recipient recognition. At other times, however, we show that place references can be reformulated to more adequately implement the action of a turn in prosecuting the course of action of which it is a part. In these cases repairing a place reference can target a source of trouble associated with implementing the action of a turn at talk, and thus reformulating place can serve as a practical resource for accomplishing a range of interactional tasks. We conclude with a more complex case in which two reformulations are deployed in responding to a so-called ‘double-barrelled’ initiating action.
Active Learning (AL) has been proposed as a technique to reduce the amount of annotated data needed in the context of supervised classification. While various simulation studies for a number of NLP tasks have shown that AL works well on gold-standard data, there is some doubt whether the approach can be successful when applied to noisy, real-world data sets. This paper presents a thorough evaluation of the impact of annotation noise on AL and shows that systematic noise resulting from biased coder decisions can seriously harm the AL process. We present a method to filter out inconsistent annotations during AL and show that this makes AL far more robust when applied to noisy data.
Eigenschaften von sozialen Stilen der Kommunikation: Am Beispiel einer türkischen Migrantinnengruppe
(2003)
The contribution presents the conception of a social stylistics of communication, which draws in particular on the conception of cultural styles in the ethnography of communication and on recent developments in interactional sociolinguistics. Central to this conception of style is that populations develop social styles in response to relevant problems of social life. These problems determine the cores of style formation, from which expressive material of various kinds is progressively incorporated into the overall style. The conception of social styles is demonstrated using observations of a group of young German-Turkish migrants in Mannheim's city centre, the “Powergirls”.
Preface
(2015)
Skizzierung des Vorhabens
(1982)
This work proposes opinion frames as a representation of discourse-level associations that arise from related opinion targets and which are common in task-oriented meeting dialogs. We define the opinion frames and explain their interpretation. Additionally, we present an annotation scheme that realizes the opinion frames, and, via human annotation studies, we show that these can be reliably identified.
In this contribution, we report on an effort to annotate German data with information relevant to opinion inference. Such information has previously been referred to as effect or couched in terms of event evaluation functors. We extend the theory and present an extensive scheme that combines both approaches and thus extends the set of inference-relevant predicates. Using these guidelines to annotate 726 German synsets, we achieve good inter-annotator agreement.
Taking a usage-based perspective, lexical-semantic relations and other aspects of lexical meaning are characterised as emerging from language use. At the same time, they shape language use and therefore become manifest in corpus data. This paper discusses how this mutual influence can be taken into account in the study of these relations. An empirically driven methodology is proposed that is, as an initial step, based on self-organising clustering of comprehensive collocation profiles. Several examples demonstrate how this methodology may guide linguists in explicating implicit knowledge of complex semantic structures. Although these example analyses are conducted for written German, the overall methodology is language-independent.
The constantly changing requirements of today’s media landscape demand a new concept for literary editions. Such a forward-looking model should be SGML/XML-based, and should acknowledge the central importance of topic maps. In this respect, the Thomas Mann project combines in a unique way the work of one of the most famous authors of the 20th century with an innovative way of information organization.
Personenregister
(2006)
Gespräche mit Patienten. Ein alltägliches und komplexes Arbeits- und Steuerungsinstrument für Ärzte
(2008)
We report an ethnographic and field-experiment-based study of time intervals in Amondawa, a Tupi language and culture of Amazonia. We analyse two Amondawa time interval systems based on natural environmental events (seasons and days), as well as the Amondawa system for categorising lifespan time (“age”). Amondawa time intervals are exclusively event-based, as opposed to time-based (i.e. they are based on event-duration, rather than measured abstract time units). Amondawa has no lexicalised abstract concept of time and no practices of time reckoning, as conventionally understood in the anthropological literature. Our findings indicate that not only are time interval systems and categories linguistically and culturally specific, but that they do not depend upon a universal “concept of time”. We conclude that the abstract conceptual domain of time is not a human cognitive universal, but a cultural historical construction, semiotically mediated by symbolic and cultural-cognitive artefacts for time reckoning.
We present two collections of lexical items with idiosyncratic distribution. The collections document the behavior of German and English bound words (BW, such as English “headway”), i.e., words which can only occur in one expression (“make headway”). BWs are a problem for both general and idiomatic dictionaries since it is unclear whether they have an independent lexical status and to what extent the expressions in which they occur are typical idiomatic expressions. We propose a system which allows us to document the information about BWs from dictionaries and linguistic literature, together with corpus data and example queries for major text corpora. We present our data structure and point to other phraseologically oriented collections. We also show differences between the German and the English collections.
Scales and Scores. An evaluation of methods to determine the intensity of subjective expressions
(2015)
In this contribution, we present a survey of several methods that have been applied to the ordering of various types of subjective expressions (e.g. good < great), in particular adjectives and adverbs. Some of these methods use linguistic regularities that can be observed in large text corpora while others rely on external grounding in metadata, in particular the star ratings associated with product reviews. We discuss why these methods do not work uniformly across all types of expressions. We also present the first application of some of these methods to the intensity ordering of nouns (e.g. moron < dummy).
We examine predicative adjectives as an unsupervised criterion to extract subjective adjectives. We compare this criterion not only with a weakly supervised extraction method but also with gradable adjectives, i.e. another highly subjective subset of adjectives that can be extracted in an unsupervised fashion. In order to demonstrate the robustness of this extraction method, we evaluate the extraction with the help of two different state-of-the-art sentiment lexicons (as a gold standard).
This work proposes opinion frames as a representation of discourse-level associations which arise from related opinion topics. We illustrate how opinion frames help gather more information and also assist disambiguation. Finally, we present the results of our experiments to detect these associations.
The classification of verbs in Levin's (1993) English Verb Classes and Alternations: A Preliminary Investigation, on the basis of both intuitive semantic grouping and their participation in valence alternations, is often used by the NLP community as evidence of the semantic similarity of verbs (Jing & McKeown 1998; Lapata & Brew 1999; Kohl et al. 1998). In this paper, we compare the Levin classification with the work of the FrameNet project (Fillmore & Baker 2001), where words (not just verbs) are grouped according to the conceptual structures (frames) that underlie them and their combinatorial patterns are inductively derived from corpus evidence. This means that verbs grouped together in FrameNet (FN) might be semantically similar but have different (or no) alternations, and that verbs which share the same alternation might be represented in two different semantic frames.
In this contribution, we discuss and compare alternative options of modelling the entities and relations of wordnet-like resources in the Web Ontology Language OWL. Based on different modelling options, we developed three models of representing wordnets in OWL, i.e. the instance model, the class model, and the metaclass model. These OWL models mainly differ with respect to the ontological status of lexical units (word senses) and the synsets. While in the instance model lexical units and synsets are represented as individuals, in the class model they are represented as classes; both model types can be encoded in the dialect OWL DL. As a third alternative, we developed a metaclass model in OWL Full, in which lexical units and synsets are defined as metaclasses, the individuals of which are classes themselves. We apply the three OWL models to each of three wordnet-style resources: (1) a subset of the German wordnet GermaNet, (2) the wordnet-style domain ontology TermNet, and (3) GermaTermNet, in which TermNet technical terms and GermaNet synsets are connected by means of a set of “plug-in” relations. We report on the results of several experiments in which we evaluated the performance of querying and processing these different models: (1) a comparison of all three OWL models (class, instance, and metaclass model) of TermNet in the context of automatic text-to-hypertext conversion, (2) an investigation of the potential of the GermaTermNet resource by the example of a wordnet-based semantic relatedness calculation.
This paper presents an annotation scheme for English modal verbs together with sense-annotated data from the news domain. We describe our annotation scheme and discuss problematic cases for modality annotation based on the inter-annotator agreement during the annotation. Furthermore, we present experiments on automatic sense tagging, showing that our annotations do provide a valuable training resource for NLP systems.
Introduction
(2012)
We present an approach for opinion role induction for verbal predicates. Our model rests on the assumption that opinion verbs can be divided into three different types where each type is associated with a characteristic mapping between semantic roles and opinion holders and targets. In several experiments, we demonstrate the relevance of those three categories for the task. We show that verbs can easily be categorized with semi-supervised graph-based clustering and an appropriate similarity metric. The seeds are obtained through linguistic diagnostics. We evaluate our approach against a new manually compiled opinion role lexicon and perform in-context classification.
Recent work suggests that concreteness and imageability play an important role in the meanings of figurative expressions. We investigate this idea in several ways. First, we try to define more precisely the context within which a figurative expression may occur, by parsing a corpus annotated for metaphor. Next, we add both concreteness and imageability as “features” to the parsed metaphor corpus, by marking up words in this corpus using a psycholinguistic database of scores for concreteness and imageability. Finally, we carry out detailed statistical analyses of the augmented version of the original metaphor corpus, cross-matching the features of concreteness and imageability with others in the corpus such as parts of speech and dependency relations, in order to investigate in detail the use of such features in predicting whether a given expression is metaphorical or not.
Einführung in die Bände
(2002)
As many popular text genres such as blogs or news contain opinions by multiple sources and about multiple targets, finding the sources and targets of subjective expressions becomes an important sub-task for automatic opinion analysis systems. We argue that while automatic semantic role labeling systems (ASRL) have an important contribution to make, they cannot solve the problem for all cases. Based on the experience of manually annotating opinions, sources, and targets in various genres, we present linguistic phenomena that require knowledge beyond that of ASRL systems. In particular, we address issues relating to the attribution of opinions to sources; sources and targets that are realized as zero-forms; and inferred opinions. We also discuss in some depth that for arguing attitudes we need to be able to recover propositions and not only argued-about entities. A recurrent theme of the discussion is that close attention to specific discourse contexts is needed to identify sources and targets correctly.
The aim of this contribution is to examine two groups of women from an inner-city district of Mannheim for similarities and differences in how they deal with territorial claims. The differences concern the broad or narrow definition of territories, the way territorial boundaries are negotiated, and the way territorial claims are enforced; they are described as features of the groups' communicative style. Although a number of everyday routines appear very similar in both groups at first glance, the groups differ considerably in how they define their social relationships.
Intensivinterview
(1982)
Vorbemerkung
(2004)
The main objective of this article is to describe the current activities at the Mannheim Institute for German Language regarding the implementation of a domain-specific ontology for German grammar. We differentiate ontology bases from ontology management systems, point out the benefits of database-driven solutions, and go step by step through all phases of the ontology lifecycle. In order to demonstrate the practical use of our approach, we outline the interface between our ontology and the grammis web information system, and compare the ontology-based retrieval mechanism with traditional full text search.
Die Verwendung von Formen der Mannheimer Stadtsprache in einer jugendlichen Migrantinnengruppe
(2002)
Verfahren der Perspektivenabschottung und ihre Auswirkungen auf die Dynamik des Argumentierens
(1996)
This contribution deals with a particular perspectival operation, the sealing-off of perspectives (Perspektivenabschottung), which plays a central role in conversations aimed at working through problems and conflicts and can lead to considerable interactional problems, up to and including the breakdown of the interaction. After a conceptual and methodological clarification, the essential procedures for sealing off perspectives are described. The negative dynamics that these procedures can trigger, and the effort required to break open a sealed-off perspective again, are demonstrated in the analysis of a controversial discussion.