This paper presents a survey on the role of negation in sentiment analysis. Negation is a very common linguistic construction that affects polarity and, therefore, needs to be taken into consideration in sentiment analysis.
We present various computational approaches to modeling negation in sentiment analysis. In particular, we focus on aspects such as the level of representation used for sentiment analysis, negation word detection, and the scope of negation. We also discuss the limits and challenges of negation modeling for this task.
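The two aspects named above, negation word detection and scope of negation, can be illustrated with a minimal sketch. It is not a method from the survey: the lexicons `POLARITY` and `NEGATORS`, the fixed `SCOPE_WINDOW` of three tokens, and the function `score` are all assumptions chosen for illustration.

```python
import re

# Hypothetical toy lexicons; real systems use much larger resources.
POLARITY = {"good": 1, "great": 1, "bad": -1, "boring": -1}
NEGATORS = {"not", "no", "never"}
SCOPE_WINDOW = 3  # assumption: a negator covers the next three tokens


def score(sentence: str) -> int:
    """Sum word polarities, flipping those that fall inside a negation scope."""
    tokens = re.findall(r"[a-z']+", sentence.lower())
    total = 0
    remaining_scope = 0  # tokens still covered by the most recent negator
    for token in tokens:
        if token in NEGATORS:
            # Negation word detection: open a new scope window.
            remaining_scope = SCOPE_WINDOW
            continue
        polarity = POLARITY.get(token, 0)
        if remaining_scope > 0:
            polarity = -polarity  # inside the scope: invert the polarity
            remaining_scope -= 1
        total += polarity
    return total
```

A fixed window is only one simple way to approximate scope; it ignores clause boundaries and syntactic structure.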
Certain adnominally used demonstratives have, beyond their deictic and phoric uses, a so-called anamnestic use. This use is often neglected in the literature, although according to several authors (e.g. Diessel, Himmelmann) it constitutes the starting point for the grammaticalization of demonstratives. The present article first examines whether and to what extent the general characteristics of anamnestic demonstratives described in the relevant literature hold for German and Hungarian. Second, the properties of the indefinite counterparts of anamnestic demonstratives are investigated in both languages on the basis of corpus examples. Finally, the possible grammaticalization paths of demonstratives are discussed.
This paper describes work directed towards the development of a syllable prominence-based prosody generation functionality for a German unit selection speech synthesis system. A general concept for syllable prominence-based prosody generation in unit selection synthesis is proposed. As a first step towards its implementation, an automated syllable prominence annotation procedure based on acoustic analyses has been performed on the BOSS speech corpus. The prominence labeling has been evaluated against an existing annotation of lexical stress levels and manual prominence labeling on a subset of the corpus. We discuss methods and results and give an outlook on further implementation steps.
Bootstrapping Supervised Machine-learning Polarity Classifiers with Rule-based Classification
(2010)
In this paper, we explore the effectiveness of bootstrapping supervised machine-learning polarity classifiers using the output of domain-independent rule-based classifiers. The benefit of this method is that no labeled training data are required. Still, the method allows in-domain knowledge to be captured by training the supervised classifier on in-domain features, such as bag of words.
We investigate how important the quality of the rule-based classifier is and which features are useful for the supervised classifier. The former addresses the question of how far aspects relevant to polarity classification, such as word sense disambiguation, negation modeling, or intensification, matter for this self-training approach. We not only compare this method with conventional semi-supervised learning but also examine how it performs under more difficult settings in which classes are not balanced and mixed reviews are included in the dataset.
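The general idea of this bootstrapping setup can be sketched as follows. This is an illustration under stated assumptions, not the authors' system: the lexicons `POSITIVE` and `NEGATIVE`, the hand-written multinomial Naive Bayes classifier, and all function names are hypothetical.

```python
import math
from collections import Counter

# Hypothetical domain-independent polarity lexicon.
POSITIVE = {"good", "great"}
NEGATIVE = {"bad", "poor"}


def rule_label(tokens):
    """Rule-based classifier: compare lexicon hits; None if undecided."""
    balance = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if balance == 0:
        return None
    return "pos" if balance > 0 else "neg"


def train_naive_bayes(labeled_docs):
    """Collect bag-of-words counts per class from (tokens, label) pairs."""
    word_counts = {"pos": Counter(), "neg": Counter()}
    doc_counts = Counter()
    for tokens, label in labeled_docs:
        word_counts[label].update(tokens)
        doc_counts[label] += 1
    vocab = set(word_counts["pos"]) | set(word_counts["neg"])
    return word_counts, doc_counts, vocab


def predict(model, tokens):
    """Return the class with the higher smoothed log-probability."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label in ("pos", "neg"):
        total_words = sum(word_counts[label].values())
        # Add-one smoothing on the class prior and on the word likelihoods.
        log_prob = math.log((doc_counts[label] + 1) / (total_docs + 2))
        for t in tokens:
            if t in vocab:  # words never seen in training are ignored
                log_prob += math.log(
                    (word_counts[label][t] + 1) / (total_words + len(vocab))
                )
        if log_prob > best_score:
            best_label, best_score = label, log_prob
    return best_label


def bootstrap(unlabeled_docs):
    """Label documents with the rules, then train on the automatic labels."""
    auto_labeled = [(doc, rule_label(doc)) for doc in unlabeled_docs]
    auto_labeled = [(doc, y) for doc, y in auto_labeled if y is not None]
    return train_naive_bayes(auto_labeled)
```

In this toy setup, a word such as "battery" that is absent from the lexicon can still acquire a polarity association if it co-occurs with lexicon words in the automatically labeled documents, which is how in-domain knowledge enters the supervised classifier.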
Active learning has been applied to different NLP tasks, with the aim of limiting the amount of time and cost for human annotation. Most studies on active learning have only simulated the annotation scenario, using prelabelled gold standard data. We present the first active learning experiment for Word Sense Disambiguation with human annotators in a realistic environment, using fine-grained sense distinctions, and investigate whether AL can reduce annotation cost and boost classifier performance when applied to a real-world task.
This paper describes the application of probabilistic part-of-speech taggers to the Dzongkha language. A tag set of 66 tags, based on the Penn Treebank, is designed. A training corpus of 40,247 tokens is used to train the model. Using the lexicon extracted from the training corpus together with a lexicon from an available word list, we compared two statistical taggers. The best result was 93.1% accuracy in a 10-fold cross-validation on the training set. The winning tagger was then applied to annotate a corpus of 570,247 tokens.
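The evaluation protocol, k-fold cross-validation of a tagger trained on tagged sentences, can be sketched with a simple most-frequent-tag baseline. This baseline is a stand-in chosen for brevity, not one of the two statistical taggers used in the paper; the function names and the data layout (lists of `(word, tag)` pairs) are assumptions.

```python
from collections import Counter, defaultdict


def train_unigram_tagger(sentences):
    """Map each word to its most frequent tag; unknown words get the top tag."""
    counts = defaultdict(Counter)
    all_tags = Counter()
    for sentence in sentences:
        for word, tag in sentence:
            counts[word][tag] += 1
            all_tags[tag] += 1
    default_tag = all_tags.most_common(1)[0][0]
    lexicon = {word: c.most_common(1)[0][0] for word, c in counts.items()}
    return lexicon, default_tag


def cross_validate(sentences, k=10):
    """Return token accuracy pooled over k folds (every k-th sentence held out)."""
    correct = total = 0
    for fold in range(k):
        test = sentences[fold::k]
        train = [s for i, s in enumerate(sentences) if i % k != fold]
        lexicon, default_tag = train_unigram_tagger(train)
        for sentence in test:
            for word, gold in sentence:
                correct += lexicon.get(word, default_tag) == gold
                total += 1
    return correct / total
```

The function expects at least `k` sentences, so that no training split is empty.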
Opinion holder extraction is one of the important subtasks in sentiment analysis. The effective detection of an opinion holder depends on the consideration of various cues on various levels of representation, though they are hard to formulate explicitly as features. In this work, we propose to use convolution kernels for this task, since they identify meaningful fragments of sequences or trees by themselves. We not only investigate how different levels of information can be effectively combined in different kernels but also examine how the scope of these kernels should be chosen. In general relation extraction, the two candidate entities thought to be involved in a relation are commonly chosen as the boundaries of sequences and trees. The definition of boundaries in opinion holder extraction, however, is less straightforward, since several expressions besides the candidate opinion holder may be eligible as a boundary.
Corpus-based identification and disambiguation of reading indicators for German nominalizations
(2010)
Corpus data is often structurally and lexically ambiguous; corpus extraction methodologies thus must be made aware of ambiguities. Therefore, given an extraction task, all relevant ambiguities must be identified. To resolve these ambiguities, contextual data responsible for one or another reading is to be considered. In the context of our present work, German -ung-nominalizations and their sortal readings are under examination. A number of these nominalizations may be read as an event or a result, depending on the semantic group they belong to. Here, we concentrate on nominalizations of verbs of saying (henceforth: "verba dicendi"), identify their context partners and their influence on the sortal reading of the nominalizations in question. We present a tool which calculates the sortal reading of such nominalizations and thus may improve not only corpus extraction, but also e.g. machine translation. Lastly, we describe successful attempts to identify the correct sortal reading, conclusions and future work.
This paper describes general requirements for evaluating and documenting NLP tools, with a focus on morphological analysers and the design of a Gold Standard. It is argued that any evaluation must be measurable and that its documentation must be accessible to every user of the tool. The documentation must enable the user to compare different tools offering the same service; hence the descriptions must contain measurable values. A Gold Standard is a vital part of any measurable evaluation process; therefore, the corpus-based design of a Gold Standard, its creation, and the problems that occur are reported here. Our project concentrates on SMOR, a morphological analyser for German that is to be offered as a web service. We not only use this analyser for designing the Gold Standard, but also evaluate the tool itself at the same time. Note that the project is ongoing, so we cannot yet present final results.
We present a method and a software tool, the FrameNet Transformer, for deriving customized versions of the FrameNet database based on frame and frame element relations. The FrameNet Transformer allows users to iteratively coarsen the FrameNet sense inventory in two ways. First, the tool can merge entire frames that are related by user-specified relations. Second, it can merge word senses that belong to frames related by specified relations. Both methods can be interleaved. The Transformer automatically outputs format-compliant FrameNet versions, including modified corpus annotation files that can be used for automatic processing. The customized FrameNet versions can be used to determine which granularity is suitable for particular applications. In our evaluation of the tool, we show that our method increases accuracy of statistical semantic parsers by reducing the number of word-senses (frames) per lemma, and increasing the number of annotated sentences per lexical unit and frame. We further show in an experiment on the FATE corpus that by coarsening FrameNet we do not incur a significant loss of information that is relevant to the Recognizing Textual Entailment task.
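The first coarsening mode, merging entire frames that are related by user-specified relations, can be sketched with a union-find structure. This is not the FrameNet Transformer's implementation or file format: the function `merge_frames`, its data layout, and the rule of keeping the alphabetically smaller frame name as the representative are assumptions made for illustration.

```python
def merge_frames(lexical_units, relations, merge_types):
    """Coarsen a sense inventory by merging frames linked by chosen relations.

    lexical_units: dict mapping a lemma to the set of frames it evokes
    relations:     list of (frame_a, relation_type, frame_b) triples
    merge_types:   relation types whose endpoints should be merged
    """
    parent = {}

    def find(frame):
        """Return the representative frame of the group containing `frame`."""
        parent.setdefault(frame, frame)
        while parent[frame] != frame:
            parent[frame] = parent[parent[frame]]  # path halving
            frame = parent[frame]
        return frame

    for frame_a, relation_type, frame_b in relations:
        if relation_type in merge_types:
            root_a, root_b = find(frame_a), find(frame_b)
            if root_a != root_b:
                # Assumption: the alphabetically smaller name represents the group.
                keep, drop = sorted((root_a, root_b))
                parent[drop] = keep

    # Rewrite every lemma's frame set in terms of group representatives.
    return {
        lemma: {find(frame) for frame in frames}
        for lemma, frames in lexical_units.items()
    }
```

Calling the function again on its own output with a different set of relation types gives the iterative coarsening described above; lemmas whose frames fall into one group end up with fewer senses.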
Even grammars that try to be as comprehensible as possible can hardly avoid using technical terms unknown to novices. To overcome this inconvenience, grammis, the grammatical information system of the Institut für Deutsche Sprache, has incorporated a glossary specialized in the terms used within the system. This glossary, named Grammatische Grundbegriffe (elementary terms of grammar) and tied by hyperlinks to technical terms in the 'core grammar' of grammis, offers short and simple explanations, mainly by means of exemplification. The idea is to give users a provisional understanding that lets them get along while following the main themes they are interested in. The glossary is explicitly not a stand-alone dictionary of grammatical terms, and it should not be regarded as one.
This article gives an overview of the development and tasks of the Fachverband Deutsch als Fremdsprache (FaDaF) since its founding in 1989/90. It traces the lines of development of the association, which, as the successor organization to the Arbeitskreis Deutsch als Fremdsprache beim DAAD (AkDaF), took over, continued, and further developed that body's tasks.
This article develops a new, functionally motivated classification of the adnominal genitive and the corresponding von-phrases, which are jointly referred to as 'possessive attributes'. It is based on findings from typological research and on comparison with other, mainly Germanic, languages. The descriptive framework for the NP, with the overarching 'functional domain' of reference and its subdomains, is presented. Possessive attributes can be characterized as one form of expression of the subdomain of modification. It is shown that possessive attributes can realize different functional types of modification: referential-anchoring (der Hut meiner Schwester, 'my sister's hat'), qualitative (ein Autor deutscher Herkunft, 'an author of German origin'), and classificatory (ein Mann der Tat, 'a man of action'). Marginal possessive attributes such as the 'partitive genitive' (Teilungsgenitiv; eine Tasse heißen Tees, 'a cup of hot tea') and the genitive of identity (das Laster der Unbescheidenheit, 'the vice of immodesty') are also taken into account. The new ordering of possessive attributes by functional subdomains is preferable to the traditional classification in that it considers only basic distinctions according to the reference-semantic status of the modifier (conceptual versus referential) and according to the modifier's contribution to the meaning composition of the NP (anchoring versus qualitative or classificatory). In addition, it is supported by test procedures such as the pronominalization test.
We describe the SemEval-2010 shared task on “Linking Events and Their Participants in Discourse”. This task is an extension to the classical semantic role labeling task. While semantic role labeling is traditionally viewed as a sentence-internal task, local semantic argument structures clearly interact with each other in a larger context, e.g., by sharing references to specific discourse entities or events. In the shared task we looked at one particular aspect of cross-sentence links between argument structures, namely linking locally uninstantiated roles to their co-referents in the wider discourse context (if such co-referents exist). This task is potentially beneficial for a number of NLP applications, such as information extraction, question answering or text summarization.
Research in grammatical theory, as the most recent IDS annual conference once again vividly demonstrated, must be understood as a tenacious struggle between two fundamentally antagonistic principles: the rich abundance of linguistic occurrences, to whose thorough exploration a considerable part of current theoretical and language-technological effort is devoted, must always be countered by the attempt to contain this exuberant variance through abstraction and generalization, without excessively and inadmissibly leveling the empirical findings.
So far, comprehensive grammar descriptions of Northern Sotho have only been available in the form of prescriptive books aimed at teaching the language. This paper describes parts of the first morpho-syntactic description of Northern Sotho from a computational perspective (Faaß, 2010a). Such a description is necessary for implementing rule-based, operational grammars. It is also essential for the annotation of training data to be used by statistical parsers. The work that we partially present here may hence provide a resource for computational processing of the language, in order to proceed to linguistic representations beyond tagging, be it chunking or parsing. The paper begins by describing significant aspects of Northern Sotho verbal morpho-syntax (section 2). It is shown that the topology of the verb can be depicted as a slot system, which may form the basis for computational processing (section 3). Note that the implementation of the described rules (section 4) and the coverage tests are ongoing; we will report on them in more detail at a later stage.
In this paper, we investigate the impact of data size on a Word Sense Disambiguation (WSD) task. We question the assumption that the knowledge acquisition bottleneck, which is known as one of the major challenges for WSD, can be solved by simply obtaining more and more training data. Our case study on 1,000 manually annotated instances of the German verb drohen (threaten) shows that the best performance is obtained not by training on the full data set, but by carefully selecting new training instances with regard to their informativeness for the learning process (Active Learning). We present a thorough evaluation of the impact of different sampling methods on the data sets and propose an improved method for uncertainty sampling which dynamically adapts the selection of new instances to the learning progress of the classifier, resulting in more robust results during the initial stages of learning. A qualitative error analysis identifies problems for automatic WSD and discusses the reasons for the great gap in performance between human annotators and our automatic WSD system.
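Plain uncertainty sampling, the baseline that the proposed method refines, can be sketched as follows. The sketch does not implement the paper's dynamically adapting variant; the function names and the `predict_proba` callback (any function that returns a list of class probabilities for an instance) are assumptions.

```python
import math


def entropy(probs):
    """Shannon entropy (natural log) of a discrete probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_most_uncertain(pool, predict_proba, batch_size=1):
    """Return the `batch_size` pool items with the highest predictive entropy.

    pool:          list of unlabeled instances
    predict_proba: function mapping an instance to its class probabilities
    """
    ranked = sorted(pool, key=lambda x: entropy(predict_proba(x)), reverse=True)
    return ranked[:batch_size]
```

In an active learning loop, the selected instances would be handed to an annotator, added to the training set, and the classifier retrained before the next selection round.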
Vorwort (Foreword)
(2010)
This article introduces thematic issue 2/2010 of the journal Deutsche Sprache. The issue brings together four contributions on a central topic of German grammar and text linguistics: the form and function of attribution structures in the noun phrase. Common to all contributions is a contrastive and/or functional-typological approach to the topic; they differ in the attribute types examined (adjectival, genitive, prepositional, and participial attributes), the methodological approach to the data, the theoretical questions, and the respective comparison languages (Dutch, Danish, Norwegian, English). All contributions document the renewed interest in comparative linguistic studies in recent years, which is also reflected in corresponding topic-specific conferences and research projects in Germany and abroad.
In addition to brief surveys of the status of prosody in grammars and in the didactics and textbooks of German as a foreign language (DaF), the article defines prosody more precisely and describes its most important properties and functions in the word, the utterance, and conversation. It then focuses in particular on the meaning-shaping function of prosody. From a phonological point of view, we regard information structure as central to the teaching of prosody. Using the accent group and the intonation phrase, the role of prosody in the rhythmic structuring of utterances is presented. As a further example of the communicative function of prosody, its role in the expression of emotion is discussed.