We investigate whether non-configurational languages, which display more word order variation than configurational ones, require more training data for a phenomenon to be parsed successfully. We perform a tightly controlled study comparing the dative alternation in English (a configurational language), German, and Russian (both non-configurational). More specifically, we compare the performance of a dependency parser when only canonical word order is present with its performance on data sets in which all word orders are present. Our results show that, for all languages, canonical data is not only easier to parse, but there is also no direct correspondence between the size of training sets containing free(er) word order variation and performance.
To improve grammatical function labelling for German, we augment the labelling component of a neural dependency parser with a decision history. We present different ways to encode the history, using different LSTM architectures, and show that our models yield significant improvements, resulting in a labelled attachment score (LAS) for German that is close to the best result from the SPMRL 2014 shared task (without the reranker).
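The labelled attachment score (LAS) counts a token as correct only if both its predicted head and its grammatical function label match the gold standard; the unlabelled score (UAS) checks the head alone. The following is a minimal sketch under stated assumptions: each sentence is a list of hypothetical (head index, label) pairs with 1-based heads and 0 for the root, the function name `attachment_scores` is illustrative, and punctuation is not excluded, whereas shared-task evaluation scripts may differ in such details.

```python
from typing import List, Tuple

# One token's analysis: (1-based index of its head, 0 = root; dependency label).
Token = Tuple[int, str]


def attachment_scores(gold: List[List[Token]],
                      pred: List[List[Token]]) -> Tuple[float, float]:
    """Return (UAS, LAS) as percentages over all tokens.

    UAS counts tokens whose predicted head is correct; LAS additionally
    requires the correct grammatical function label. Punctuation is not
    excluded in this sketch.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and predicted corpora differ in sentence count")
    total = correct_heads = correct_labelled = 0
    for gold_sent, pred_sent in zip(gold, pred):
        if len(gold_sent) != len(pred_sent):
            raise ValueError("gold and predicted sentences differ in length")
        for (g_head, g_label), (p_head, p_label) in zip(gold_sent, pred_sent):
            total += 1
            if g_head == p_head:
                correct_heads += 1
                if g_label == p_label:
                    correct_labelled += 1
    if total == 0:
        raise ValueError("no tokens to evaluate")
    return 100.0 * correct_heads / total, 100.0 * correct_labelled / total


# Illustrative three-token sentence: all heads correct, one label wrong.
uas, las = attachment_scores([[(2, "SB"), (0, "ROOT"), (2, "OA")]],
                             [[(2, "SB"), (0, "ROOT"), (2, "DA")]])
```

Here UAS is 100 while LAS drops to about 66.7, which is exactly the gap that better grammatical function labelling targets. The labels SB, OA and DA are used only as examples.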
In a number of languages, agreement in specificational copular sentences can or must be with the second of the two nominals, even when it is the first that occupies the canonical subject position. Béjar & Kahnemuyipour (2017) show that Persian and Eastern Armenian are two such languages. They then argue that ‘NP2 agreement’ occurs because the nominal in subject position (NP1) is not accessible to an external probe. It follows that actual agreement with NP1 should never be possible: the alternative to NP2 agreement should be ‘default’ agreement. We show that this prediction is false. In addition to showing that English has NP1, not default, agreement, we present new data from Icelandic, a language with rich agreement morphology, including cases that involve ‘plurale tantum’ nominals as NP1. These allow us to control for any confound from the fact that typically in a specificational sentence with two nominals differing in number, it is NP2 that is plural. We show that even in this case, the alternative to agreement with NP2 is agreement with NP1, not a default. Hence, we conclude that whatever the correct analysis of specificational sentences turns out to be, it must not predict obligatory failure of NP1 agreement.
In subproject CI “SemDok” of the DFG research group Texttechnologische Informationsmodellierung (text-technological information modelling), a text parser for the discourse structures of scientific journal articles was developed on the basis of Rhetorical Structure Theory. The paper describes the main conceptual and technical features of the chart parser and the resulting options for parameterising parsing experiments. In addition, it presents HPVtz., a tool for visualising parsing results (RST trees in an XML application) and navigating within them.
How to Compare Treebanks
(2008)
Recent years have seen an increasing interest in developing standards for linguistic annotation, with a focus on the interoperability of resources. This effort, however, requires a thorough knowledge of the advantages and disadvantages of linguistic annotation schemes in order to avoid importing the flaws and weaknesses of existing encoding schemes into the new standards. This paper addresses the question of how to compare syntactically annotated corpora and gain insights into the usefulness of specific design decisions. We present an exhaustive evaluation of two German treebanks with crucially different encoding schemes. We evaluate three different parsers trained on the two treebanks and compare the results using EVALB, the Leaf-Ancestor metric, and a dependency-based evaluation. Furthermore, we present TePaCoC, a new testsuite for evaluating parsers on complex German grammatical constructions. The testsuite provides a well-thought-out error classification, which enables us to compare the output of parsers trained on treebanks with different encoding schemes and provides interesting insights into the impact of treebank annotation schemes on specific constructions such as PP attachment or non-constituent coordination.
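EVALB scores a parse by comparing its labelled brackets (constituent label plus span) with those of the gold tree and reporting precision, recall and F1. A minimal sketch of that core computation, assuming constituents have already been extracted as hypothetical (label, start, end) tuples; it deliberately ignores the options EVALB reads from its parameter file (such as punctuation deletion or label equivalences) and says nothing about the Leaf-Ancestor or dependency-based metrics.

```python
from collections import Counter
from typing import Iterable, Tuple

# A constituent as (label, start token, end token), end exclusive.
Span = Tuple[str, int, int]


def bracket_scores(gold: Iterable[Span],
                   pred: Iterable[Span]) -> Tuple[float, float, float]:
    """Labelled bracket precision, recall and F1 for one sentence.

    Brackets are treated as multisets, so a duplicated constituent in the
    prediction only matches as many gold copies as actually exist.
    """
    gold_counts = Counter(gold)
    pred_counts = Counter(pred)
    # Counter '&' keeps the minimum count of each bracket: the matched brackets.
    matched = sum((gold_counts & pred_counts).values())
    n_gold = sum(gold_counts.values())
    n_pred = sum(pred_counts.values())
    precision = matched / n_pred if n_pred else 0.0
    recall = matched / n_gold if n_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


gold = [("S", 0, 5), ("NP", 0, 2), ("VP", 2, 5), ("PP", 3, 5)]
pred = [("S", 0, 5), ("NP", 0, 2), ("VP", 2, 5), ("NP", 3, 5)]
p, r, f = bracket_scores(gold, pred)  # one mislabelled bracket: P = R = F1 = 0.75
```

Because such scores depend directly on how many brackets an annotation scheme produces, comparing treebanks with different encoding schemes on bracket F1 alone can be misleading, which motivates the additional metrics and the testsuite described above.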
TePaCoC - A Testsuite for Testing Parser Performance on Complex German Grammatical Constructions
(2009)
We present the IUCL system, based on supervised learning, for the shared task on stance detection. Our official submission, the random forest model, reaches a score of 63.60 and is ranked 6th out of 19 teams. We also use gradient boosting decision trees and SVMs and merge all classifiers into an ensemble. Our analysis shows that random forest is good at retrieving minority classes and gradient boosting at retrieving majority classes. In the ensemble, the strengths of the different classifiers with respect to precision and recall complement each other.
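The abstract does not state how the classifiers are merged. One common option is hard majority voting over each classifier's predicted label. The following minimal sketch assumes that technique, which is not necessarily the IUCL method. The function name `majority_vote`, the tie-breaking rule (trust the earliest-listed classifier among the tied labels) and the stance labels FAVOR, AGAINST and NONE are illustrative assumptions.

```python
from collections import Counter
from typing import List, Sequence


def majority_vote(predictions: Sequence[Sequence[str]]) -> List[str]:
    """Combine per-classifier label sequences by hard majority voting.

    predictions[c][i] is classifier c's label for instance i. Ties are
    resolved in favour of the earliest-listed classifier whose vote is
    among the most frequent labels (an assumed, illustrative rule).
    """
    if not predictions:
        raise ValueError("need at least one classifier")
    n = len(predictions[0])
    if any(len(p) != n for p in predictions):
        raise ValueError("all classifiers must label the same instances")
    combined = []
    for i in range(n):
        votes = [p[i] for p in predictions]
        counts = Counter(votes)
        best = max(counts.values())
        tied = {label for label, c in counts.items() if c == best}
        # Walk the classifiers in order and take the first vote in the tied set.
        combined.append(next(v for v in votes if v in tied))
    return combined


rf = ["FAVOR", "AGAINST", "NONE"]      # hypothetical random forest output
gb = ["AGAINST", "AGAINST", "FAVOR"]   # hypothetical gradient boosting output
svm = ["FAVOR", "NONE", "AGAINST"]     # hypothetical SVM output
ensemble = majority_vote([rf, gb, svm])
```

Listing the random forest first makes it the tie-breaker, which is one way to let a classifier that is good at minority classes keep those labels when the vote is split. Weighted or probability-averaging (soft) voting are equally plausible alternatives.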
Syntax und Morphologie
(1997)