Annotating Spoken Language
(2014)
TePaCoC - A Testsuite for Testing Parser Performance on Complex German Grammatical Constructions
(2009)
To improve grammatical function labelling for German, we augment the labelling component of a neural dependency parser with a decision history. We present different ways to encode the history, using different LSTM architectures, and show that our models yield significant improvements, resulting in a LAS for German that is close to the best result from the SPMRL 2014 shared task (without the reranker).
Part-of-speech (POS) annotation is a fundamental layer in linguistically annotated corpora: it provides the basis for further syntactic analyses, and many NLP tools rely on POS information as input. However, most POS annotation schemes have been developed with written (newspaper) text in mind and thus do not carry over well to text from other domains and genres. Recent discussions have concentrated on these shortcomings, in particular on how well present POS annotation schemes apply to data from domains other than newspaper text.
In a number of languages, agreement in specificational copular sentences can or must be with the second of the two nominals, even when it is the first that occupies the canonical subject position. Béjar & Kahnemuyipour (2017) show that Persian and Eastern Armenian are two such languages. They then argue that ‘NP2 agreement’ occurs because the nominal in subject position (NP1) is not accessible to an external probe. It follows that actual agreement with NP1 should never be possible: the alternative to NP2 agreement should be ‘default’ agreement. We show that this prediction is false. In addition to showing that English has NP1, not default, agreement, we present new data from Icelandic, a language with rich agreement morphology, including cases that involve ‘plurale tantum’ nominals as NP1. These allow us to control for any confound from the fact that typically in a specificational sentence with two nominals differing in number, it is NP2 that is plural. We show that even in this case, the alternative to agreement with NP2 is agreement with NP1, not a default. Hence, we conclude that whatever the correct analysis of specificational sentences turns out to be, it must not predict obligatory failure of NP1 agreement.
Syntax und Morphologie
(1997)
In the NLP literature, adapting a parser to new text with properties different from the training data is commonly referred to as domain adaptation. In practice, however, the differences between texts from different sources often reflect a mixture of domain and genre properties, and it is by no means clear what impact each of those has on statistical parsing. In this paper, we investigate how differences between articles in a newspaper corpus relate to the concepts of genre and domain and how they influence parsing performance of a transition-based dependency parser. We do this by applying various similarity measures for data point selection and testing their adequacy for creating genre-aware parsing models.
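The idea of selecting training data by similarity to the target text can be illustrated with a small sketch. It is not the method used in the paper, whose similarity measures are not named in this abstract. It assumes a plain bag-of-words cosine similarity, and the function names `cosine` and `select_similar` are hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def select_similar(target_text, candidates, k):
    """Return the k candidate texts most similar to target_text.

    Assumption: whitespace tokenisation and lowercasing stand in for
    whatever preprocessing a real parser pipeline would use.
    """
    target = Counter(target_text.lower().split())
    scored = [(cosine(target, Counter(c.lower().split())), c) for c in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in scored[:k]]


# Toy data, invented for illustration only.
pool = [
    "the team won the match",
    "stocks fell sharply today",
    "a draw ended the match",
]
selected = select_similar("the match ended in a draw", pool, k=2)
```

In such a setup, the selected sentences would form the training set for a model tuned to the target text; a genre-aware variant would swap in a measure that is sensitive to genre rather than topic vocabulary.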
We present a method and a software tool, the FrameNet Transformer, for deriving customized versions of the FrameNet database based on frame and frame element relations. The FrameNet Transformer allows users to iteratively coarsen the FrameNet sense inventory in two ways. First, the tool can merge entire frames that are related by user-specified relations. Second, it can merge word senses that belong to frames related by specified relations. Both methods can be interleaved. The Transformer automatically outputs format-compliant FrameNet versions, including modified corpus annotation files that can be used for automatic processing. The customized FrameNet versions can be used to determine which granularity is suitable for particular applications. In our evaluation of the tool, we show that our method increases the accuracy of statistical semantic parsers by reducing the number of word senses (frames) per lemma and increasing the number of annotated sentences per lexical unit and frame. We further show in an experiment on the FATE corpus that coarsening FrameNet does not incur a significant loss of information relevant to the Recognizing Textual Entailment task.
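The first coarsening step, merging frames linked by user-specified relations, can be sketched as a union-find over relation edges. This is a minimal illustration under stated assumptions and not the FrameNet Transformer's implementation: the frame names, relation triples, and the functions `merge_frames` and `senses_per_lemma` are all hypothetical.

```python
from collections import defaultdict


def merge_frames(frames, relations, merge_on):
    """Group frames connected by any relation type listed in merge_on.

    relations is a list of (relation_type, frame_a, frame_b) triples.
    Returns a dict mapping each group's representative to its member frames.
    """
    parent = {f: f for f in frames}

    def find(f):
        while parent[f] != f:
            parent[f] = parent[parent[f]]  # path halving
            f = parent[f]
        return f

    for rel_type, a, b in relations:
        if rel_type in merge_on:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                # Keep the alphabetically smaller name as representative.
                keep, drop = sorted((root_a, root_b))
                parent[drop] = keep

    groups = defaultdict(set)
    for f in frames:
        groups[find(f)].add(f)
    return dict(groups)


def senses_per_lemma(lexical_units, groups):
    """Count distinct (merged) frames per lemma."""
    frame_to_rep = {f: rep for rep, members in groups.items() for f in members}
    senses = defaultdict(set)
    for lemma, frame in lexical_units:
        senses[lemma].add(frame_to_rep[frame])
    return {lemma: len(s) for lemma, s in senses.items()}


# Toy inventory, invented for illustration; not real FrameNet data.
frames = ["Buy", "Get", "Sell", "Give"]
relations = [("Inheritance", "Get", "Buy"), ("Using", "Give", "Sell")]
lexical_units = [("buy", "Buy"), ("buy", "Get"), ("sell", "Sell")]

fine = merge_frames(frames, relations, merge_on=set())
coarse = merge_frames(frames, relations, merge_on={"Inheritance"})
```

Comparing `senses_per_lemma(lexical_units, fine)` with `senses_per_lemma(lexical_units, coarse)` shows the effect described in the abstract: the toy lemma "buy" drops from two senses to one once the frames linked by the chosen relation are merged.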