Refine
Year of publication
- 2004 (20) (remove)
Document Type
- Conference Proceeding (20) (remove)
Has Fulltext
- yes (20)
Is part of the Bibliography
- no (20)
Keywords
- Korpus <Linguistik> (6)
- Annotation (5)
- Deutsch (4)
- Gesprochene Sprache (4)
- Computerlinguistik (3)
- Auszeichnungssprache (2)
- Automatische Sprachanalyse (2)
- Automatische Spracherkennung (2)
- Englisch (2)
- Portugiesisch (2)
Publicationstate
- Veröffentlichungsversion (9)
- Zweitveröffentlichung (2)
- Postprint (1)
Reviewstate
- (Verlags)-Lektorat (8)
- Peer-Review (1)
- Review-Status-unbekannt (1)
Publisher
- European Language Resources Association (ELRA) (3)
- ELRA (2)
- CSLI Publications (1)
- EFNIL (1)
- Edusp/Monferrer Produções (1)
- Euralex (1)
- European Language Resources Association (1)
- Extreme Markup Languages Conference (1)
- Institute for Logic, Language and Computation (1)
- The Association for Computational Linguistics (1)
Reframing FrameNet Data
(2004)
The Berkeley FrameNet Project (http://www.icsi.berkeley.edu/~framenet) is building an on-line lexical resource for contemporary English. The database provides information about the semantic and syntactic combinatorial possibilities (valences) of each item analyzed. This paper describes the conceptual basis for what has been called reframing of data in the FrameNet database and exemplifies two new frame-to-frame relations, Causative_of and Inchoative_of, the implementation of which came about as a result of reanalysis of certain frames and lexical units. The new relations are characterized with respect to a triple of frames involving the notion of attaching, and entering them into the database is demonstrated using the Frame Relations Editor. The two relations allow FrameNet to make frame-wise distinctions that capture fairly systematic semantic relationships across sets of lexical units. While the Inheritance and Subframe relations are of particular interest to the NLP research community, Causative_of and Inchoative_of may be more relevant to lexicography.
The administration of electronic publication in the Information Era congregates old and new problems, especially those related with Information Retrieval and Automatic Knowledge Extraction. This article presents an Information Retrieval System that uses Natural Language Processing and Ontology to index collection’s texts. We describe a system that constructs a domain specific ontology, starting from the syntactic and semantic analyses of the texts that compose the collection. First the texts are tokenized, then a robust syntactic analysis is made, subsequently the semantic analysis is accomplished in conformity with a metalanguage of knowledge representation, based on a basic ontology composed of 47 classes. The ontology, automatically extracted, generates richer domain specific knowledge. It propitiates, through its semantic net, the right conditions for the user to find with larger efficiency and agility the terms adapted for the consultation to the texts. A prototype of this system was built and used for the indexation of a collection of 221 electronic texts of Information Science written in Portuguese from Brazil. Instead of being based in statistical theories, we propose a robust Information Retrieval System that uses cognitive theories, allowing a larger efficiency in the answer to the users queries.
The motivation for this article is to describe a methodology for interrelating and analyzing language and theory-specific corpus data from various languages. As an example phenomeon we use information structure (IS, see [3]) in treebanks from three languages: Spanish, Korean and Japanese. Korean and Japanese are typologically close, while both are typologically different from Spanish. Therefore, the problem of annotating IS is that there are diverging language-specific formal linguistic means for the realization of IS-functions (like “topicalization / contrast”) on various levels like prosody, morphology and word-order. Hence, it is necessary to describe the relations between language-specific formal means and functional views on IS, and how to operationalize these relations for corpus analysis.
We present the annotation of information structure in the MULI project. To learn more about the information structuring means in prosody, syntax and discourse, theory- independent features were defined for each level. We describe the features and illustrate them on an example sentence. To investigate the interplay of features, the representation has to allow for inspecting all three layers at the same time. This is realised by a stand-off XML mark-up with the word as the basic unit. The theory-neutral XML stand-off annotation allows integrating this resource with other linguistic resources such as the Tiger Treebank for German or the Penn treebank for English.
The goal of the MULI (MUltiLingual Information structure) project is to empirically analyse information structure in German and English newspaper texts. In contrast to other projects in which information structure is annotated and investigated (e.g. in the Prague Dependency Treebank, which mirrors the basic information about the topic-focus articulation of the sentence), we do not annotate theory-biased categories like topic-focus or theme-rheme. Trying to be as theory-independent as possible, we annotate those features which are relevant to information structure and on the basis of which typical patterns, co-occurrences or correlations can be determined. We distinguish between three annotation levels: syntax, discourse and prosody. The data is based on the TIGER Corpus for German and the Penn Treebank for English, since the existing information on part-of-speech and syntactic structure can be re-used for our purposes. The actual annotation of an English example sequence illustrates our choice of categories on each level. Their combination offers the possibility to investigate how information structure is realised and can be interpreted.
This paper outlines the generation process of a specifi computational linguistic representation termed the Multilingual Time Map, conceptually a multi-tape finit state transducer encoding linguistic data at different levels of granularity. The fi st component acquires phonological data from syllable labeled speech data, the second component define feature profiles the third component generates feature hierarchies and augments the acquired data with the define feature profiles and the fourth component displays the Multilingual Time Map as a graph.
The aim of this paper is to highlight the actual need for corpora that have been annotated based on acoustic information. The acoustic information should be coded in features or properties and is needed to inform further processing systems, i.e. to present a basis for a speech recognition system using linguistic information. Feature annotation of existing corpora in combination with segmental annotation can provide a powerful training material for speech recognition systems, but will as well challenge the further processing of features to segments and syllables. We present here the theoretical preliminaries for our multilingual feature extraction system, that we are currently working on.
This paper focuses on aspects of the licensing of adverbial noun phrases (AdvNPs) in the HPSG grammar framework. In the first part, empirical issues will be discussed. A number of AdvNPs will be examined with respect to various linguistic phenomena in order to find out to what extent AdvNPs share syntactic and semantic properties with non-adverbial NPs. Based on empirical generalizations, a lexical constraint for licensing both AdvNPs and non-adverbial NPs will be provided. Further on, problems of structural licensing of phrases containing AdvNPs that arise within the standard HPSG framework of Pollard and Sag (1994) will be pointed out, and a possible solution will be proposed. The objective is to provide a constraint-based treatment of NPs which describes non-redundantly both their adverbial and non-adverbial usages. The analysis proposed in this paper applies lexical and phrasal implicational constraints and does not require any radical modifications or extensions of the standard HPSG geometry of Pollard and Sag (1994).
Since adverbial NPs have particularly high frequency and a wide spectrum of uses in inflectional languages such as Polish, we will take Polish data into consideration.