Vorwort [Preface]
(1995)
This workshop contribution addresses the linking of heterogeneous linguistic resources. A significant subset of the resources used in current linguistic research and applications consists of XML-annotated text documents on the one hand and external resources such as grammars, lexicons, or ontologies on the other. An architecture is presented that allows heterogeneous resources to be integrated; the integration methods are independent of any particular application and therefore support different kinds of linking. An exemplary application of the methodology is the analysis of anaphoric relations.
Co-reference annotation and resources: a multilingual corpus of typologically diverse languages
(2002)
This article introduces a dialogue corpus containing data from two typologically different languages, Japanese and Kilivila. The corpus is annotated in accordance with language-specific annotation schemes for co-referential and similar relations. The article describes the corpus data, the properties of language-specific co-reference in the two languages, and a methodology for its annotation. Examples from the corpus show how this methodology is used in the workflow of the annotation process.
This paper deals with the problem of how to interrelate theory-specific treebanks and how to transform one treebank format into another. Two approaches to these goals can currently be distinguished. The first creates a mapping algorithm between treebank formats: categories of a source format are transformed into a target format via a given set of general or language-specific mapping rules. The second relates treebanks via a transformation to a general model of linguistic categories, for example one based on the EAGLES recommendations for syntactic annotation of corpora, or one relying on the HPSG framework. This paper proposes a new methodology as a solution for these desiderata.
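The first approach mentioned in this abstract, rule-based mapping of categories from a source format to a target format, can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the category names, the `MAPPING_RULES` table, the tuple-based tree encoding, and the `map_tree` helper are hypothetical and are not taken from the paper, whose own methodology is not described in the abstract.

```python
# Hypothetical mapping rules from a source tagset to a target tagset.
# The category names are illustrative only, not taken from the paper.
MAPPING_RULES = {
    "NX": "NP",
    "VX": "VP",
    "PX": "PP",
}


def map_tree(tree, rules):
    """Recursively relabel a tree encoded as (label, children) tuples.

    Leaves are plain strings (tokens) and are returned unchanged.
    Labels without a matching rule are kept as they are.
    """
    if isinstance(tree, str):
        return tree
    label, children = tree
    return (rules.get(label, label), [map_tree(child, rules) for child in children])


# A toy source tree in the assumed (label, children) encoding.
source = ("S", [("NX", ["she"]), ("VX", ["reads", ("NX", ["books"])])])
target = map_tree(source, MAPPING_RULES)
print(target)
```

A flat label-to-label table like this only covers one-to-one category correspondences; structural differences between treebank formats would need rules that also rewrite the tree shape.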
The paper discusses two topics. First, it sketches an approach that uses multiple layers of annotation; in its XML representation, this approach is similar to standoff annotation. Second, it discusses the use of heterogeneous linguistic resources (e.g., XML-annotated documents, taggers, lexical nets) as a source for semi-automatic multi-dimensional markup to resolve typical linguistic issues, with anaphora resolution as a case study.
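The general idea behind standoff annotation with multiple layers can be sketched in Python as follows. This is a sketch under stated assumptions, not the representation used in the paper: the sample sentence, the layer names `pos` and `coref`, the labels, and both helper functions are hypothetical. Each layer stores character-offset spans that point into the same unmodified primary text, so the layers can be created and queried independently.

```python
# Primary text, kept free of inline markup.
text = "Peter saw Mary. He waved."

# Each layer is a list of (start, end, label) spans over the same text.
# Layer names and labels are hypothetical examples.
layers = {
    "pos": [(0, 5, "NNP"), (6, 9, "VBD"), (10, 14, "NNP"),
            (16, 18, "PRP"), (19, 24, "VBD")],
    "coref": [(0, 5, "e1"), (16, 18, "e1")],
}


def labels_for_span(layers, start, end):
    """Collect the labels that every layer assigns to exactly this span."""
    return {
        name: label
        for name, spans in layers.items()
        for s, e, label in spans
        if (s, e) == (start, end)
    }


def mentions_by_entity(text, coref_layer):
    """Group the surface strings of mentions by their entity identifier."""
    groups = {}
    for start, end, entity in coref_layer:
        groups.setdefault(entity, []).append(text[start:end])
    return groups


pronoun_info = labels_for_span(layers, 16, 18)
chains = mentions_by_entity(text, layers["coref"])
print(pronoun_info)
print(chains)
```

Because both layers address the text by offsets, information from one layer (here, the part-of-speech tag of "He") can be combined with another (its co-reference chain) without either layer knowing about the other.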