Refine
Document Type
- Conference Proceeding (7)
- Part of a Book (2)
Has Fulltext
- yes (9)
Is part of the Bibliography
- no (9)
Keywords
- Korpus <Linguistik> (3)
- Annotation (2)
- Auszeichnungssprache (2)
- Co-Reference (1)
- Computerlinguistik (1)
- Coreference (1)
- Data Architecture (1)
- Data Formats (1)
- Einführung (1)
- Interoperabilität (1)
Publicationstate
- Veröffentlichungsversion (9) (remove)
Reviewstate
Publisher
Many XML-related activities (e.g. the creation of a new schema) already address issues with different languages, scripts, and cultures. Nevertheless, a need exists for additional mechanisms and guidelines for more effective internationalization (i18n) and localization (l10n) in XML-related contents and processes. The W3C Internationalization Tag Set Working Group (W3C ITS WG) addresses this need and works on data categories, representation mechanisms and guidelines related to i18n and l10n support in the XML realm. This paper describes initial findings from the (W3C ITS WG). Furthermore, the paper discusses how these findings relate to specific schema languages, and complementary technologies like namespace sectioning, schema annotation and the description of processing chains. The paper exemplifies why certain requirements only can be met by a combination of technologies, and discusses these technologies.
Co-reference annotation and resources: a multilingual corpus of typologically diverse languages
(2002)
This article introduces a dialogue corpus containing data from two typologically different languages, Japanese and Kilivila. The corpus is annotated in accordance with language specific annotation schemes for co-referential and similar relations. The article describes the corpus data, the properties of language specific co-reference in the two languages and a methodology for its annotation. Examples from the corpus show how this methodology is used in the workflow of the annotation process.
Linguistische Korpora
(2004)
This paper describes a corpus of Japanese task-oriented dialogues, i.e. its data, annotations, analysis methodology and preliminary results for the modeling of co-referential phenomena. Current corpus based approaches to co-reference concentrate on textual data from English or other European languages. Hence, the emerging language-general models of co-reference miss input from dialogue data of non-European languages. We aim to fill this gap and contribute to a model of co-reference on various language-specific and language-general levels.
This paper proposes a methodology for querying linguistic data represented in different corpus formats. Examples of the need for queries over such heterogeneous resources are the corpus-based analysis of multimodal phenomena like the interaction of gestures and prosodic features, or syntax-related phenomena like information structure which exceed the expressive power of a tree-centered corpus format. Query languages (QLs) currently under development are strongly connected to corpus formats, like the NITE Object Model (NOM, Carletta et al., 2003) or the Meta-Annotation Infrastructure for ATLAS (MAIA, Laprun and Fiscus, 2002). The parallel development of linguistic query languages and corpus formats is due to the fact that general purpose query languages like XQuery (Boag et al., 2003) do not fulfill the changing needs of linguistically motivated queries, e.g. to give access to (non-)hierarchically organized, theory and language dependent annotations of multi modal signals and/or text. This leads to the problem that existing corpus formats and query languages are hard to reuse. They have to be re developed and re-implemented time-consumingly and expensively for unforeseen tasks. This paper describes an approach for overcoming these problems and a sample application.
This paper deals with the problem of how to interrelate theory-specific treebanks and how to transform one treebank format to another. Currently, two approaches to achieve these goals can be differentiated. The first creates a mapping algorithm between treebank formats. Categories of a source format are transformed into a target format via a given set of general or language-specific mapping rules. The second relates treebanks via a transformation to a general model of linguistic categories, for example based on the EAGLES recommendations for syntactic annotations of corpora, or relying on the HPSG framework. This paper proposes a new methodology as a solution for these desiderata.