OPUS 4 | Search

32 search hits

1 to 10

Sort by

A standards-related web-based information system (2012)

Stührenberg, Maik ; Schonefeld, Oliver ; Witt, Andreas

This late breaking proposal introduces the prototype of a web-based information system dealing with standards in the field of annotation.

Challenges in the Alignment, Management and Exploitation of Large and Richly Annotated Multi-Parallel Corpora (2015)

Graën, Johannes ; Clematide, Simon

The availability of large multi-parallel corpora offers an enormous wealth of material to contrastive corpus linguists, translators and language learners, if we can exploit the data properly. Necessary preparation steps include sentence and word alignment across multiple languages. Additionally, linguistic annotation such as partof- speech tagging, lemmatisation, chunking, and dependency parsing facilitate precise querying of linguistic properties and can be used to extend word alignment to sub-sentential groups. Such highly interconnected data is stored in a relational database to allow for efficient retrieval and linguistic data mining, which may include the statistics-based selection of good example sentences. The varying information needs of contrastive linguists require a flexible linguistic query language for ad hoc searches. Such queries in the format of generalised treebank query languages will be automatically translated into SQL queries.

Co-reference in Japanese Task-oriented Dialogues: A Contribution to the Development of Language-specific and Language-general Annotation Schemes and Resources (2004)

Sasaki, Felix ; Witt, Andreas

This paper describes a corpus of Japanese task-oriented dialogues, i.e. its data, annotations, analysis methodology and preliminary results for the modeling of co-referential phenomena. Current corpus based approaches to co-reference concentrate on textual data from English or other European languages. Hence, the emerging language-general models of co-reference miss input from dialogue data of non-European languages. We aim to fill this gap and contribute to a model of co-reference on various language-specific and language-general levels.

Corpus Masking: Legally Bypassing Licensing Restrictions for the Free Distribution of Text Collections (2007)

Rehm, Georg ; Witt, Andreas ; Zinsmeister, Heike ; Dellert, Johannes

Declarations of Relations, Differences and Transformations between Theory-specific Treebanks: A New Methodology (2003)

Sasaki, Felix ; Witt, Andreas ; Metzing, Dieter

This paper deals with the problem of how to interrelate theory-specific treebanks and how to transform one treebank format to another. Currently, two approaches to achieve these goals can be differentiated. The first creates a mapping algorithm between treebank formats. Categories of a source format are transformed into a target format via a given set of general or language-specific mapping rules. The second relates treebanks via a transformation to a general model of linguistic categories, for example based on the EAGLES recommendations for syntactic annotations of corpora, or relying on the HPSG framework. This paper proposes a new methodology as a solution for these desiderata.

Different Views on Markup (2010)

Goecke, Daniela ; Lüngen, Harald ; Metzing, Dieter ; Stührenberg, Maik ; Witt, Andreas

In this chapter, two different ways of grouping information represented in document markup are examined: annotation levels, referring to conceptual levels of description, and annotation layers, referring to the technical realisation of markup using e.g. document grammars. In many current XML annotation projects, multiple levels are integrated into one layer, often leading to the problem of having to deal with overlapping hierarchies. As a solution, we propose a framework for XML-based multiple, independent XML annotation layers for one text, based on an abstract representation of XML documents with logical predicates. Two realisations of the abstract representation are presented, a Prolog fact base format together with an application architecture, and a specification for XML native databases. We conclude with a discussion of projects that have currently adopted this framework.

Discovering Subtle Word Relations in Large German Corpora (2015)

Buschjäger, Sebastian ; Pfahler, Lukas ; Morik, Katharina

With an increasing amount of text data available it is possible to automatically extract a variety of information about language. One way to obtain knowledge about subtle relations and analogies between words is to observe words which are used in the same context. Recently, Mikolov et al. proposed a method to efficiently compute Euclidean word representations which seem to capture subtle relations and analogies between words in the English language. We demonstrate that this method also captures analogies in the German language. Furthermore, we show that we can transfer information extracted from large non-annotated corpora into small annotated corpora, which are then, in turn, used for training NLP systems.

DSSSL zur Verarbeitung linguistischer Korpora (1999)

Witt, Andreas

From <tiger2/> to ISOTiger – community driven developments for syntax annotation in SynAF (2014)

Bosch, Sonja ; Eckart, Kerstin ; Faaß, Gertrud ; Heid, Ulrich ; Lee, Kiyong ; Pareja-Lora, Antonio ; Pretorius, Laurette ; Romary, Laurent ; Witt, Andreas ; Zeldes, Amir ; Zipser, Florian

In 2010, ISO published a standard for syntactic annotation, ISO 24615:2010 (SynAF). Back then, the document specified a comprehensive reference model for the representation of syntactic annotations, but no accompanying XML serialisation. ISO’s subcommittee on language resource management (ISO TC 37/SC 4) is working on making the SynAF serialisation ISOTiger an additional part of the standard. This contribution addresses the current state of development of ISOTiger, along with a number of open issues on which we are seeking community feedback in order to ensure that ISOTiger becomes a useful extension to the SynAF reference model.

Guidance through the standards jungle for linguistic resources (2012)

Stührenberg, Maik ; Werthmann, Antonina ; Witt, Andreas

Research today is often performed in collaborated projects composed of project partners with different backgrounds and from different institutions and countries. Standards can be a crucial tool to help harmonizing these differences and to create sustainable resources. However, choosing a standard depends on having enough information to evaluate and compare different annotation and metadata formats. In this paper we present ongoing work on an interactive, collaborative website that collects information on standards in the ﬁeld of linguistics as a means to guide interested researchers.

1 to 10

Person(s)
Title
Subject
Abstract
Fulltext
Year(s)

Open Access

Refine

Author

Year of publication

Document Type

Language

Has Fulltext

Is part of the Bibliography

Keywords

Publicationstate

Reviewstate

Publisher

32 search hits