OPUS 4 | Search

26 search hits

1 to 10

Linguistically Annotated Corpora: Quality Assurance, Reusability and Sustainability (2008)

Zinsmeister, Heike ; Witt, Andreas ; Kübler, Sandra ; Hinrichs, Erhard

SusTEInability of linguistic resources through feature structures (2009)

Witt, Andreas ; Rehm, Georg ; Hinrichs, Erhard ; Lehmberg, Timm ; Stegmann, Jens

This article shows that the TEI tag set for feature structures can be adopted to represent a heterogeneous set of linguistic corpora. Most corpora are annotated using markup languages based on the Annotation Graph framework, on the upcoming Linguistic Annotation Format ISO standard, or on tag sets defined by or based upon the TEI guidelines. A unified representation comprises the separation of conceptually different annotation layers contained in the original corpus data (e.g. syntax, phonology, and semantics) into multiple XML files. These annotation layers are linked to each other implicitly by the identical textual content of all files. A suitable data structure for the representation of these annotations is a multi-rooted tree, which can in turn be represented by the TEI and ISO tag set for feature structures. The mapping process and representational issues are discussed, as well as the advantages and drawbacks associated with the use of the TEI tag set for feature structures as a storage and exchange format for linguistically annotated data.

Korpusunterstützte Entwicklung lexikalischer Wissensbasen (1993)

Storrer, Angelika ; Feldweg, Helmut ; Hinrichs, Erhard

Avoiding Data Graveyards : from Heterogeneous Data Collected in Multiple Research Projects to Sustainable Linguistic Resources (2006)

Schmidt, Thomas ; Chiarcos, Christian ; Lehmberg, Timm ; Rehm, Georg ; Witt, Andreas ; Hinrichs, Erhard

This paper describes a new research initiative addressing the issue of sustainability of linguistic resources. The initiative is a cooperation between three collaborative research centres in Germany – the SFB 441 “Linguistic Data Structures” in Tübingen, the SFB 538 “Multilingualism” in Hamburg, and the SFB 632 “Information Structure” in Potsdam/Berlin. The aim of the project is to develop methods for sustainable archiving of the diverse bodies of linguistic data used at the three sites. In the first half of the paper, the data handling solutions developed so far at the three centres are briefly introduced. This is followed by an assessment of their commonalities and differences and of what these entail for the work of the new joint initiative. The second part then sketches seven areas of open questions with respect to sustainable data handling and gives a more detailed account of two of them – integration of linguistic terminologies and development of best practice guidelines.

Sustainability of Annotated Resources in Linguistics (2008)

Rehm, Georg ; Witt, Andreas ; Hinrichs, Erhard ; Reis, Marga

Digital Text Resources for the Humanities – Legal Issues (2007)

Rehm, Georg ; Witt, Andreas ; Hinrichs, Erhard ; Lehmberg, Timm ; Chiarcos, Christian ; Zimmermann, Felix ; Zinsmeister, Heike ; Dellert, Johannes

Sustainability of annotated resources in linguistics: A web-platform for exploring, querying, and distributing linguistic corpora and other resources (2009)

Rehm, Georg ; Schonefeld, Oliver ; Witt, Andreas ; Hinrichs, Erhard ; Reis, Marga

We report on finished work in a project that is concerned with providing methods, tools, best practice guidelines, and solutions for sustainable linguistic resources. The article discusses several general aspects of sustainability and introduces an approach to normalizing corpus data and metadata records. Moreover, the architecture of the sustainability platform implemented by the authors is described.

The IVK-Ler corpus of adolescent foreign-language learners of German (2023)

Pushkina, Alexandra ; Hinrichs, Erhard

This paper presents the IVK-Ler corpus, a longitudinal, annotated learner corpus of weekly writings produced by a group of 18 adolescents in a preparatory class. The corpus consists of 117 student texts collected between 2020 and 2021 and has a structure layered by student and text number. It includes metadata that enables researchers to analyze and track individual student progress in terms of syntactic competence and literacy. The annotation schema, manual and automatic annotation processes, and corpus representation are described in detail. The corpus currently includes target hypotheses and gold standard part-of-speech tags. Future work could include additional annotation layers for topological fields and dependency relations, as well as semantic and discourse annotations to make the corpus usable for tasks beyond syntactic evaluations.

Gute Forschungsdaten, bessere Forschung: wie Forschung durch Forschungsdatenmanagement unterstützt wird (2018)

Mache, Beata ; Trippel, Thorsten ; Effinger, Maria ; Gradl, Tobias ; Haaf, Susanne ; Hinrichs, Erhard ; Horstmann, Wolfram ; Müller, Lydia ; Schrade, Torsten ; Teich, Elke

This panel addresses how research in the humanities is advanced by the planned collection, archiving, and publication of research data and by the reuse this makes possible, which both contributes to quality assurance in research and, not least, allows new research questions to be asked. From different perspectives, the panel will examine what added value data management offers for research in the digital humanities, how this added value can be achieved, how the publication of research data can be established as a natural element of disseminating research results, and how the effort this requires of researchers can be estimated at the same time.

Language Resources, Taxonomies and Metadata (2009)

Lemnitzer, Lothar ; Hinrichs, Erhard ; Witt, Andreas

In this paper we present an approach to faceted search in large language resource repositories. This kind of search, which enables users to browse through the repository by choosing their personal sequence of facets, relies heavily on the availability of descriptive metadata for the objects in the repository. This approach therefore informs the collection of a minimal set of metadata for language resources. The work described in this paper has been funded by the EC within the ESFRI infrastructure project CLARIN.
