Linguistic corpora have been annotated by means of SGML-based markup languages for almost 20 years. We can, very roughly, differentiate between three distinct evolutionary stages of markup technologies. (1) Originally, single SGML tree-based document instances were deemed sufficient for the representation of linguistic structures. (2) Linguists began to realize that alternatives and extensions to the traditional model are needed. Formalisms such as, for example, NITE were proposed: the NITE Object Model (NOM) consists of multi-rooted trees. (3) We are now on the threshold of the third evolutionary stage: even NITE's very flexible approach is not suited for all linguistic purposes. As some structures, such as these, cannot be modeled by multi-rooted trees, an even more flexible approach is needed in order to provide a generic annotation format that is able to represent genuinely arbitrary linguistic data structures.
Poster by the Text+ partner Leibniz-Institut für Deutsche Sprache Mannheim, presented at the workshop "Wohin damit? Storing and reusing my language data" on 22 June 2023 in Mannheim. The poster was written in the context of the work of the association Nationale Forschungsdateninfrastruktur (NFDI) e.V. NFDI is funded by the Federal Republic of Germany and the 16 federal states, and the Text+ consortium is funded by the Deutsche Forschungsgemeinschaft (DFG), project number 460033370. The authors are grateful for the funding and support. Thanks also go to all institutions and individuals who are committed to the association and its goals.
Formalisierung von Kontext und sprachlichem Wissen mit Prioritisierter Circumscription (VM-Memo 55) [Formalization of Context and Linguistic Knowledge with Prioritized Circumscription]
(1994)
On the Lossless Transformation of Single-File, Multi-Layer Annotations into Multi-Rooted Trees
(2007)
The Generalised Architecture for Sustainability (GENAU) provides a framework for the transformation of single-file, multi-layer annotations into multi-rooted trees. By employing constraints expressed in XCONCUR-CL, this procedure can be performed losslessly, i.e., without losing information, especially with regard to the nesting of elements that belong to multiple annotation layers. This article describes how different types of linguistic corpora can be transformed using specialised tools, and how constraint rules can be applied to the resulting multi-rooted trees to add a further level of validation.
This paper describes the efforts of the Institut für Deutsche Sprache (IDS), the central research institution for the German language, in connection with Information and Communications Technology (ICT). The use of ICT in a language research institute is twofold. On the one hand, ICT provides basic services that researchers need to accomplish their daily work. On the other hand, several national and international institutions have a strong interest in ICT. Therefore, ICT can also be seen as an amplifier for language research. The first part of this paper reports on the activities of the IDS in internal and external ICT-related projects and initiatives. The second part describes a general approach towards an ICT strategy that could be useful both for the IDS and for other national language institutes. We think such a general strategy is necessary to create a strong foundation not only for ICT-related projects, but also for a modern research institute.