Refine
Year of publication
Document Type
- Part of a Book (15)
- Conference Proceeding (8)
- Article (2)
- Master's Thesis (1)
- Report (1)
- Working Paper (1)
Has Fulltext
- yes (28)
Keywords
- Computerlinguistik (28) (remove)
Publicationstate
- Veröffentlichungsversion (16)
- Zweitveröffentlichung (5)
- Postprint (3)
Reviewstate
Publisher
- Springer (3)
- de Gruyter (3)
- Institut für Deutsche Sprache (2)
- Libri Books on Demand (2)
- Schöningh (2)
- European Language Resources Association (ELRA) (1)
- Extreme Markup Languages Conference (1)
- Graphen & Netzwerke; AG des Verbandes Digital Humanities im deutschsprachigen Raum e.V. (1)
- Leibniz-Institut für Deutsche Sprache (IDS) (1)
- Mentis-Verlag (1)
XML has been designed for creating structured documents, but the information that is encoded in these structures are, by definition, out of scope for XML. Additional sources, normally not easily interpretable by computers, such as documentation are needed to determine the intention of specific tags in a tag-set. The Component Metadata Infrastructure (CMDI) takes a rather pragmatic approach to foster interoperability between XML instances in the domain of metadata descriptions for language resources. This paper gives an overview of this approach.
Different Views on Markup
(2010)
In this chapter, two different ways of grouping information represented in document markup are examined: annotation levels, referring to conceptual levels of description, and annotation layers, referring to the technical realisation of markup using e.g. document grammars. In many current XML annotation projects, multiple levels are integrated into one layer, often leading to the problem of having to deal with overlapping hierarchies. As a solution, we propose a framework for XML-based multiple, independent XML annotation layers for one text, based on an abstract representation of XML documents with logical predicates. Two realisations of the abstract representation are presented, a Prolog fact base format together with an application architecture, and a specification for XML native databases. We conclude with a discussion of projects that have currently adopted this framework.
Research today is often performed in collaborated projects composed of project partners with different backgrounds and from different institutions and countries. Standards can be a crucial tool to help harmonizing these differences and to create sustainable resources. However, choosing a standard depends on having enough information to evaluate and compare different annotation and metadata formats. In this paper we present ongoing work on an interactive, collaborative website that collects information on standards in the field of linguistics as a means to guide interested researchers.
Integrated Linguistic Annotation Models and Their Application in the Domain of Antecedent Detection
(2011)
Seamless integration of various, often heterogeneous linguistic resources in terms of their output formats and a combined analysis of the respective annotation layers are crucial tasks for linguistic research. After a decade of concentration on the development of formats to structure single annotations for specific linguistic issues, in the last years a variety of specifications to store multiple annotations over the same primary data has been developed. The paper focuses on the integration of the knowledge resource logical document structure information into a text document to enhance the task of automatic anaphora resolution both for the task of candidate detection and antecedent selection. The paper investigates data structures necessary for knowledge integration and retrieval.
Im Folgenden wird eine texttechnologische Komponente zur Expansion eines XML- annotierten Stammformenlexikons, das auf Einträgen eines Standardwörterbuchs basiert, vorgestellt. Diese Expansion wurde in der Document Style Semantics and Specification Language implementiert. Ihr Ergebnis ist ein Vollformenlexikon, das ebenfalls in XML repräsentiert ist.
Making CONCUR work
(2005)
The SGML feature CONCUR allowed for a document to be simultaneously marked up in multiple conflicting hierarchical tagsets but validated and interpreted in one tagset at a time. Alas, CONCUR was rarely implemented, and XML does not address the problem of conflicting hierarchies at all. The MuLaX document syntax is a non-XML syntax that enables multiply-encoded hierarchies by distinguishing different “layers” in the hierarchy by adding a layer ID as a prefix to the element names. The IDs tie all the elements in a single hierarchy together in an “annotation layer”. Extraction of a single annotation layer results in a well-formed XML document, and each annotation layer may be associated with an XML schema. The MuLaX processing model works on the nodes of one annotation layer at a time through Xpath-like navigation. CONCUR lives!
This article introduces the topic of ‘‘Multilingual language resources and interoperability’’. We start with a taxonomy and parameters for classifying language resources. Later we provide examples and issues of interoperatability, and resource architectures to solve such issues. Finally we discuss aspects of linguistic formalisms and interoperability.
Der vorliegende Band befasst sich mit dem Stand und der Entwicklung von Forschungsinfrastrukturen für die germanistische Linguistik und einigen angrenzenden Bereichen. Einen zentralen Aspekt dabei bildet die Notwendigkeit, Kooperativität in der Wissenschaft im institutionellen Sinne, aber auch in Hinsicht auf die wissenschaftliche Praxis zu organisieren. Dies geschieht in Verbunden als Kooperationsstrukturen, wobei Sprachwissenschaft und Sprachtechnologie miteinander verbunden werden. Als zentraler Forschungsressource kommen dabei Korpora und ihrer Erschließung durch spezielle, linguistisch motivierte Informationssysteme besondere Bedeutung zu. Auf der Ebene der Daten werden durch Annotations- und Modellierungsstandards die Voraussetzung für eine nachhaltige Nutzbarkeit derartiger Ressourcen geschaffen.
L’article intitulé «Traitement de l’information: Spinfo, HKI et humanités numériques - l’expérience de Cologne» présente l’histoire du développement des humanités numériques au sein de l’Université de Cologne. L'institutionnalisation des humanités numériques a commencé encore à l’époque où dans le monde germanophone le périmètre de la discipline était en train d’être défini par les travaux de quelques pionniers. Parmi eux, il convient de souligner le rôle d’Elisabeth Burr, active notamment à Tubingue, Duisbourg, Brême et Leipzig.L’article retrace le développement des humanités numériques à Cologne à partir de leurs débuts dans les années soixante du 20ème siècle, en passant par leur consolidation dans les années quatre-vingt-dix, jusqu’aux deux dernières décennies, quand Cologne est devenu un centre important de cette discipline. Le processus illustre comment une nouvelle discipline scientifique peut s’institutionnaliser au sein d’une université allemande. L’article décrit la perspective de deux domaines fondateurs: le traitement linguistique de l’information (en allemand: Sprachliche Informationsverarbeitung, Spinfo) et le traitement historico-culturel de l’information (en allemand: Historisch Kulturwissenschaftliche Informationsverarbeitung, HKI) et leur synthèse, qui a abouti en 2017 à la création de l’Institut des Humanités Numériques (Digital Humanities), qui aujourd’hui est - du point de vue interne - une composante de la Faculté de Philosophie de l’Université de Cologne et - du point de vue externe - une partie intégrante de la communauté internationale des humanités numériques.