Refine
Document Type
- Part of a Book (5)
- Article (4)
- Conference Proceeding (3)
Has Fulltext
- yes (12)
Keywords
- Annotation (4)
- Digital Humanities (4)
- Computerlinguistik (3)
- Sprachdaten (3)
- Automatische Sprachanalyse (2)
- Datenstruktur (2)
- Korpus <Linguistik> (2)
- XML (Extensible Markup Language) (2)
- Annotations (1)
- Argumentstruktur (1)
Publicationstate
- Postprint (12) (remove)
Reviewstate
- (Verlags)-Lektorat (10)
- Peer-Review (2)
The Leibniz-Institute for the German Language (IDS) was established in Mannheim in 1964. Since then, it has been at the forefront of innovation in German linguistics as a hub for digital language data. This chapter presents various lessons learnt from over five decades of work by the IDS, ranging from the importance of sustainability, through its strong technical base and FAIR principles, to the IDS’ role in national and international cooperation projects and its expertise on legal and ethical issues related to language resources and language technology.
Was darf die sprachwissenschaftliche Forschung? Juristische Fragen bei der Arbeit mit Sprachdaten
(2022)
Sich in der Linguistik mit rechtlichen Themen beschäftigen zu müssen, ist auf den ersten Blick überraschend. Da jedoch in den Sprachwissenschaften empirisch gearbeitet wird und Sprachdaten, insbesondere Texte und Ton- und Videoaufnahmen sowie Transkripte gesprochener Sprache, in den letzten Jahren auch verstärkt Sprachdaten internetbasierter Kommunikation, als Basis für die linguistische Forschung dienen, müssen rechtliche Rahmenbedingungen für jede Art von Datennutzung beachtet werden. Natürlich arbeiten auch andere Wissenschaften, wie z. B. die Astronomie oder die Meteorologie, empirisch. Jedoch gibt es einen grundsätzlichen Unterschied der empirischen Basis: Im Gegensatz zu Temperaturen, die gemessen, oder Konstellationen von Himmelskörpern, die beobachtet werden, basieren Sprachdaten auf schriftlichen, mündlichen oder gebärdeten Äußerungen von Menschen, wodurch sich juristisch begründete Beschränkungen ihrer Nutzung ergeben.
Integrated Linguistic Annotation Models and Their Application in the Domain of Antecedent Detection
(2011)
Seamless integration of various, often heterogeneous linguistic resources in terms of their output formats and a combined analysis of the respective annotation layers are crucial tasks for linguistic research. After a decade of concentration on the development of formats to structure single annotations for specific linguistic issues, in the last years a variety of specifications to store multiple annotations over the same primary data has been developed. The paper focuses on the integration of the knowledge resource logical document structure information into a text document to enhance the task of automatic anaphora resolution both for the task of candidate detection and antecedent selection. The paper investigates data structures necessary for knowledge integration and retrieval.
Different Views on Markup
(2010)
In this chapter, two different ways of grouping information represented in document markup are examined: annotation levels, referring to conceptual levels of description, and annotation layers, referring to the technical realisation of markup using e.g. document grammars. In many current XML annotation projects, multiple levels are integrated into one layer, often leading to the problem of having to deal with overlapping hierarchies. As a solution, we propose a framework for XML-based multiple, independent XML annotation layers for one text, based on an abstract representation of XML documents with logical predicates. Two realisations of the abstract representation are presented, a Prolog fact base format together with an application architecture, and a specification for XML native databases. We conclude with a discussion of projects that have currently adopted this framework.
An approach to the unification of XML (Extensible Markup Language) documents with identical textual content and concurrent markup in the framework of XML-based multi-layer annotation is introduced. A Prolog program allows the possible relationships between element instances on two annotation layers that share PCDATA to be explored and also the computing of a target node hierarchy for a well-formed, merged XML document. Special attention is paid to identity conflicts between element instances, for which a default solution that takes into account metarelations that hold between element types on the different annotation layers is provided. In addition, rules can be specified by a user to prescribe how identity conflicts should be solved for certain element types.
This article introduces the topic of ‘‘Multilingual language resources and interoperability’’. We start with a taxonomy and parameters for classifying language resources. Later we provide examples and issues of interoperatability, and resource architectures to solve such issues. Finally we discuss aspects of linguistic formalisms and interoperability.
Researchers in many disciplines, sometimes working in close cooperation, have been concerned with modeling textual data in order to account for texts as the prime information unit of written communication. The list of disciplines includes computer science and linguistics as well as more specialized disciplines like computational linguistics and text technology. What many of these efforts have in common is the aim to model textual data by means of abstract data types or data structures that support at least the semi-automatic processing of texts in any area of written communication.