Integrated Linguistic Annotation Models and Their Application in the Domain of Antecedent Detection
(2011)
Seamlessly integrating linguistic resources whose output formats are often heterogeneous, and analysing their annotation layers in combination, are crucial tasks for linguistic research. After a decade focused on formats that structure single annotations for specific linguistic issues, recent years have seen a variety of specifications for storing multiple annotations over the same primary data. The paper focuses on integrating logical document structure information, as a knowledge resource, into a text document to improve automatic anaphora resolution in both candidate detection and antecedent selection. It also investigates the data structures necessary for knowledge integration and retrieval.
Formalisierung von Kontext und sprachlichem Wissen mit Prioritisierter Circumscription (VM-Memo 55) [Formalizing Context and Linguistic Knowledge with Prioritized Circumscription]
(1994)
This paper describes the efforts of the Institut für Deutsche Sprache (IDS), the central research institution for the German language, in the area of Information and Communications Technology (ICT). The use of ICT in a language research institute is twofold. On the one hand, ICT provides basic services that researchers need to accomplish their daily work. On the other hand, several national and international institutions have a strong interest in ICT, so ICT can also be seen as an amplifier for language research. The first part of this paper reports on the activities of the IDS in internal and external ICT-related projects and initiatives. The second part outlines a general ICT strategy that could be useful both for the IDS and for other national language institutes. We think such a general strategy is necessary to create a strong foundation not only for ICT-related projects but also for a modern research institute.
This article shows that the TEI tag set for feature structures can be adopted to represent a heterogeneous set of linguistic corpora. Most corpora are annotated using markup languages that are based on the Annotation Graph framework or the upcoming Linguistic Annotation Format ISO standard, or that follow tag sets defined by or based upon the TEI guidelines. A unified representation separates the conceptually different annotation layers contained in the original corpus data (e.g. syntax, phonology, and semantics) into multiple XML files. These annotation layers are linked to each other implicitly by the identical textual content of all files. A suitable data structure for representing these annotations is a multi-rooted tree, which in turn can be represented by the TEI and ISO tag set for feature structures. The article discusses the mapping process and representational issues, as well as the advantages and drawbacks of using the TEI tag set for feature structures as a storage and exchange format for linguistically annotated data.
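The idea of separate annotation layers linked only by identical textual content can be illustrated with a minimal Python sketch. It is not the article's implementation: the sample sentence, the layer names, the tag names and the helper functions (text_of, layers_aligned, spans, to_fs) are assumptions made up for illustration. Only the fs, f, symbol and numeric elements are taken from the TEI tag set for feature structures.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotation layers: one XML document per layer,
# all wrapping exactly the same primary text.
LAYER_FILES = {
    "syntax": "<s><np>The cat</np> <vp>sleeps</vp></s>",
    "pos": "<s><w pos='DET'>The</w> <w pos='NOUN'>cat</w> "
           "<w pos='VERB'>sleeps</w></s>",
}


def text_of(xml_string):
    """Return the textual content of a layer with all markup removed."""
    return "".join(ET.fromstring(xml_string).itertext())


def layers_aligned(layers):
    """Layers are implicitly linked only if their text is identical."""
    return len({text_of(xml) for xml in layers.values()}) == 1


def spans(xml_string):
    """Return (tag, start, end, attributes) per element, as character offsets."""
    result = []

    def walk(elem, offset):
        start = offset
        offset += len(elem.text or "")
        for child in elem:
            offset = walk(child, offset)
            offset += len(child.tail or "")
        result.append((elem.tag, start, offset, dict(elem.attrib)))
        return offset

    walk(ET.fromstring(xml_string), 0)
    return result


def to_fs(layer, tag, start, end, attrib):
    """Encode one annotation as a TEI-style feature structure element."""
    fs = ET.Element("fs", {"type": layer})
    for name, value in [("cat", tag), *attrib.items()]:
        f = ET.SubElement(fs, "f", {"name": name})
        ET.SubElement(f, "symbol", {"value": value})
    for name, value in [("start", start), ("end", end)]:
        f = ET.SubElement(fs, "f", {"name": name})
        ET.SubElement(f, "numeric", {"value": str(value)})
    return fs


if __name__ == "__main__":
    if layers_aligned(LAYER_FILES):
        for layer, xml in LAYER_FILES.items():
            for span in spans(xml):
                print(ET.tostring(to_fs(layer, *span), encoding="unicode"))
```

Because every layer maps to offsets in the same character sequence, annotations from different layers can be compared by span even though their trees do not share a common root, which is the sense in which the combined structure is multi-rooted.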