Linguistic corpora have been annotated with SGML-based markup languages for almost 20 years. Very roughly, three distinct evolutionary stages of markup technologies can be distinguished. (1) Originally, single tree-based SGML document instances were deemed sufficient for representing linguistic structures. (2) Linguists then began to realize that alternatives to and extensions of the traditional model were needed. Formalisms such as NITE were proposed; the NITE Object Model (NOM) consists of multi-rooted trees. (3) We are now on the threshold of the third evolutionary stage: even NITE's very flexible approach is not suited to all linguistic purposes. Because some structures cannot be modeled by multi-rooted trees, an even more flexible approach is needed to provide a generic annotation format that can represent genuinely arbitrary linguistic data structures.
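The step from one tree to multi-rooted trees to something more general can be illustrated with one simplified criterion: overlapping spans. The sketch below is a hypothetical illustration, not NITE or any published tool. It assumes annotations are given as `(layer, label, (start, end))` with half-open character offsets, and it only checks overlap; other reasons a model may fail (e.g. discontinuous constituents) are ignored.

```python
from itertools import combinations


def nests(a, b):
    """True if spans a and b are disjoint or one contains the other."""
    (s1, e1), (s2, e2) = a, b
    disjoint = e1 <= s2 or e2 <= s1
    contains = (s1 <= s2 and e2 <= e1) or (s2 <= s1 and e1 <= e2)
    return disjoint or contains


def required_model(spans):
    """Name the simplest of the three models that can hold these spans.

    spans: list of (layer, label, (start, end)), half-open offsets.
    Simplifying assumption: only overlap is considered.
    """
    clashes = [(a, b) for a, b in combinations(spans, 2)
               if not nests(a[2], b[2])]
    if not clashes:
        return "single tree"        # everything nests: one SGML/XML tree
    if all(a[0] != b[0] for a, b in clashes):
        return "multi-rooted tree"  # overlap only across layers: one tree per layer
    return "graph"                  # overlap inside a layer: trees are not enough


syntax = [("syn", "s", (0, 10)), ("syn", "s", (10, 20))]
prosody = [("pros", "ip", (5, 15))]
print(required_model(syntax))                                   # single tree
print(required_model(syntax + prosody))                         # multi-rooted tree
print(required_model(syntax + prosody + [("syn", "np", (8, 12))]))  # graph
```

Splitting layers into separate trees removes clashes between layers, but a clash within a single layer still requires a more general structure such as a graph.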
Poster by the Text+ partner Leibniz-Institut für Deutsche Sprache Mannheim, presented at the workshop "Wohin damit? Storing and reusing my language data" on 22 June 2023 in Mannheim. The poster was written in the context of the work of the association Nationale Forschungsdateninfrastruktur (NFDI) e.V. NFDI is funded by the Federal Republic of Germany and the 16 federal states, and the Text+ consortium is funded by the Deutsche Forschungsgemeinschaft (DFG), project number 460033370. The authors gratefully acknowledge this funding and support. Thanks also go to all institutions and individuals who are committed to the association and its goals.
Integrated Linguistic Annotation Models and Their Application in the Domain of Antecedent Detection
(2011)
Seamlessly integrating various linguistic resources, which often differ in their output formats, and analysing their respective annotation layers in combination are crucial tasks for linguistic research. After a decade of concentrating on formats that structure single annotations for specific linguistic issues, recent years have seen the development of a variety of specifications for storing multiple annotations over the same primary data. The paper focuses on integrating logical document structure information, as a knowledge resource, into a text document in order to improve automatic anaphora resolution, both in candidate detection and in antecedent selection. It also investigates the data structures necessary for knowledge integration and retrieval.
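Because both the logical document structure and the noun-phrase annotation are layers over the same primary data, they can be combined through character offsets alone. The following is a minimal sketch under stated assumptions, not the paper's method: the `Span` type, the function names and the heuristic "keep only candidates inside the anaphor's innermost logical unit" are all hypothetical and chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    label: str
    start: int  # half-open character offsets into the shared primary text
    end: int


def enclosing(units, pos):
    """Innermost structural unit containing offset pos, or None."""
    hits = [u for u in units if u.start <= pos < u.end]
    return min(hits, key=lambda u: u.end - u.start, default=None)


def candidates(anaphor, noun_phrases, structure):
    """Antecedent candidates for an anaphor, nearest first.

    Hypothetical heuristic: only noun phrases that end before the anaphor
    and lie inside the same logical unit (e.g. section) are kept.
    """
    preceding = [np for np in noun_phrases if np.end <= anaphor.start]
    unit = enclosing(structure, anaphor.start)
    if unit is not None:
        preceding = [np for np in preceding
                     if unit.start <= np.start and np.end <= unit.end]
    return sorted(preceding, key=lambda np: anaphor.start - np.end)


structure = [Span("section", 0, 24), Span("section", 24, 44)]
nps = [Span("np:parser", 7, 17), Span("np:results", 25, 32)]
it = Span("pron:it", 34, 36)
print([c.label for c in candidates(it, nps, structure)])  # ['np:results']
print([c.label for c in candidates(it, nps, [])])         # ['np:results', 'np:parser']
```

Without structural information, every preceding noun phrase remains a candidate. With it, the candidate set shrinks to the anaphor's own section, which is one way document structure could support candidate detection.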
Formalisierung von Kontext und sprachlichem Wissen mit Prioritisierter Circumscription [Formalizing Context and Linguistic Knowledge with Prioritized Circumscription] (VM-Memo 55)
(1994)
On the Lossless Transformation of Single-File, Multi-Layer Annotations into Multi-Rooted Trees
(2007)
The Generalised Architecture for Sustainability (GENAU) provides a framework for transforming single-file, multi-layer annotations into multi-rooted trees. By employing constraints expressed in XCONCUR-CL, this transformation can be performed losslessly, i.e., without losing information, especially with regard to the nesting of elements that belong to different annotation layers. This article describes how different types of linguistic corpora can be transformed using specialised tools, and how constraint rules can be applied to the resulting multi-rooted trees to add a further level of validation.
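The following sketch, which is not GENAU's tooling, shows where information can be lost. It assumes a single inline XML file in which each annotation element carries a hypothetical `layer` attribute. The file is split into one span list per layer over the shared text, that is, one hierarchy per layer. When elements from different layers cover exactly the same span, their original nesting order can no longer be read off the offsets, so it must be recorded separately, for example as a constraint.

```python
import xml.etree.ElementTree as ET


def split_layers(xml_text):
    """Return (primary text, {layer: [(tag, start, end), ...]}).

    Assumes annotation elements carry a hypothetical 'layer' attribute;
    elements without it (e.g. the root) only contribute text.
    """
    root = ET.fromstring(xml_text)
    layers, parts = {}, []
    pos = 0

    def walk(elem):
        nonlocal pos
        start = pos
        if elem.text:
            parts.append(elem.text)
            pos += len(elem.text)
        for child in elem:
            walk(child)
            if child.tail:  # text after a child belongs to the parent
                parts.append(child.tail)
                pos += len(child.tail)
        layer = elem.get("layer")
        if layer is not None:
            layers.setdefault(layer, []).append((elem.tag, start, pos))

    walk(root)
    # Document order, outer elements before inner ones with the same start.
    ordered = {k: sorted(v, key=lambda t: (t[1], -t[2])) for k, v in layers.items()}
    return "".join(parts), ordered


def same_span_pairs(layers):
    """Cross-layer element pairs with identical spans: their nesting is ambiguous."""
    flat = [(layer, tag, s, e) for layer, items in layers.items() for tag, s, e in items]
    return [(a, b) for i, a in enumerate(flat) for b in flat[i + 1:]
            if a[0] != b[0] and a[2:] == b[2:]]


doc = '<doc><s layer="syn"><u layer="phon">Hello world</u></s> <s layer="syn">Bye</s></doc>'
text, layers = split_layers(doc)
print(repr(text))               # 'Hello world Bye'
print(layers)
print(same_span_pairs(layers))  # u and the first s share (0, 11)
```

In the example, the input states that `u` lies inside `s`. The split layers only record that both cover characters 0 to 11, so that nesting is exactly the information an extra constraint would have to preserve.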
This paper describes the efforts of the Institut für Deutsche Sprache (IDS), the central research institution for the German language, in the field of Information and Communications Technology (ICT). The use of ICT in a language research institute is twofold. On the one hand, ICT provides basic services that researchers need for their daily work. On the other hand, several national and international institutions have a strong interest in ICT, so ICT can also be seen as an amplifier for language research. The first part of this paper reports on the activities of the IDS in internal and external ICT-related projects and initiatives. The second part outlines a general ICT strategy that could be useful both for the IDS and for other national language institutes. We think such a strategy is necessary to create a strong foundation not only for ICT-related projects but for a modern research institute as a whole.
This article shows that the TEI tag set for feature structures can be adopted to represent a heterogeneous set of linguistic corpora. Most of these corpora are annotated using markup languages based on the Annotation Graph framework or the upcoming Linguistic Annotation Format ISO standard, or using tag sets defined by or based on the TEI Guidelines. A unified representation separates the conceptually different annotation layers contained in the original corpus data (e.g. syntax, phonology, and semantics) into multiple XML files. These annotation layers are linked to each other implicitly by the identical textual content of all files. A suitable data structure for representing these annotations is a multi-rooted tree, which in turn can be represented by the TEI and ISO tag set for feature structures. The article discusses the mapping process and representational issues, as well as the advantages and drawbacks of using the TEI tag set for feature structures as a storage and exchange format for linguistically annotated data.
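The two ideas in this abstract can be sketched briefly. First, layer files are aligned only through their identical text content. Second, an annotation can be written with TEI feature-structure elements (`fs`, `f`, `symbol`). This is a simplified illustration under stated assumptions, not the article's mapping: the TEI namespace is omitted, only flat symbol-valued features are handled, and the function names and example layers are hypothetical.

```python
import xml.etree.ElementTree as ET


def primary_text(layer_xml):
    """Character content of one annotation-layer file, with markup removed."""
    return "".join(ET.fromstring(layer_xml).itertext())


def share_primary_data(*layer_files):
    """True if all layer files contain exactly the same text, so they can be
    aligned implicitly as the roots of one multi-rooted tree."""
    return len({primary_text(x) for x in layer_files}) == 1


def to_tei_fs(fs_type, features):
    """Encode a flat {name: value} dict as TEI-style <fs>/<f>/<symbol>.

    Simplification: every value becomes a <symbol value="..."/>; the TEI
    namespace (http://www.tei-c.org/ns/1.0) is left out for brevity.
    """
    fs = ET.Element("fs", type=fs_type)
    for name, value in features.items():
        f = ET.SubElement(fs, "f", name=name)
        ET.SubElement(f, "symbol", value=value)
    return ET.tostring(fs, encoding="unicode")


syntax = "<layer><s><np>The cat</np> sleeps</s></layer>"
phonology = "<layer><ip>The cat sleeps</ip></layer>"
print(share_primary_data(syntax, phonology))  # True
print(to_tei_fs("word", {"pos": "NN", "num": "sg"}))
```

Comparing extracted text keeps the linking implicit, as described above. A real pipeline would also need to handle whitespace normalisation and offsets into that shared text.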