This paper presents the application of the <tiger2/> format to various linguistic scenarios with the aim of making it the standard serialisation for the ISO 24615 [1] (SynAF) standard. After outlining the main characteristics of both the SynAF metamodel and the <tiger2/> format, as extended from the initial Tiger XML format [2], we show, using examples from a range of language families, how <tiger2/> covers a variety of constituency-based and dependency-based analyses.
XML has been designed for creating structured documents, but the information that is encoded in these structures is, by definition, out of scope for XML. Additional sources such as documentation, which computers normally cannot interpret easily, are needed to determine the intended meaning of specific tags in a tag set. The Component Metadata Infrastructure (CMDI) takes a rather pragmatic approach to fostering interoperability between XML instances in the domain of metadata descriptions for language resources. This paper gives an overview of this approach.
We present SPLICR, the Web-based Sustainability Platform for Linguistic Corpora and Resources. The system is aimed at people who work in Linguistics or Computational Linguistics: a comprehensive database of metadata records can be explored in order to find language resources that could be appropriate for one’s specific research needs. SPLICR also provides a graphical interface that enables users to query and to visualise corpora. The project in which the system is developed aims at sustainably archiving the approximately 60 language resources that have been constructed in three collaborative research centres. Our project has two primary goals: (a) to process and sustainably archive the resources so that they are still available to the research community in five, ten, or even 20 years' time; (b) to enable researchers to query the resources both on the level of their metadata and on the level of linguistic annotations. In more general terms, our goal is to enable solutions that leverage the interoperability, reusability, and sustainability of heterogeneous collections of language resources.
We present an approach to one aspect of managing complex access scenarios for large and heterogeneous corpora: handling user queries that, intentionally or due to the complexity of the queried resource, target texts or annotations outside the given user’s permissions. We first outline the overall architecture of the corpus analysis platform KorAP, devoting some attention to the way in which it handles multiple query languages by implementing ISO CQLF (Corpus Query Lingua Franca), which in turn constitutes a component crucial for the functionality discussed here. Next, we look at query rewriting as it is used by KorAP and zoom in on one kind of this procedure, namely the rewriting of queries that is forced by data access restrictions.
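The idea of rewriting a query because of access restrictions can be illustrated with a minimal sketch. This is not KorAP's implementation or query format: the `Query` class, the `rewrite_for_access` function, and the corpus names are all hypothetical, chosen only to show the general technique of narrowing a query to what a user is permitted to see and recording that the query was changed.

```python
from dataclasses import dataclass, field


@dataclass
class Query:
    """A toy query (hypothetical): a search pattern plus the corpora it targets."""
    pattern: str
    corpora: list[str]
    rewrites: list[str] = field(default_factory=list)  # notes on applied changes


def rewrite_for_access(query: Query, permitted: set[str]) -> Query:
    """Return a copy of the query restricted to corpora the user may access."""
    allowed = [c for c in query.corpora if c in permitted]
    denied = [c for c in query.corpora if c not in permitted]
    notes = list(query.rewrites)
    if denied:
        # Record the change so the user can be told the query was modified.
        notes.append("removed inaccessible corpora: " + ", ".join(denied))
    return Query(query.pattern, allowed, notes)


# Assumed example data: one public corpus and one the user has no licence for.
original = Query(pattern="Haus", corpora=["public-news", "licensed-fiction"])
rewritten = rewrite_for_access(original, permitted={"public-news"})
print(rewritten.corpora)
print(rewritten.rewrites)
```

In this sketch the original query is left untouched and the rewritten copy carries a note about what was removed, so the restriction stays transparent to the user rather than silently changing the result set.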
This paper describes a new research initiative addressing the issue of sustainability of linguistic resources. The initiative is a cooperation between three collaborative research centres in Germany – the SFB 441 “Linguistic Data Structures” in Tübingen, the SFB 538 “Multilingualism” in Hamburg, and the SFB 632 “Information Structure” in Potsdam/Berlin. The aim of the project is to develop methods for sustainable archiving of the diverse bodies of linguistic data used at the three sites. In the first half of the paper, the data handling solutions developed so far at the three centres are briefly introduced. This is followed by an assessment of their commonalities and differences and of what these entail for the work of the new joint initiative. The second part then sketches seven areas of open questions with respect to sustainable data handling and gives a more detailed account of two of them – integration of linguistic terminologies and development of best practice guidelines.