This paper presents the application of the <tiger2/> format to various linguistic scenarios with the aim of making it the standard serialisation for the ISO 24615 [1] (SynAF) standard. After outlining the main characteristics of both the SynAF metamodel and the <tiger2/> format, as extended from the initial Tiger XML format [2], we show through a range of different language families how <tiger2/> covers a variety of constituency and dependency based analyses.
XML has been designed for creating structured documents, but the information that is encoded in these structures is, by definition, out of scope for XML. Additional sources that are normally not easily interpretable by computers, such as documentation, are needed to determine the intention of specific tags in a tag-set. The Component Metadata Infrastructure (CMDI) takes a rather pragmatic approach to foster interoperability between XML instances in the domain of metadata descriptions for language resources. This paper gives an overview of this approach.
We present SPLICR, the Web-based Sustainability Platform for Linguistic Corpora and Resources. The system is aimed at people who work in Linguistics or Computational Linguistics: a comprehensive database of metadata records can be explored in order to find language resources that could be appropriate for one’s specific research needs. SPLICR also provides a graphical interface that enables users to query and to visualise corpora. The project in which the system is developed aims at sustainably archiving the ca. 60 language resources that have been constructed in three collaborative research centres. Our project has two primary goals: (a) To process and to archive sustainably the resources so that they are still available to the research community in five, ten, or even 20 years’ time. (b) To enable researchers to query the resources both on the level of their metadata and on the level of linguistic annotations. In more general terms, our goal is to enable solutions that leverage the interoperability, reusability, and sustainability of heterogeneous collections of language resources.
We present an approach to an aspect of managing complex access scenarios to large and heterogeneous corpora that involves handling user queries that, intentionally or due to the complexity of the queried resource, target texts or annotations outside of the given user’s permissions. We first outline the overall architecture of the corpus analysis platform KorAP, devoting some attention to the way in which it handles multiple query languages, by implementing ISO CQLF (Corpus Query Lingua Franca), which in turn constitutes a component crucial for the functionality discussed here. Next, we look at query rewriting as it is used by KorAP and zoom in on one kind of this procedure, namely the rewriting of queries that is forced by data access restrictions.
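As a rough illustration of rewriting forced by access restrictions, the following minimal Python sketch narrows a query to the corpora a user is permitted to read and records what was removed. It is not KorAP's implementation: the Query class, the rewrite_for_access function, the corpus names, and the example expression are all hypothetical and chosen only to show the idea.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Query:
    """Hypothetical query: a search expression plus the corpora it targets."""
    expression: str
    corpora: frozenset
    notes: tuple = ()


def rewrite_for_access(query, permitted):
    """Restrict the query to permitted corpora and note what was dropped.

    Assumption: permissions are modelled as a plain set of corpus names.
    """
    allowed = query.corpora & permitted
    removed = query.corpora - permitted
    notes = query.notes
    if removed:
        # Keep a trace of the rewrite so the user can see why results differ.
        notes = notes + (
            "removed inaccessible corpora: " + ", ".join(sorted(removed)),
        )
    return Query(query.expression, allowed, notes)


original = Query("[pos=NN]", frozenset({"public-news", "restricted-letters"}))
rewritten = rewrite_for_access(original, frozenset({"public-news", "public-wiki"}))
print(sorted(rewritten.corpora))
print(rewritten.notes)
```

The sketch returns a new query object instead of modifying the original, so the user's request and the rewritten form actually executed can both be inspected.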
This paper describes a new research initiative addressing the issue of sustainability of linguistic resources. The initiative is a cooperation between three collaborative research centres in Germany – the SFB 441 “Linguistic Data Structures” in Tübingen, the SFB 538 “Multilingualism” in Hamburg, and the SFB 632 “Information Structure” in Potsdam/Berlin. The aim of the project is to develop methods for sustainable archiving of the diverse bodies of linguistic data used at the three sites. In the first half of the paper, the data handling solutions developed so far at the three centres are briefly introduced. This is followed by an assessment of their commonalities and differences and of what these entail for the work of the new joint initiative. The second part then sketches seven areas of open questions with respect to sustainable data handling and gives a more detailed account of two of them – integration of linguistic terminologies and development of best practice guidelines.
Beyond Citations: Corpus-based Methods for Detecting the Impact of Research Outcomes on Society
(2020)
This paper proposes, implements and evaluates a novel, corpus-based approach for identifying categories indicative of the impact of research via a deductive (top-down, from theory to data) and an inductive (bottom-up, from data to theory) approach. The resulting categorization schemes differ in substance. Research outcomes are typically assessed by using bibliometric methods, such as citation counts and patterns, or alternative metrics, such as references to research in the media. The shortcomings of these methods are their inability to identify the impact of research beyond academia (bibliometrics) and to consider text-based impact indicators beyond those that capture attention (altmetrics). We address these limitations by leveraging a mixed-methods approach for eliciting impact categories from experts, project personnel (deductive) and texts (inductive). Using these categories, we label a corpus of project reports per category schema, and apply supervised machine learning to infer these categories from project reports. The classification results show that we can predict deductively and inductively derived impact categories with 76.39% and 78.81% accuracy (F1-score), respectively. Our approach can complement solutions from bibliometrics and scientometrics for assessing the impact of research and studying the scope and types of advancements transferred from academia to society.
The present submission reports on a pilot project conducted at the Institute for the German Language (IDS), aiming at strengthening the connection between ISO TC37SC4 “Language Resource Management” and the CLARIN infrastructure. In terminology management, attempts have recently been made to use graph-theoretical analyses to get a better understanding of the structure of terminology resources. The project described here aims at applying some of these methods to potentially incomplete concept fields produced over years by numerous researchers serving as experts and editors of ISO standards. The main results of the project are twofold. On the one hand, they comprise concept networks dynamically generated from a relational database and browsable by the user. On the other, the project has yielded significant qualitative feedback that will be offered to ISO. We provide the institutional context of this endeavour, its theoretical background, and an overview of data preparation and tools used. Finally, we discuss the results and illustrate some of them.
The shift from in-person teaching to digital teaching and learning formats caused by the Covid-19 pandemic posed a challenge for teachers and students alike. Within a very short time, the use of platforms and digital tools had to be learned and tested. This contribution presents selected services and tools from CLARIAH-DE and explains how the digital research infrastructure can also support teachers and students in the context of digital teaching.
Co-reference annotation and resources: a multilingual corpus of typologically diverse languages
(2002)
This article introduces a dialogue corpus containing data from two typologically different languages, Japanese and Kilivila. The corpus is annotated in accordance with language specific annotation schemes for co-referential and similar relations. The article describes the corpus data, the properties of language specific co-reference in the two languages and a methodology for its annotation. Examples from the corpus show how this methodology is used in the workflow of the annotation process.
This paper describes a corpus of Japanese task-oriented dialogues, i.e. its data, annotations, analysis methodology and preliminary results for the modeling of co-referential phenomena. Current corpus based approaches to co-reference concentrate on textual data from English or other European languages. Hence, the emerging language-general models of co-reference miss input from dialogue data of non-European languages. We aim to fill this gap and contribute to a model of co-reference on various language-specific and language-general levels.
This paper proposes a methodology for querying linguistic data represented in different corpus formats. Examples of the need for queries over such heterogeneous resources are the corpus-based analysis of multimodal phenomena like the interaction of gestures and prosodic features, or syntax-related phenomena like information structure which exceed the expressive power of a tree-centered corpus format. Query languages (QLs) currently under development are strongly connected to corpus formats, like the NITE Object Model (NOM, Carletta et al., 2003) or the Meta-Annotation Infrastructure for ATLAS (MAIA, Laprun and Fiscus, 2002). The parallel development of linguistic query languages and corpus formats is due to the fact that general purpose query languages like XQuery (Boag et al., 2003) do not fulfill the changing needs of linguistically motivated queries, e.g. to give access to (non-)hierarchically organized, theory and language dependent annotations of multimodal signals and/or text. This leads to the problem that existing corpus formats and query languages are hard to reuse. For unforeseen tasks, they have to be redeveloped and re-implemented at considerable cost in time and money. This paper describes an approach for overcoming these problems and a sample application.
The present paper describes Corpus Query Lingua Franca (ISO CQLF), a specification designed at ISO Technical Committee 37 Subcommittee 4 “Language resource management” for the purpose of facilitating the comparison of properties of corpus query languages. We give an overview of the motivation for this endeavour and present its aims and its general architecture. CQLF is intended as a multi-part specification; here, we concentrate on the basic metamodel that provides a frame into which the other parts fit.