Refine
Document Type
- Part of a Book (2)
- Conference Proceeding (1)
- Working Paper (1)
Has Fulltext
- yes (4) (remove)
Is part of the Bibliography
- yes (4) (remove)
Keywords
- Korpus <Linguistik> (3)
- Annotation (1)
- Computerunterstützte Kommunikation (1)
- Forschungsdaten (1)
- Gesprochene Sprache (1)
- Information Retrieval (1)
- Infrastruktur (1)
- Internet (1)
- Lexikalische Analyse (1)
- Rechtsstatus (1)
Publicationstate
- Veröffentlichungsversion (4) (remove)
Reviewstate
- (Verlags)-Lektorat (2)
- Peer-Review (1)
- Peer-review (1)
Publisher
The landscape of digital lexical resources is often characterized by dedicated local portals and proprietary interfaces as primary access points for scholars and the interested public. In addition, legal and technical restrictions are potential issues that can make it difficult to efficiently query and use these valuable resources. As part of the research data consortium Text+, solutions for the storage and provision of digital language resources are being developed and provided in the context of the unified cross-domain German research data infrastructure NFDI. The specific topic of accessing lexical resources in a diverse and heterogenous landscape with a variety of participating institutions and established technical solutions is met with the development of the federated search and query framework LexFCS. The LexFCS extends the established CLARIN Federated Content Search that already allows accessing spatially distributed text corpora using a common specification of technical interfaces, data formats, and query languages. This paper describes the current state of development of the LexFCS, gives an insight into its technical details, and provides an outlook on its future development.
The paper presents best practices and results from projects dedicated to the creation of corpora of computer-mediated communication and social media interactions (CMC) from four different countries. Even though there are still many open issues related to building and annotating corpora of this type, there already exists a range of tested solutions which may serve as a starting point for a comprehensive discussion on how future standards for CMC corpora could (and should) be shaped like.
This article reports about the on-going work on a new version of the metadata framework Component Metadata Infrastructure (CMDI), central to the CLARIN infrastructure. Version 1.2 introduces a number of important changes based on the experience gathered in the last five years of intensive use of CMDI by the digital humanities community, addressing problems encountered, but also introducing new functionality. Next to the consolidation of the structure of the model and schema sanity, new means for lifecycle management have been introduced aimed at combatting the observed proliferation of components, new mechanism for use of external vocabularies will contribute to more consistent use of controlled values and cues for tools will allow improved presentation of the metadata records to the human users. The feature set has been frozen and approved, and the infrastructure is now entering a transition phase, in which all the tools and data need to be migrated to the new version.