Informatik
Refine
Document Type
- Article (7)
- Working Paper (2)
- Part of a Book (1)
- Report (1)
Keywords
- Standardisierung (7)
- Korpuslinguistik (4)
- Computerlinguistik (3)
- Deutsch (3)
- Evaluation (2)
- Forschungsdaten (2)
- Infrastruktur (2)
- Bericht (1)
- Computerunterstützte Lexikographie (1)
- Datenmanagement (1)
Publicationstate
Publisher
- Zenodo (2)
- De Gruyter Mouton (1)
This White Paper sets out commonly agreed definitions on activities of consortia within NFDI. It aims to provide a common basis for reporting and reference regarding selected questions of cross-consortial relevance in DFG’s template for the Interim Reports. The questions were prioritised by an NFDI Task Force on Evaluation and Reporting (formerly Task Force Monitoring) as a result of discussing possible answers to the DFG template. In this process the need to agree on a generalizable meaning of terms commonly used in the context of NFDI, and reporting in particular, were identified from cross-consortial perspectives. Questions that showed the highest requirement on clarification are discussed in this White Paper. As NFDI evolves, the Task Force will likely propose further joint approaches for reporting in information infrastructures.
While each of broad relevance, the questions addressed relate to substantially different aspects of consortia’s work. They are thus also structured slightly different.
In der Bund-Länder-Vereinbarung (BLV) zu Aufbau und Förderung einer Nationalen Forschungsdateninfrastruktur (NFDI) (im Folgenden BLV-NFDI) wird in §1 festgehalten, dass mit der Förderung "eine Etablierung und Fortentwicklung eines übergreifenden Forschungsdatenmanagements" und damit eine "Steigerung der Effizienz des gesamten Wissenschaftssystems verfolgt" wird. In der BLV-NFDI werden dazu sieben Ziele vorgegeben, die eine Verfeinerung dieser Hauptziele darstellen. Dieses White Paper formuliert das gemeinsame Verständnis der beteiligten Konsortien für die sieben in der BLV-NFDI vorgegebenen Ziele. Auf der Grundlage dieses Verständnisses hat die Task Force Evaluation und Reporting Vorschläge gemacht, wie das Erreichen der Ziele erfasst, beschrieben und gemessen werden kann.
The present contribution addresses an infrastructural issue of universal relevance, addressed in the specific context of the TEI. We describe a combination of open-source tools and an open-access approach to creating knowledge repositories that have been employed in building a bibliographic reference library for the “TEI for Linguists” special interest group (LingSIG). The authors argue that, for an initiative such as the TEI, it is important to choose open, freely available solutions. If these solutions have the advantage of attracting new users and promoting the initiative itself, so much the better, especially if it is done in a non-committal way: no one using the LingSIG bibliographic repository has to be a member of the LingSIG or a “TEI-er” in general.
The paper presents an XML schema for the representation of genres of computer-mediated communication (CMC) that is compliant with the encoding framework defined by the TEI. It was designed for the annotation of CMC documents in the project Deutsches Referenzkorpus zur internetbasierten Kommunikation (DeRiK), which aims at building a corpus on language use in the most popular CMC genres on the German-speaking Internet. The focus of the schema is on those CMC genres which are written and dialogic―such as forums, bulletin boards, chats, instant messaging, wiki and weblog discussions, microblogging on Twitter, and conversation on “social network” sites.
The schema provides a representation format for the main structural features of CMC discourse as well as elements for the annotation of those units regarded as “typical” for language use on the Internet. The schema introduces an element <posting>, which describes stretches of text that are sent to the server by a user at a certain point in time. Postings are the main constituting elements of threads and logfiles, which, in our schema, are the two main types of CMC macrostructures. For the microlevel of CMC documents (that is, the structure of the <posting> content), the schema introduces elements for selected features of Internet jargon such as emoticons, interaction words and addressing terms. It allows for easy anonymization of CMC data for purposes in which the annotated data are made publicly available and includes metadata which are necessary for referencing random excerpts from the data as references in dictionary entries or as results of corpus queries.
Documentation of the schema as well as encoding examples can be retrieved from the web at http://www.empirikom.net/bin/view/Themen/CmcTEI. The schema is meant to be a core model for representing CMC that can be modified and extended by others according to their own specific perspectives on CMC data. It could be a first step towards an integration of features for the representation of CMC genres into a future new version of the TEI Guidelines.
Although most of the relevant dictionary productions of the recent past have relied on digital data and methods, there is little consensus on formats and standards. The Institute for Corpus Linguistics and Text Technology (ICLTT) of the Austrian Academy of Sciences has been conducting a number of varied lexicographic projects, both digitising print dictionaries and working on the creation of genuinely digital lexicographic data. This data was designed to serve varying purposes: machine-readability was only one. A second goal was interoperability with digital NLP tools. To achieve this end, a uniform encoding system applicable across all the projects was developed. The paper describes the constraints imposed on the content models of the various elements of the TEI dictionary module and provides arguments in favour of TEI P5 as an encoding system not only being used to represent digitised print dictionaries but also for NLP purposes.
This paper describes work in progress on I5, a TEI-based document grammar for the corpus holdings of the Institut für Deutsche Sprache (IDS) in Mannheim and the text model used by IDS in its work. The paper begins with background information on the nature and purposes of the corpora collected at IDS and the motivation for the I5 project (section 1). It continues with a description of the origin and history of the IDS text model (section 2), and a description (section 3) of the techniques used to automate, as far as possible, the preparation of the ODD file documenting the IDS text model. It ends with some concluding remarks (section 4). A survey of the additional features of the IDS-XCES realization of the IDS text model is given in an appendix.
This article presents an approach that supports the creation of personal learning environments (PLE) suitable for self-regulated learning (SRL). PLEs became very popular in recent years offering more personal freedom to learners than traditional learning environments. However, creating and configuring PLEs demand specific meta-skills that not all learners have. This situation leads to the challenge how learners can be supported to create PLEs that are useful to achieve their intended learning outcomes. The theory of SRL describes learners as self-regulated if they are capable of taking over control of the own learning process. Grounding on that theory, a model has been elaborated that offers guidance for the creation of PLEs containing tools for cognitive and meta-cognitive learning activities. The implementation of this approach has been done in the context of the ROLE infrastructure. A quantitative and qualitative evaluation with teachers describes advantages and ideas for improvement.