Linköping Electronic Conference Proceedings
123
To optimize the sharing and reuse of existing data, many funding organizations now require researchers to specify a management plan for research data. In such a plan, researchers are supposed to describe the entire life cycle of the research data they are going to produce, from data creation to formatting, interpretation, documentation, short-term storage, long-term archiving and data re-use. To support researchers with this task, we built DMPTY, a wizard that guides researchers through the essential aspects of managing data, elicits information from them, and finally, generates a document that can be further edited and linked to the original research proposal.
172
In this article, we describe a user support solution for the digital humanities. As a case study, we show the development of the CLARIN-D Helpdesk from 2013 into the current support solution, which has been extended to several other CLARIN-related software tools and projects and to DARIAH-ERIC. Furthermore, we describe a way towards a common support platform for CLARIAH-DE, which is currently in the final phase. We hope to further expand the help desk in the coming years so that it can act as a hub for user support and a central knowledge resource for the digital humanities, not only in the German, but also in the European area and perhaps at some point worldwide.
159
In this paper, we present WebAnno-MM, an extension of the popular web-based annotation tool WebAnno, which is designed for the linguistic annotation of transcribed spoken data with time-aligned media files. Several new features have been implemented for our current use case: a novel teaching method based on pair-wise manual annotation of transcribed video data and systematic comparison of agreement between students. To enable the annotation of transcribed spoken language data, and in addition to addressing technical and data-model-related challenges, WebAnno-MM offers an additional view of the data: a (musical) score view for the inspection of parallel utterances, which is relevant for various methodological research questions regarding the analysis of interactions of spoken content.
139
Universal Dependency (UD) annotations, despite their usefulness for cross-lingual tasks and semantic applications, are not optimised for statistical parsing. In this paper, we ask what exactly causes the decrease in parsing accuracy when training a parser on UD-style annotations and whether the effect is similarly strong for all languages. We conduct a series of experiments in which we systematically modify individual annotation decisions taken in the UD scheme and show that this results in increased accuracy for most, but not all, languages. We show that the encoding in the UD scheme, in particular the decision to encode content words as heads, causes an increase in dependency length for nearly all treebanks and an increase in arc direction entropy for many languages, and we evaluate the effect this has on parsing accuracy.
152
We present a language learning application that relies on grammars to model the learning outcome. Based on this concept, we can provide a powerful framework for language learning exercises with an intuitive user interface and high reliability. Currently, the application aims to augment existing language classes and support students by improving learner attitude and the general learning outcome. Extensions beyond that scope are promising and likely to be added in the future.
190
Metadata provides important information relevant both to finding and understanding corpus data. Meaningful linguistic data requires both reasonable annotations and documentation of these annotations. This documentation is part of the metadata of a dataset. While corpus documentation has often been provided in the form of accompanying publications, machine-readable metadata, both containing the bibliographic information and documenting the corpus data, has many advantages. Metadata standards allow for the development of common tools and interfaces. In this paper, I want to add a new perspective from an archive’s point of view, look at the metadata provided for four learner corpora, and discuss the suitability of established standards for machine-readable metadata. I am aware that there is ongoing work towards metadata standards for learner corpora. However, I would like to keep the discussion going and add another point of view: increasing findability and reusability of learner corpora in an archiving context.
180
The article focuses on determining responsible parties and the division of potential liability arising from sharing language data (LD) containing personal data (PD). A key issue here is to identify who must ensure and guarantee GDPR compliance. The authors aim to answer 1) whether an individual researcher is a controller and 2) whether sharing LD results in joint controllership or separate controllership (whether the data's transferee becomes the controller, the joint controller or the processor). The article also analyses the legal relations of parties involved in data sharing and potential liability. The final section outlines data sharing in the CLARIN context. The analysis serves as a preliminary analytical background for redesigning the CLARIN contractual framework for sharing data.
172
Providing online repositories for language resources is one of the main activities of CLARIN centres. The legal framework regarding liability of Service Providers for content uploaded by their users has recently been modified by the new Directive on Copyright in the Digital Single Market. A new category of Service Providers, Online Content-Sharing Service Providers (OCSSPs), was added. It is subject to a complex and strict framework, including the requirement to obtain licenses from rightholders for the hosted content. This paper presents the background and effects of these changes to the law and aims to initiate a debate on how CLARIN repositories should navigate this new legal landscape.
180
N-grams are of utmost importance for modern linguistics and language technology. The legal status of n-grams, however, raises many practical questions. Traditionally, text snippets are considered copyrightable if they meet the originality criterion, but no clear indicators as to the minimum length of original snippets exist; moreover, the solutions adopted in some EU Member States (the paper cites German and French law as examples) are considerably different. Furthermore, recent developments in EU law (the CJEU's Pelham decision and the new right of press publishers) also provide interesting arguments in this debate. The paper presents the existing approaches to the legal protection of n-grams and tries to formulate some clear guidelines as to the length of n-grams that can be freely used and shared.
198
The Data Governance Act was proposed in late 2020 as part of the European Strategy for Data, and adopted on 30 May 2022 (as Regulation 2022/868). It will enter into application on 24 September 2023. The Data Governance Act is a major development in the legal framework affecting CLARIN and the whole language community. With its new rules on the re-use of data held by public sector bodies and on the provision of data sharing services, and especially its encouragement of data altruism, the Data Governance Act creates new opportunities and new challenges for CLARIN ERIC. This paper analyses the provisions of the Data Governance Act and aims to initiate the debate on how they will impact CLARIN and the whole language community.