Despite the importance of the agent role for language grammar and processing, its definition and features remain controversial in the literature on semantic roles. Moreover, diagnostic tests to dissociate agentive from non-agentive roles are typically applied with qualitative introspection data. We investigated whether quantitative acceptability ratings obtained with a well-established agentivity test, the DO-cleft, provide evidence for the feature-based prototype account of Dowty (Dowty, David R. 1991. Thematic proto-roles and argument selection. Language 67(3). 547-619), which postulates that agentivity increases with the number of agentive features that a role subsumes. We used four different intransitive verb classes in German and collected acceptability judgements from non-expert native speakers of German. Our results show that sentence acceptability increases linearly with the number of agentive features and, hence, with agentivity. Moreover, our findings confirm that sentience belongs to the group of proto-agent features. In summary, this suggests that a multidimensional account including a specific mechanism for role prototypicality (feature accumulation) successfully captures gradient acceptability clines. Quantitative acceptability estimates are a meaningful addition to linguistic theorizing.
N-grams are of utmost importance for modern linguistics and language theory. The legal status of n-grams, however, raises many practical questions. Traditionally, text snippets are considered copyrightable if they meet the originality criterion, but no clear indicators as to the minimum length of original snippets exist; moreover, the solutions adopted in some EU Member States (the paper cites German and French law as examples) are considerably different. Furthermore, recent developments in EU law (the CJEU's Pelham decision and the new right of newspaper publishers) also provide interesting arguments in this debate. The proposed paper presents the existing approaches to the legal protection of n-grams and tries to formulate some clear guidelines as to the length of n-grams that can be freely used and shared.
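For readers unfamiliar with the term, the following minimal sketch illustrates what a word n-gram is and how its length n determines how much contiguous source text each snippet reproduces. It is not taken from the paper; the function name `word_ngrams`, the whitespace tokenization, and the sample sentence are assumptions made purely for illustration.

```python
from collections import Counter


def word_ngrams(text, n):
    """Return all contiguous word n-grams of a text (hypothetical helper).

    Tokenization is a plain whitespace split, which is an illustrative
    simplification rather than a linguistically sound tokenizer.
    """
    tokens = text.split()
    # Slide a window of n tokens over the text; texts shorter than n yield [].
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Invented sample sentence, used only to show the counting step.
sample = "the cat sat on the mat the cat slept"
bigram_counts = Counter(word_ngrams(sample, 2))
```

The larger n is, the longer the verbatim stretch of the source that each n-gram carries, which is why snippet length matters for the originality question discussed in the abstract.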
Having the necessary skills for staying in contact with friends and relatives through digital devices is crucial in today’s world. As the current COVID-19 pandemic shows, this holds especially true for the elderly. For people who are quarantined and restricted from physically meeting others, communication technologies are more important than ever for staying social and informed about current events. In nursing homes, staff members are now finding new ways of staying in touch with family members by assisting residents in making video calls with mobile devices.
But what if elderly people cannot rely on personal assistance for accessing these alternative means of communication? This raises the general question of how older people can and do learn to use such technologies. Although the internet is full of guides and instructional videos on how to use smartphones or tablets, they are cold comfort to someone who may not even know what an internet browser is.
Especially for digital newcomers, the tried and true method of face-to-face instruction is invaluable. While many older people turn to their children or grandchildren for help in all things digital, courses specifically tailored for elderly users are also increasingly popular.
More and more governmental initiatives and associations indeed acknowledge the already existing interest of elderly citizens in digital tools and their growing need to receive customized training (e.g. “SeniorSurf” and “Kansalaisen digitaidot” in Finland or “Silver Tipps” in Germany). For a researcher of social interaction, these courses can also provide a valuable window for discovering what it looks and sounds like to learn to use essential but sometimes alien technologies.
In our paper, we present a case study on the quality of concept relations in the manually developed terminological resource of grammis, an information system on German grammar. We assess a SKOS representation of the resource using the tool qSKOS, create a typology of the issues identified by the tool, and conduct a qualitative analysis of selected cases. We identify and discuss aspects that can motivate quality issues and uncover that ill-formed relations are frequently indicative of deeper issues in the data model. Finally, we outline how these findings can inform improvements in our resource’s data model, discussing implications for the machine readability of terminological data.
As part of the ZuMult project, we are currently modelling a backend architecture that should provide query access to corpora from the Archive of Spoken German (AGD) at the Leibniz Institute for the German Language (IDS). We are exploring how to reuse existing search engine frameworks that provide full-text indices and allow corpora to be queried with one of the corpus query languages (QLs) established and actively used in the corpus research community. For this purpose, we tested MTAS, an open-source Lucene-based search engine for querying text with multilevel annotations. We applied MTAS to three oral corpora stored in the TEI-based ISO standard for transcriptions of spoken language (ISO 24624:2016). These corpora differ from the corpus data that MTAS was developed for, because they include interactions with two or more speakers and are enriched, inter alia, with timeline-based annotations. In this contribution, we report our test results and address issues that arise when search frameworks originally developed for querying written corpora are transferred to the field of spoken language.
The newest generation of speech technology has caused a huge increase in audio-visual data that are now enhanced with orthographic transcripts, as in automatic subtitling on online platforms. Research data centers and archives contain a range of new and historical data, which are currently only partially transcribed and therefore only partially accessible for systematic querying. Automatic Speech Recognition (ASR) is one option for making these data accessible. This paper tests the usability of a state-of-the-art ASR system on a historical (from the 1960s) but regionally balanced corpus of spoken German, and on a relatively new corpus (from 2012) recorded in a narrow area. We observed a regional bias of the ASR system, with higher recognition scores for the north of Germany and lower scores for the south. A detailed analysis of the narrow-region data revealed, despite relatively high ASR confidence, some specific word errors due to a lack of regional adaptation. These findings need to be considered in decisions on further data processing and the curation of corpora, e.g. correcting transcripts or transcribing from scratch. Such geography-dependent analyses also have the potential to support ASR development through targeted data selection for training and adaptation, and to increase sensitivity towards varieties of pluricentric languages.
In this article, we describe a user support solution for the digital humanities. As a case study, we show the development of the CLARIN-D Helpdesk from 2013 into the current support solution, which has been extended to several other CLARIN-related software tools and projects and to the DARIAH-ERIC. Furthermore, we describe a way towards a common support platform for CLARIAH-DE, which is currently in the final phase. We hope to further expand the help desk in the following years so that it can act as a hub for user support and a central knowledge resource for the digital humanities, not only in Germany but also across Europe and perhaps, at some point, worldwide.