Refine
Document Type
- Conference Proceeding (13) (remove)
Has Fulltext
- yes (13)
Keywords
- Korpus <Linguistik> (7)
- Forschungsdaten (6)
- Metadatenmodell (4)
- Metadaten (3)
- syllable prominence (3)
- Dateiformat (2)
- Datenformat (2)
- Datenmanagement (2)
- Infrastruktur (2)
- Langzeitarchivierung (2)
Publicationstate
- Veröffentlichungsversion (13) (remove)
Reviewstate
- Peer-Review (13)
In our study we use the experimental framework of priming to manipulate our subjects’ expectations of syllable prominence in sentences with a well-defined syntactic and phonological structure. It shows that it is possible to prime prominence patterns and that priming leads to significant differences in the judgment of syllable prominence.
The CMDI Explorer
(2020)
We present the CMDI Explorer, a tool that empowers users to easily explore the contents of complex CMDI records and to process selected parts of them with little effort. The tool allows users, for instance, to analyse virtual collections represented by CMDI records, and to send collection items to other CLARIN services such as the Switchboard for subsequent processing. The CMDI Explorer hence adds functionality that many users felt was lacking from the CLARIN tool space.
Signposts for CLARIN
(2020)
An implementation of CMDI-based signposts and its use is presented in this paper. Arnold et al. 2020 present Signposts as a solution to challenges in long-term preservation of corpora, especially corpora that are continuously extended and subject to modification, e.g., due to legal injunctions, but also may overlap with respect to constituents, and may be subject to migrations to new data formats. We describe the contribution Signposts can make to the CLARIN infrastructure and document the design for the CMDI profile.
Signposts for CLARIN
(2021)
An implementation of CMDI-based signposts and its use is presented in this paper. Arnold, Fisseni et al. (2020) present signposts as a solution to challenges in long-term preservation of corpora. Though applicable to digital resources in general, we focus on corpora, especially those that are continuously extended or subject to modification, e.g., due to legal injunctions, but also may overlap with respect to constituents, and may be subject to migrations to new data formats. We describe the contribution signposts can make to the CLARIN infrastructure, notably virtual collections, and document the design for the CMDI profile.
The instructions under which raters quantify syllable prominence perception need to be simple in order to maintain immediate reactions. This leads to noise in the rating data that can be dealt with by normalization, e.g. setting central tendency = 0 and dispersion = 1 (as in Z-score normalization). Questions arise such as: Which parameter is adequate here to capture central tendency? Which reference distribution should the normalization be based on? In this paper 16 different normalization methods are evaluated. In a perception experiment using German read speech (prose and poetry), syllable prominence ratings were collected. From the rating data 16 complete “mirror” data-sets were computed according to the 16 methods. Each mirror data-set was correlated with the same set of measures from the underlying acoustic data, focusing on raw syllable duration which is seen as a rather straightforward acoustic aspect of syllable prominence. Correlation coefficients could be raised considerably by selected methods.
The perception of syllable prominence depends to a limited extent on the acoustic properties of the speech signal in question. Psychoacoustic factors are involved as well. Thus, research often relies on two types of data: subjective prominence ratings collected in perception experiments and acoustic measures. A problem with the rating data is noise resulting from individual approaches to the rating task. This paper addresses the question of how this noise can be reduced by normalization, evaluating 12 normalization methods. In a perception experiment, prominence ratings concerning German read speech were collected. From the raw rating data 12 different ‘mirror’ data-sets were computed according to the 12 methods. Each mirror data-set was correlated with the same set of underlying acoustic data. The multiple regression setup included raw syllable duration as well as within-syllable maximum F0 and intensity. Adjusted r2-values could beraised considerably with selected methods.
In diesem Beitrag widmen wir uns der Frage, welche Schritte unternommen werden müssen, um Skripte, die bei der Aufbereitung und/oder Auswertung von Forschungsdaten Anwendung finden, so FAIR wie möglich zu gestalten. Dabei nehmen wir sowohl Reproduzierbarkeit, also den Weg von den (Roh)daten zu den Ergebnissen einer Studie, als auch Wiederverwertbarkeit, also die Möglichkeit, die Methoden einer Studie mittels des Skripts auf andere Daten anzuwenden, in den Fokus und beleuchten dabei die folgenden Aspekte: Arbeitsumgebung, Datenvalidierung, Modularisierung, Dokumentation und Lizenz.
In unserem Beitrag diskutieren wir Aspekte einer Forschungsdateninfrastruktur für den wissenschaftlichen Alltag auf Projektebene und argumentieren für eine Unterstützung von Projekten während der Erfassung und Bearbeitung von Daten, d. h. vor deren endgültiger Veröffentlichung. Dabei differenzieren wir zwischen Projekten, deren primäres Ziel es ist, eine Ressource aufzubauen (ressourcenschaffende Projekte, kurz RP) und solchen, die zur Beantwortung einer konkreten Forschungsfrage Daten sammeln und auswerten (Forschungsprojekte, kurz FP). Wir argumentieren dafür, dass bei den offenkundigen Unterschieden zwischen beiden Projektarten die grundsätzlichen Ansprüche an das alltägliche Forschungsdatenmanagement im Kern sehr ähnlich (wenn auch unterschiedlich akzentuiert und skaliert) sind. Diese Ähnlichkeit rührt nicht zuletzt daher, dass im Rahmen von FP gesammelte Daten in Bezug auf das Projektziel primär Mittel zum Zweck sein mögen, sie jedoch bereits im Arbeitsprozess in unterschiedlichem Maß von unterschiedlichen Beteiligten genutzt werden. Wir gehen konkret auf die Aspekte Datenorganisation und -verwaltung, Metadaten, Dokumentation und Dateiformate und deren Anforderungen in den verschiedenen Projekttypen ein. Schließlich diskutieren wir Lösungsansätze dafür, Aspekte des Forschungsdatenmanagements auch in (kleineren) Forschungsprojekten nicht post-hoc, sondern bereits in der Projektplanung als Teil der alltäglichen Arbeit zu berücksichtigen und entsprechende Unterstützung in der Forschungsinfrastruktur vorzusehen.
Streefkerk defines prominence as the perceptually outstanding parts in spoken language. An optimal rating scale for syllable prominence has not been found yet. This paper evaluates a 4-point, an 11-point, a 31-point, and a continuous scale for the rating of syllable prominence and gives support for scales using a higher number of levels. Priming effects found by Arnold, et al., could only be replicated using the 31-point scale.
Prominence has been widely studied on the word level and the syllable level. An extensive study comparing the two approaches is missing in the literature. This study investigates how word and syllable prominence relate to each other in German. We find that perceptual ratings based on the word level are more extreme than those based on the syllable level. The correlations between word prominence and acoustic features are greater than the correlations between syllable prominence and acoustic features.
CMDI Explorer
(2021)
We present CMDI Explorer, a tool that empowers users to easily explore the contents of complex CMDI records and to process selected parts of them with little effort. The tool allows users, for instance, to analyse virtual collections represented by CMDI records, and to send collection items to other CLARIN services such as the Switchboard for subsequent processing. CMDI Explorer hence adds functionality that many users felt was lacking from the CLARIN tool space.
This paper describes work directed towards the development of a syllable prominence-based prosody generation functionality for a German unit selection speech synthesis system. A general concept for syllable prominence-based prosody generation in unit selection synthesis is proposed. As a first step towards its implementation, an automated syllable prominence annotation procedure based on acoustic analyses has been performed on the BOSS speech corpus. The prominence labeling has been evaluated against an existing annotation of lexical stress levels and manual prominence labeling on a subset of the corpus. We discuss methods and results and give an outlook on further implementation steps.
This paper addresses long-term archival for large corpora. Three aspects specific to language resources are focused, namely (1) the removal of resources for legal reasons, (2) versioning of (unchanged) objects in constantly growing resources, especially where objects can be part of multiple releases but also part of different collections, and (3) the conversion of data to new formats for digital preservation. It is motivated why language resources may have to be changed, and why formats may need to be converted. As a solution, the use of an intermediate proxy object called a signpost is suggested. The approach will be exemplified with respect to the corpora of the Leibniz Institute for the German Language in Mannheim, namely the German Reference Corpus (DeReKo) and the Archive for Spoken German (AGD).