Refine
Year of publication
Document Type
- Conference Proceeding (12)
- Article (7)
- Part of a Book (3)
Has Fulltext
- yes (22)
Keywords
- Korpus <Linguistik> (9)
- Computerlinguistik (7)
- Forschungsdaten (6)
- Erzählforschung (4)
- Handlungsstruktur <Literatur> (4)
- Metadatenmodell (4)
- Datenmanagement (3)
- Deutsch (3)
- Infrastruktur (3)
- Metadaten (3)
Publicationstate
- Zweitveröffentlichung (11)
- Veröffentlichungsversion (10)
- Postprint (2)
Reviewstate
- Peer-Review (22) (remove)
Publisher
- CLARIN (3)
- Linköping University Electronic Press (3)
- Dagstuhl (2)
- Universitätsverlag Rhein-Ruhr OHG (2)
- Zenodo (2)
- de Gruyter (2)
- Erich Schmidt (1)
- European Language Resources Association (1)
- Lang (1)
- Oxford University Press (1)
The understanding of story variation, whether motivated by cultural currents or other factors, is important for applications of formal models of narrative such as story generation or story retrieval. We present the first stage of an experiment to elicit natural narrative variation data suitable for evaluation with respect to story similarity, to qualitative and quantitative analysis of story variation, and also for data processing. We also present few preliminary results from the first stage of the experiment, using Red Riding Hood and Romeo and Juliet as base texts.
Accentuation, Uncertainty and Exhaustivity - Towards a Model of Pragmatic Focus Interpretation
(2010)
This paper presents a model of pragmatic focus interpretation that is assumed to be part of a complete language comprehension model and that is inspired by Levelt's language processing model. The model is derived from our empirical data on the role of accentuation, prosodic indicators of uncertainty and context for pragmatic focus interpretation. In its present state, the model is restricted to these data, but nevertheless generates predictions.
This paper addresses long-term archival for large corpora. Three aspects specific to language resources are focused, namely (1) the removal of resources for legal reasons, (2) versioning of (unchanged) objects in constantly growing resources, especially where objects can be part of multiple releases but also part of different collections, and (3) the conversion of data to new formats for digital preservation. It is motivated why language resources may have to be changed, and why formats may need to be converted. As a solution, the use of an intermediate proxy object called a signpost is suggested. The approach will be exemplified with respect to the corpora of the Leibniz Institute for the German Language in Mannheim, namely the German Reference Corpus (DeReKo) and the Archive for Spoken German (AGD).
We continue the study of the reproducibility of Propp’s annotations from Bod et al. (2012). We present four experiments in which test subjects were taught Propp’s annotation system; we conclude that Propp’s system needs a significant amount of training, but that with sufficient time investment, it can be reliably trained for simple tales.
We present web services which implement a workflow for transcripts of spoken language following the TEI guidelines, in particular ISO 24624:2016 “Language resource management – Transcription of spoken language”. The web services are available at our website and will be available via the CLARIN infrastructure, including the Virtual Language Observatory and WebLicht.
We present web services implementing a workflow for transcripts of spoken language following TEI guidelines, in particular ISO 24624:2016 "Language resource management - Transcription of spoken language". The web services are available at our website and will be available via the CLARIN infrastructure, including the Virtual Language Observatory and WebLicht.
CMDI Explorer
(2021)
We present CMDI Explorer, a tool that empowers users to easily explore the contents of complex CMDI records and to process selected parts of them with little effort. The tool allows users, for instance, to analyse virtual collections represented by CMDI records, and to send collection items to other CLARIN services such as the Switchboard for subsequent processing. CMDI Explorer hence adds functionality that many users felt was lacking from the CLARIN tool space.
Die diesjährige Jahrestagung des Leibniz-Instituts für Deutsche Sprache in Mannheim mit dem Titel „Deutsch in Europa“ zielte auf eine Perspektivenerweiterung ab. In zwölf Fachvorträgen, neun Projektvorstellungen im Rahmen einer Methodenmesse und einer Podiumsdiskussion wurden sprachpolitische, grammatische und methodische Aspekte des sprachlichen Nebeneinanders in Europa, des Sprachvergleichs und des Deutscherwerbs diskutiert.
A general concept of perspective is proposed, using the mathematical notion of vector spaces as metaphor. The concept is applied to different phenomena which use perspective: spatiotemporal perspective, lexical semantics (prototypes and features), perspectivation in syntax and lexical inferences. Criteria for constructing a superordinate perspective for two given perspectives are developed.
We present a technique called event mapping that allows to project text representations into event lists, produce an event table, and derive quantitative conclusions to compare the text representations. The main application of the technique is the case where two classes of text representations have been collected in two different settings (e.g., as annotations in two different formal frameworks) and we can compare the two classes with respect to their systematic differences in the event table. We illustrate how the technique works by applying it to data collected in two experiments (one using annotations in Vladimir Propp’s framework, the other using natural language summaries).
FnhdC/HTML und FnhdC/S
(2007)
In unserem Beitrag diskutieren wir Aspekte einer Forschungsdateninfrastruktur für den wissenschaftlichen Alltag auf Projektebene und argumentieren für eine Unterstützung von Projekten während der Erfassung und Bearbeitung von Daten, d. h. vor deren endgültiger Veröffentlichung. Dabei differenzieren wir zwischen Projekten, deren primäres Ziel es ist, eine Ressource aufzubauen (ressourcenschaffende Projekte, kurz RP) und solchen, die zur Beantwortung einer konkreten Forschungsfrage Daten sammeln und auswerten (Forschungsprojekte, kurz FP). Wir argumentieren dafür, dass bei den offenkundigen Unterschieden zwischen beiden Projektarten die grundsätzlichen Ansprüche an das alltägliche Forschungsdatenmanagement im Kern sehr ähnlich (wenn auch unterschiedlich akzentuiert und skaliert) sind. Diese Ähnlichkeit rührt nicht zuletzt daher, dass im Rahmen von FP gesammelte Daten in Bezug auf das Projektziel primär Mittel zum Zweck sein mögen, sie jedoch bereits im Arbeitsprozess in unterschiedlichem Maß von unterschiedlichen Beteiligten genutzt werden. Wir gehen konkret auf die Aspekte Datenorganisation und -verwaltung, Metadaten, Dokumentation und Dateiformate und deren Anforderungen in den verschiedenen Projekttypen ein. Schließlich diskutieren wir Lösungsansätze dafür, Aspekte des Forschungsdatenmanagements auch in (kleineren) Forschungsprojekten nicht post-hoc, sondern bereits in der Projektplanung als Teil der alltäglichen Arbeit zu berücksichtigen und entsprechende Unterstützung in der Forschungsinfrastruktur vorzusehen.
In unserem Beitrag diskutieren wir Aspekte einer Forschungsdateninfrastruktur für den wissenschaftlichen Alltag auf Projektebene und argumentieren für eine Unterstützung von Projekten während der Erfassung und Bearbeitung von Daten, d. h. vor deren endgültiger Veröffentlichung. Dabei differenzieren wir zwischen Projekten, deren primäres Ziel es ist, eine Ressource aufzubauen (ressourcenschaffende Projekte, kurz RP) und solchen, die zur Beantwortung einer konkreten Forschungsfrage Daten sammeln und auswerten (Forschungsprojekte, kurz FP). Wir argumentieren dafür, dass bei den offenkundigen Unterschieden zwischen beiden Projektarten die grundsätzlichen Ansprüche an das alltägliche Forschungsdatenmanagement im Kern sehr ähnlich (wenn auch unterschiedlich akzentuiert und skaliert) sind. Diese Ähnlichkeit rührt nicht zuletzt daher, dass im Rahmen von FP gesammelte Daten in Bezug auf das Projektziel primär Mittel zum Zweck sein mögen, sie jedoch bereits im Arbeitsprozess in unterschiedlichem Maß von unterschiedlichen Beteiligten genutzt werden. Wir gehen konkret auf die Aspekte Datenorganisation und -verwaltung, Metadaten, Dokumentation und Dateiformate und deren Anforderungen in den verschiedenen Projekttypen ein. Schließlich diskutieren wir Lösungsansätze dafür, Aspekte des Forschungsdatenmanagements auch in (kleineren) Forschungsprojekten nicht post-hoc, sondern bereits in der Projektplanung als Teil der alltäglichen Arbeit zu berücksichtigen und entsprechende Unterstützung in der Forschungsinfrastruktur vorzusehen.
This article evaluates the terminological component (TC) of the grammis portal on German grammar developed by the Institut für Deutsche Sprache. The TC is included into grammis to facilitate nonlinguists‘ access to the main components of the portal: Grammar in questions and answers, and the Systematic Grammar. The TC thus has the potential to be an extremely useful and important grammis component. We discuss to what extend the TC achieves its goals, and make some suggestions how it could be improved. The most important aspects considered in the evaluation are: (a) TC completeness and consistency, (b) accessibility and usability of definitions and index, (c) integration of the TC with the overall system.
This article investigates the use of überhaupt and sowieso in German and Dutch. These two words are frequently classified as particles, if only because of their pragmatic functions. The frequent use of particles is considered a specific trait common to German and Dutch, and the description of their semantics and pragmatics is notoriously difficult. It is unclear whether both particles have the same meaning in Dutch (where they are loanwords) and German, whether they can fulfil the same syntactic functions and to what extent the (semantic and pragmatic) functions of überhaupt und sowieso overlap. There has already been linguistic research on überhaupt and sowieso by Fisseni (2009) using the world-wide web and by Bruijnen and Sudhoff (2013) using the EUROPARL corpus. In the present study we critically evaluated the corpus study, integrating information on original utterance language and discussing the adequacy of this corpus. Moreover, we conducted an experimental survey collecting subjective-intuitive judgements in three dimensions, thus gathering more data on sparse and informal constructions.
By using these complementary methods, we obtain a more nuanced picture of the use of überhaupt and sowieso in both languages: On the one hand, the data show where the use of both words is more similar and on the other hand, differences between the languages can also be discerned.
A formal narrative representation is a procedure assigning a formal description to a natural language narrative. One of the goals of the computational models of narrative community is to understand this procedure better in order to automatize it. A formal framework fit for automatization should allow for objective and reproducible representations. In this paper, we present empirical work focussing on objectivity and reproducibility of the formal framework by Vladimir Propp (1928). The experiments consider Propp’s formalization of Russian fairy tales and formalizations done by test subjects in the same formal framework; the data show that some features of Propp’s system such as the assignment of the characters to the dramatis personae and some of the functions are not easy to reproduce.
Signposts for CLARIN
(2020)
An implementation of CMDI-based signposts and its use is presented in this paper. Arnold et al. 2020 present Signposts as a solution to challenges in long-term preservation of corpora, especially corpora that are continuously extended and subject to modification, e.g., due to legal injunctions, but also may overlap with respect to constituents, and may be subject to migrations to new data formats. We describe the contribution Signposts can make to the CLARIN infrastructure and document the design for the CMDI profile.
Signposts for CLARIN
(2021)
An implementation of CMDI-based signposts and its use is presented in this paper. Arnold, Fisseni et al. (2020) present signposts as a solution to challenges in long-term preservation of corpora. Though applicable to digital resources in general, we focus on corpora, especially those that are continuously extended or subject to modification, e.g., due to legal injunctions, but also may overlap with respect to constituents, and may be subject to migrations to new data formats. We describe the contribution signposts can make to the CLARIN infrastructure, notably virtual collections, and document the design for the CMDI profile.
Interested in formally modelling similarity between narratives, we investigate judgements of similarity between narratives in a small corpus of film reviews and book–film comparisons. A main finding is that judgements tend to concern multiple levels of story representation at once. As these texts are pragmatically related to reception contexts, we find many references to reception quality and optimality. We conclude that current formal models of narrative can not capture the task of naturalistic narrative comparisons given in the analysed reviews, but that the development of models containing a more reception-oriented point of view will be necessary.
The CMDI Explorer
(2020)
We present the CMDI Explorer, a tool that empowers users to easily explore the contents of complex CMDI records and to process selected parts of them with little effort. The tool allows users, for instance, to analyse virtual collections represented by CMDI records, and to send collection items to other CLARIN services such as the Switchboard for subsequent processing. The CMDI Explorer hence adds functionality that many users felt was lacking from the CLARIN tool space.
We present an experimental approach to determining natural dimensions of story comparison. The results show that untrained test subjects generally do not privilege structural information. When asked to justify sameness ratings, they may refer to content, but when asked to state differences, they mostly refer to style, concrete events, details and motifs. We conclude that adequate formal models of narratives must represent such non-structural data.
We compare the use of überhaupt and sowieso in Dutch and German. We use the world-wide web as the main resource and pursue a zigzag strategy, trying to find usages going back and forth between dictionaries, intuitions and real data obtained through web search. To our surprise, the results more or less confirm the decision of Dutch dictionaries to consider überhaupt and sowieso synonymous. In German, we find no synonymy, but only a great overlap of usage conditions in declarative sentences.