We present the annotation of information structure in the MULI project. To learn more about the information structuring means in prosody, syntax and discourse, theory-independent features were defined for each level. We describe the features and illustrate them on an example sentence. To investigate the interplay of features, the representation has to allow for inspecting all three layers at the same time. This is realised by a stand-off XML mark-up with the word as the basic unit. The theory-neutral XML stand-off annotation allows integrating this resource with other linguistic resources such as the TIGER Treebank for German or the Penn Treebank for English.
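The idea of stand-off mark-up with the word as the basic unit can be sketched as follows. This is only an illustration, not the MULI format: the element names (`words`, `w`, `layer`, `mark`), the attributes (`id`, `from`, `to`, `label`), the example sentence and all labels are assumptions made up for this sketch.

```python
import xml.etree.ElementTree as ET

# Hypothetical base layer: one <w> element per word, each with a unique id.
WORDS_XML = """<words>
  <w id="w1">Peter</w>
  <w id="w2">liest</w>
  <w id="w3">ein</w>
  <w id="w4">Buch</w>
</words>"""

# Hypothetical stand-off layers: each mark points at a span of word ids
# instead of wrapping the text, so the layers never interfere with each other.
LAYERS_XML = {
    "prosody": '<layer><mark from="w4" to="w4" label="pitch-accent"/></layer>',
    "syntax": (
        '<layer><mark from="w1" to="w1" label="subject"/>'
        '<mark from="w3" to="w4" label="object"/></layer>'
    ),
    "discourse": '<layer><mark from="w3" to="w4" label="new"/></layer>',
}


def merge_layers(words_xml, layers_xml):
    """Return one row per word carrying the labels of every layer that covers it."""
    words = ET.fromstring(words_xml).findall("w")
    position = {w.get("id"): i for i, w in enumerate(words)}
    merged = [{"id": w.get("id"), "text": w.text, "labels": {}} for w in words]
    for layer_name, layer_xml in layers_xml.items():
        for mark in ET.fromstring(layer_xml).findall("mark"):
            start = position[mark.get("from")]
            end = position[mark.get("to")]
            for row in merged[start:end + 1]:
                row["labels"].setdefault(layer_name, []).append(mark.get("label"))
    return merged


merged = merge_layers(WORDS_XML, LAYERS_XML)
for row in merged:
    print(row["id"], row["text"], row["labels"])
```

Because every layer refers only to word ids, a further layer (for example one derived from a treebank) could be added as another entry without touching the existing ones.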
Tagset and Guideline for the PoS Tagging of Language Data from Genres of Internet-Based Communication
(2015)
The task-oriented and format-driven development of corpus query systems has led to the creation of numerous corpus query languages (QLs) that vary strongly in expressiveness and syntax. This is a severe impediment for the interoperability of corpus analysis systems, which lack a common protocol. In this paper, we present KoralQuery, a JSON-LD-based general corpus query protocol, aiming to be independent of particular QLs, tasks and corpus formats. In addition to describing the system of types and operations that KoralQuery is built on, we exemplify the representation of corpus queries in the serialized format and illustrate use cases in the KorAP project.
The naturalness of synthetic speech depends strongly on the prediction of appropriate prosody. For the present study the original annotation of the German speech database “Kiel Corpus of Read Speech” was extended automatically with syntactic features, word frequency, and syllable boundaries. Several classification and regression trees for predicting symbolic prosody features, postlexical phonological processes, duration, and F0 were trained on this database. The perceptual evaluation showed that the overall perceptual quality of the German text-to-speech system MARY can be significantly improved by training all models that contribute to prosody prediction on the same database. Furthermore, it showed that the error introduced by symbolic prosody prediction perceptually equals the error produced by a direct method that does not exploit any symbolic prosody features.
In this study we investigate the intonational characteristics of the four utterance types statement, wh-question, yes/no-question and declarative question. Readings of two German scripted dialogues were examined to ascertain characteristic features of the F0 contour for each utterance type. Final boundary tone, nuclear pitch accent, F0 offset, F0 onset, F0 range, and the slopes of a topline and a bottomline were determined for each utterance and compared for the four utterance types. Results show that for an average speaker, the final boundary tone, the F0 range, and the slope of the topline can be used to distinguish between the four utterance types. However, speakers may deviate from this pattern and exploit other intonational means to distinguish certain utterance types or choose not to mark a syntactic difference at all.
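One common way to obtain the slopes of a topline and a bottomline is to fit a straight line through the local F0 maxima and minima of an utterance. The abstract does not state how the slopes were computed in the study, so the following is only a sketch under that assumption. The function names and the contour values are invented for illustration.

```python
def local_extrema(times, f0, kind):
    """Collect (time, F0) points that are strict local maxima or minima."""
    points = []
    for i in range(1, len(f0) - 1):
        is_peak = f0[i - 1] < f0[i] > f0[i + 1]
        is_valley = f0[i - 1] > f0[i] < f0[i + 1]
        if (kind == "max" and is_peak) or (kind == "min" and is_valley):
            points.append((times[i], f0[i]))
    return points


def fit_slope(points):
    """Least-squares slope (Hz per second) of a line through the given points."""
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points to fit a line")
    mean_t = sum(t for t, _ in points) / n
    mean_f = sum(f for _, f in points) / n
    sxx = sum((t - mean_t) ** 2 for t, _ in points)
    sxy = sum((t - mean_t) * (f - mean_f) for t, f in points)
    return sxy / sxx


# Invented F0 contour (Hz) sampled every 100 ms; not data from the study.
times = [i / 10 for i in range(7)]
f0 = [180, 220, 170, 210, 160, 200, 150]

topline_slope = fit_slope(local_extrema(times, f0, "max"))
bottomline_slope = fit_slope(local_extrema(times, f0, "min"))
f0_range = max(f0) - min(f0)
print(topline_slope, bottomline_slope, f0_range)
```

Per-utterance values like these could then be compared across the four utterance types, as the abstract describes for the topline slope and the F0 range.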
We present an XML-based metadata standard for the documentation of speech and multimedia corpora that was developed at the Institute for German Language (IDS) in Mannheim, Germany. The IDS is one of the major institutions providing German speech and language corpora to researchers. These corpora stem from many different sources and were previously documented in a rather heterogeneous fashion using a variety of data models and formats. In order to unify the documentation for existing and future corpora, the IDS-internal Archive for Spoken German collaborated with several projects and developed a set of standardised XML metadata schemas. These XML schemas build on existing internal and external documentation schemas (such as IMDI) and take into account the workflow of speech corpus production. In order to minimise redundancy, separate schemas were designed for projects, speakers, recording sessions, and entire corpora. The resulting schemas are tested in ongoing speech and multimedia projects at the IDS and are regularly revised. They are accompanied by element definitions, guidelines, and examples. In addition, a mapping to IMDI will be provided.
How autonomously can we use the internet? How much do we know about which digital traces we leave and who tracks them afterwards?
How are the data generated while surfing reused by third parties, with and without our knowledge? And is the felt nakedness in times of digitally observable, apparent transparency really acute, or is it shaped by traditional analogue structures of thought and experience?
An "Alpha-Gottesdienst" (Alpha service) is a church service "with a somewhat different programme", in which "the curious and the seeking can experience not only sermon and prayer, but also skits and interviews as well as plenty of live music". In this contribution, the authors analyse the beginning of such an Alpha service in the form of a case study, because it appears highly revealing for the relationship between interaction architecture, social topography and interaction space. Naturally, such an analysis must also give due attention to the structure of the selected case, i.e. here the structure of a worship event whose meaning depends largely on the contrast with a presumed normal case of church service ("not only sermon and prayer") and which explicitly addresses not an audience of routine churchgoers already settled in the faith, but "the curious and the seeking".
Centering on German self-motion verbs, this paper demonstrates the advantages of free-sorting over creating and delineating word fields with more traditional methods. In particular, I draw a comparison to Snell-Hornby’s (1983) work on German descriptive verbs, which produces lexical fields with the help of dictionary entries, a thesaurus, a small corpus of written text and limited speaker feedback. While these methods have benefits, they are limited in their ability to represent the average organization of semantic fields in the mind of everyday speakers. Free-sorting, by contrast, does not rely on academic resources, corpora or singular speaker judgments. In sorting, a group of informants creates visible sets of items according to perceived similarity. Psycholinguists have used the method to quantitatively explore the perception of color terms across cultures (cf. Roberson et al. 2005). With a sufficiently large number of informants, one can generate lexical sorting data that is suitable for cluster analysis, the results of which are represented by dendrograms. The experiment I conducted involved 33 school children from a middle-class neighborhood in Braunschweig, Northern Germany. My experiment shows that Snell-Hornby’s (1983) representation of the self-motion field can be improved by integrating further dimensions of meaning, such as body-space relations and sound, that young speakers find salient in the grouping procedure.
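The step from sorting data to a dendrogram can be sketched as follows: each pair of verbs receives a dissimilarity equal to the share of informants who did not place them in the same pile, and the verbs are then merged bottom-up. This is only a minimal sketch: the choice of average linkage is an assumption (the abstract does not name the clustering method), and the three "informants" and their piles are invented toy data, not results from the experiment.

```python
from itertools import combinations


def dissimilarity(sorts, items):
    """1 minus the proportion of informants who put a pair in the same pile."""
    dist = {}
    for a, b in combinations(items, 2):
        together = sum(
            any(a in pile and b in pile for pile in sort) for sort in sorts
        )
        dist[frozenset((a, b))] = 1 - together / len(sorts)
    return dist


def average_linkage(items, dist):
    """Agglomerative clustering; returns the merge steps a dendrogram would draw."""
    clusters = [(item,) for item in items]
    merges = []
    while len(clusters) > 1:
        best = None
        for c1, c2 in combinations(clusters, 2):
            # Average pairwise dissimilarity between the two clusters.
            d = sum(dist[frozenset((a, b))] for a in c1 for b in c2)
            d /= len(c1) * len(c2)
            if best is None or d < best[0]:
                best = (d, c1, c2)
        d, c1, c2 = best
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 + c2]
        merges.append((c1, c2, d))
    return merges


# Invented toy data: three hypothetical informants sorting four verbs.
items = ["gehen", "rennen", "kriechen", "schleichen"]
sorts = [
    [{"gehen", "rennen"}, {"kriechen", "schleichen"}],
    [{"gehen", "rennen"}, {"kriechen", "schleichen"}],
    [{"gehen"}, {"rennen", "kriechen", "schleichen"}],
]

dist = dissimilarity(sorts, items)
merges = average_linkage(items, dist)
for left, right, height in merges:
    print(left, right, round(height, 3))
```

Each printed line corresponds to one join in the dendrogram, with the height showing how rarely the two groups were sorted together.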