This paper consists of a short analysis of the sources and the treatment of the legal lexicon in the first dictionary published by the Spanish Royal Academy (1726–1739), followed by a longer commentary on the representation and the treatment of the concept of judge, focusing on how extralinguistic factors are reflected in the definitions. The results highlight the relevance of the legal context of that era for the treatment of the lexicon related to the legal domain, but they also demonstrate how the lexicographic data displays peculiarities of legal matters.
This paper reports on the efforts of twelve national teams in building the International Comparable Corpus (ICC; https://korpus.cz/icc) that will contain highly comparable datasets of spoken, written and electronic registers. The languages currently covered are Czech, Finnish, French, German, Irish, Italian, Norwegian, Polish, Slovak, Swedish and, more recently, Chinese, as well as English, which is considered to be the pivot language. The goal of the project is to provide much-needed data for contrastive corpus-based linguistics. The ICC corpus is committed to the idea of re-using existing multilingual resources as much as possible and the design is modelled, with various adjustments, on the International Corpus of English (ICE). As such, ICC will contain approximately the same balance of 40 percent of written language and 60 percent of spoken language distributed across 27 different text types and contexts. A number of issues encountered by the project teams are discussed, ranging from copyright and data sustainability to technical advances in data distribution.
The Component MetaData Infrastructure (CMDI) is a framework for the creation and usage of metadata formats to describe all kinds of resources in the CLARIN world. To better connect to the library world, and to allow librarians to enter metadata for linguistic resources into their catalogues, a crosswalk from CMDI-based formats to bibliographic standards is required. The general and rather fluid nature of CMDI, however, makes it hard to map arbitrary CMDI schemas to metadata standards such as Dublin Core (DC) or MARC 21, which have a mature, well-defined and fixed set of field descriptors. In this paper, we address the issue and propose crosswalks between CMDI-based profiles originating from the NaLiDa project and DC and MARC 21, respectively.
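As a rough illustration of what such a crosswalk involves, the sketch below maps a flat metadata record onto Dublin Core elements via a lookup table. The CMDI field paths and the sample record are hypothetical placeholders, not taken from the NaLiDa profiles; only the Dublin Core element names (title, creator, description, date, language) are real. Fields without a mapping are reported rather than dropped, since the fluid nature of CMDI means that coverage will usually be incomplete.

```python
# Minimal crosswalk sketch. The CMDI paths on the left are HYPOTHETICAL
# examples; an actual profile defines its own component structure.
CROSSWALK = {
    "GeneralInfo/ResourceName": "dc:title",
    "Creation/Creators/Person": "dc:creator",
    "GeneralInfo/Description": "dc:description",
    "GeneralInfo/PublicationDate": "dc:date",
    "Content/Language": "dc:language",
}


def cmdi_to_dc(record):
    """Map (path, value) pairs to Dublin Core; return (dc, unmapped_paths)."""
    dc = {}
    unmapped = []
    for path, value in record:
        target = CROSSWALK.get(path)
        if target is None:
            # No fixed DC descriptor for this field: report it, do not guess.
            unmapped.append(path)
            continue
        # DC elements are repeatable, so values are collected in lists.
        dc.setdefault(target, []).append(value)
    return dc, unmapped


# Hypothetical sample record, for illustration only.
sample = [
    ("GeneralInfo/ResourceName", "Example Corpus"),
    ("Creation/Creators/Person", "A. Author"),
    ("Creation/Creators/Person", "B. Author"),
    ("Annotation/TagSet", "ExampleTagSet"),
]
dc_record, missing = cmdi_to_dc(sample)
```

Keeping the mapping in a table rather than in code means that each profile can carry its own crosswalk, which fits a setting where the source schemas vary and the target standard is fixed.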
The ISOcat registry reloaded
(2012)
The linguistics community is building a metadata-based infrastructure for the description of its research data and tools. At its core is the ISOcat registry, a collaborative platform to hold a (to be standardized) set of data categories (i.e., field descriptors). Descriptors have definitions in natural language and few explicit interrelations. With the registry growing to many hundreds of entries, authored by many, it is becoming increasingly apparent that the rather informal definitions and their glossary-like design make it hard for users to grasp, exploit and manage the registry’s content. In this paper, we take a large subset of the ISOcat term set and reconstruct from it a tree structure following in the footsteps of schema.org. Our ontological re-engineering yields a representation that gives users a hierarchical view of linguistic, metadata-related terminology. The new representation adds to the precision of all definitions by making explicit information which is only implicitly given in the ISOcat registry. It also helps uncover and address potential inconsistencies in term definitions as well as gaps and redundancies in the overall ISOcat term set. The new representation can serve as a complement to the existing ISOcat model, providing additional support for authors and users in browsing, (re-)using, maintaining, and further extending the community’s terminological metadata repertoire.
This study investigated whether an analysis of narrative style (word use and cross-clausal syntax) of patients with symptoms of generalised anxiety and depression disorders can help predict the likelihood of successful participation in guided self-help. Texts by 97 people who had made contact with a primary care mental health service were analysed. Outcome measures were completion of the guided self-help programme, and change in symptoms assessed by a standardised scale (CORE-OM). Regression analyses indicated that some aspects of participants' syntax helped to predict completion of the programme, and that aspects of syntax and word use helped to predict improvement of symptoms. Participants using non-finite complement clauses with above-average frequency were four times more likely to complete the programme (95% confidence interval 1.4 to 11.7) than other participants. Among those who completed, the use of causation words and complex syntax (adverbial clauses) predicted improvement, accounting for 50% of the variation in well-being benefit. These results suggest that the analysis of narrative style can provide useful information for assessing the likelihood of success of individuals participating in a mental health guided self-help programme.
In informal interaction, speakers rarely thank a person who has complied with a request. Examining data from British English, German, Italian, Polish, and Telugu, we ask when speakers do thank after compliance. The results show that thanking treats the other’s assistance as going beyond what could be taken for granted in the circumstances. Coupled with the rareness of thanking after requests, this suggests that cooperation is to a great extent governed by expectations of helpfulness, which can be long-standing, or built over the course of a particular interaction. The higher frequency of thanking in some languages (such as English or Italian) suggests that cultures differ in the importance they place on recognizing the other’s agency in doing as requested.
In their analysis of methods that participants use to manage the realization of practical courses of action, Kendrick and Drew (2016/this issue) focus on cases of assistance, where the need to be addressed is Self’s, and Other lends a helping hand. In our commentary, we point to other forms of cooperative engagement that are ubiquitously recruited in interaction. Imperative requests characteristically expect compliance on the grounds of Other’s already established commitment to a wider and shared course of actions. Established commitments can also provide the engine behind recruitment sequences that proceed nonverbally. And forms of cooperative engagement that are well glossed as assistance can nevertheless be demonstrably oriented to established commitments. In sum, we find commitment to shared courses of action to be an important element in the design and progression of certain recruitment sequences, where the involvement of Other is best defined as contribution. The commentary highlights the importance of interdependent orientations in the organization of cooperation. Data are in German, Italian, and Polish.
Drawing on research from conversation analysis and developmental psychology, we point to the existence of “supporters” of morally responsible agency in everyday interaction: causes of our behavior that we are often unaware of, but that would make good-enough reasons for our actions, were we made aware of them.
How to propose an action as an objective necessity. The case of Polish trzeba x (‘one needs to x’)
(2011)
The present study demonstrates that language-specific grammatical resources can afford speakers language-specific ways of organizing cooperative practical action. On the basis of video recordings of Polish families in their homes, we describe action affordances of the Polish impersonal modal declarative construction trzeba x (“one needs to x”) in the accomplishment of everyday domestic activities, such as cutting bread, bringing recalcitrant children back to the dinner table, or making phone calls. Trzeba-x turns in first position are regularly chosen by speakers to point to a possible action as an evident necessity for the furthering of some broader ongoing activity. Such turns in first position provide an environment in which recipients can enact shared responsibility by actively involving themselves in the relevant action. Treating the necessity as not restricted to any particular subject, aligning responsive actions are oriented to when the relevant action will be done, not whether it will be done. We show that such sequences are absent from English interactions by analyzing (a) grammatically similar turn formats in English interaction (“we need to x,” “the x needs to y”), and (b) similar interactive environments in English interactions. We discuss the potential of this research to point to a new avenue for researchers interested in the relationship between language diversity and diversity in human action and cognition.
The authors compare the use of two formats for requesting an object in informal everyday interaction: imperatives, common in our Polish data, and second-person polar questions, common in our English data. Imperatives and polar questions are selected in the same interactional “home environments” across the languages, in which they enact two social actions: drawing on shared responsibility and enlisting assistance, respectively. Speakers across the languages differ in their choice of request format in “mixed” interactional environments that support either. The findings shed light on the orderly ways in which cultural diversity is grounded in invariants of action formation.
Sometimes in interaction, a speaker articulates an overt interpretation of prior talk. Such moments have been studied as involving the repair of a problem with the other’s talk or as formulating an understanding of the matter at hand. Stepping back from the established notions of formulations and repair, we examine the variety of actions speakers do with the practice of offering an interpretation, and the order within this domain. Results show half a dozen usage types of interpretations in mundane interaction. These form a largely continuous territory of action, with recognizably distinct usage types as well as cases falling between these (proto)typical uses. We locate order in the domain of interpretations using the method of semantic maps and show that, contrary to earlier assumptions in the literature, interpretations that formulate an understanding of the matter at hand are actually quite pervasive in ordinary talk. These findings contribute to research on action formation and advance our understanding of understanding in interaction. Data are video- and audio-recordings of mundane social interaction in the German language from a variety of settings.
The present paper explores how rules are enforced and talked about in everyday life. Drawing on a corpus of board game recordings across European languages, we identify a sequential and praxeological context for rule talk. After a game rule is breached, a participant enforces proper play and then formulates a rule with an impersonal deontic statement (e.g. “It’s not allowed to do this”). Impersonal deontic statements express what may or may not be done without tying the obligation to a particular individual. Our analysis shows that such statements are used as part of multi-unit and multi-modal turns where rule talk is accomplished through both grammatical and embodied means. Impersonal deontic statements serve multiple interactional goals: they account for having changed another’s behavior in the moment and at the same time impart knowledge for the future. We refer to this complex action as an “instruction.” The results of this study advance our understanding of rules and rule-following in everyday life, and of how resources of language and the body are combined to enforce and formulate rules.
We examine moments in social interaction in which a person formulates what another thinks or believes. Such formulations of belief constitute a practice with specifiable contexts and consequences. Belief formulations treat aspects of the other person's prior conduct as accountable on the basis that it provided a new angle on a topic, or otherwise made a surprising contribution within an ongoing course of actions. The practice of belief formulations subjectivizes the content that the other articulated and thereby topicalizes it, mobilizing commitment to that position, an account, or further elaboration. We describe how the practice can be put to work in different activity contexts: sometimes it is designed to undermine the other's position as a subjective 'mere belief', at other times it serves to mobilize further topic talk. Throughout, belief formulations show themselves to be a method by which we get to know ourselves and each other as mental agents.
Discourse metaphors
(2008)
The article introduces the notion of discourse metaphor, relatively stable metaphorical mappings that function as a key framing device within a particular discourse over a certain period of time. Discourse metaphors are illustrated by case studies from three lines of research: on the cultural imprint of metaphors, on the negotiation of metaphors and on cross-linguistic occurrence. The source concepts of discourse metaphors refer to phenomenologically salient real or fictitious objects that are part of interactional space (i.e., can be pointed at, like MACHINES or HOUSES) and/or occupy an important place in cultural imagination. Discourse metaphors change both over time and across the discourses where they are used. The implications of focussing on different types of source domains for our thinking about the embodiment and sociocultural situatedness of metaphor are discussed, with particular reference to recent developments in Conceptual Metaphor Theory. Research on discourse suggests that situatedness is a crucial factor in the functioning and dynamics of metaphor.
In the management of cooperation, the fit of a requested action with what the addressee is presently doing is a pervasively relevant consideration. We present evidence that imperative turns are adapted to, and reflexively create, contexts in which the other person is committed to the course of action advanced by the imperative. This evidence comes from systematic variation in the design of imperative turns, relative to the fittedness of the imperatively mandated action to the addressee’s ongoing trajectory of actions, what we call the “cline of commitment”. We present four points on this cline: responsive imperatives perform an operation on the deontic dimension of what the addressee has announced or already begun to do (in particular its permissibility); local-project imperatives formulate a new action advancing a course of action in which the addressee is already actively engaged; global-project imperatives target a next task for which the addressee is available on the grounds of their participation in the overall event, and in the absence of any competing work; and competitive imperatives draw on a presently otherwise engaged addressee on the grounds of their social commitment to the relevant course of actions. These four turn shapes are increasingly complex, reflecting the interactional work required to bridge the increasing distance between what the addressee is currently doing, and what the imperative mandates. We present data from German and Polish informal and institutional settings.
Linguistic relativists have traditionally asked 'how language influences thought', but conversation analysts and anthropological linguists have moved the focus from thought to social action. We argue that 'social action' should in this context not become simply a new dependent variable, because the formulation 'does language influence action' suggests that social action would already be meaningfully constituted prior to its local (verbal and multi-modal) accomplishment. We draw on work by the gestalt psychologist Karl Duncker to show that close attention to action-in-a-situation helps us ground empirical work on cross-cultural diversity in an appreciation of the invariances that make culture-specific elements of practice meaningful.
The article discusses the possibilities and challenges of combining conversation analysis and ethnography in the study of everyday family life. We argue that such a combination requires the decision whether to prioritise interaction data or ethnographic (in particular, interview) data in the analysis. We present a conversation analytic case study of how household work is commonly brought up in the interactions of one couple and bring this to bear on a re-analysis of a possible conflict situation originally described in the ethnographic analysis by Klein, Izquierdo, and Bradbury (2007), published in this journal. While the findings of the two analyses converge, they inform us about different dimensions of couple interaction. The ethnographic analysis is focused on participants’ experiences, and the conversation analysis is focused on participants’ practices. We conclude that the methodological decision to prioritise interaction or interview data has consequences for the kind of questions we can ask.
Psychological research has emphasized the importance of narrative for a person’s sense of self. Building a coherent narrative of past events is one objective of psychotherapy. However, in guided self-help therapy the patient has to develop this narrative autonomously. Identifying patients’ narrative skills in relation to psychological distress could provide useful information about their suitability for self-help. The aim of this study was to explore whether the syntactic integration of clauses into narrative in texts written by prospective psychotherapy patients was related to mild to moderate psychological distress. Cross-clausal syntax of texts by 97 people who had contacted a primary care mental health service was analyzed. Severity of symptoms associated with mental health difficulties was assessed by a standardized scale (Clinical Outcomes in Routine Evaluation outcome measure). Cross-clausal syntactic integration was negatively correlated with the severity of symptoms. A multiple regression analysis confirmed that the use of simple sentences, finite complement clauses, and coordinated clauses was associated with symptoms (R² = .26). The results suggest that the analysis of cross-clausal syntax can provide information on patients’ narrative skills in relation to distressing events and can therefore provide additional information to support treatment decisions.
This article makes an empirical and a methodological contribution to the comparative study of action. The empirical contribution is a comparative study of three distinct types of action regularly accomplished with the turn format du meinst x (“you mean/think x”) in German: candidate understandings, formulations of the other’s mind, and requests for a judgment. These empirical materials are the basis for a methodological exploration of different levels of researcher abstraction in the comparative study of action. Two levels are examined: the (coarser) level of conditionally relevant responses (what a response speaker must do to align with the action of the prior turn) and the (finer) level of “full alignment” (what a response speaker can do to align with the action of a prior turn). Both levels of abstraction provide empirically viable and analytically interesting descriptive concepts for the comparative study of action. Data are in German.
This chapter describes the resources that speakers of Polish use when recruiting assistance and collaboration from others in everyday social interaction. The chapter draws on data from video recordings of informal conversation in Polish, and reports language-specific findings generated within a large-scale comparative project involving eight languages from five continents (see other chapters of this volume). The resources for recruitment described in this chapter include linguistic structures from across the levels of grammatical organization, as well as gestural and other visible and contextual resources of relevance to the interpretation of action in interaction. The presentation of categories of recruitment, and elements of recruitment sequences, follows the coding scheme used in the comparative project (see Chapter 2 of the volume). This chapter extends our knowledge of the structure and usage of Polish with detailed attention to the properties of sequential structure in conversational interaction. The chapter is a contribution to an emerging field of pragmatic typology.
Cognitive linguists have long been interested in analogies people habitually use in thinking and speaking, but little is known about the nature of the relationship between verbal behaviour and such analogical schemas. This article proposes that discourse metaphors are an important link between the two. Discourse metaphors are verbal expressions containing a construction that evokes an analogy negotiated in the discourse community. Results of an analysis of metaphors in a corpus of newspaper texts support the prediction that regular analogies are form-specific, i.e., bound to particular lexical items. Implications of these results for assumptions about the generality of habitual analogies are discussed.
This article explores the role that metaphors play in the ideological interpretation of events. Research in cognitive linguistics has brought rich evidence of the enormous influence that body experience has on (metaphorical) conceptualization. However, the role of the cultural net in which an individual is embedded has mostly been neglected. As a step towards the integration of cultural experience into the experientialist framework in cognitive metaphor research I propose to differentiate two ideal types of motivation for metaphor: correlation and intertextuality. Evidence for the important role that intertextual metaphors play in ideological discourse comes from an analysis of Polish newspaper discourse on the tenth anniversary of the end of communism.
This article discusses possibilities for an elaboration of cognitive linguistic metaphor theory that takes into account the sociocultural situatedness of language and cognition. The approach of the Ethnolinguistic School of Lublin, linking anthropological with cognitive perspectives on language, is introduced. The objectives of the article are i) to introduce this line of research, well known in linguistics in Eastern Europe, but little known in the “Western”, English-speaking scientific discourse; ii) to illustrate the usefulness of particular ideas within this approach for metaphor analysis in a corpus study of the metaphorical understanding of system transformation in German public discourse in the late 1980s and early 1990s; and iii) to discuss diverging elaborations of the notion of experience in cognitive linguistics, contrasting the Ethnolinguistic School of Lublin with Conceptual Metaphor Theory.
When formulating a request for an object, speakers can choose among different grammatical resources that would all serve the overall purpose. This paper examines the social contexts indexed and created by the choice of the turn format can I have x to request a shared good (the pepper grinder, a tissue from a box on the table, etc.) in British English informal interaction. The analysis is based on a video corpus of approximately 25 h of everyday interaction among family and friends. In its home environment, a request in the format can I have x treats the other as being in control over the relevant material object, a control that is the contingent outcome of ongoing courses of action. This contingent control over a shared good produces an obligation to make it available. This analysis is supported by an examination of similarly formatted request turns in other languages, of can I have x in another interactional environment (after a relevant offer has been made) in British English, and of deviant cases. The results highlight the intimate connection of request format selection to the present engagements of (prospective) request recipients.
This book analyses requests for action on the basis of natural video-recorded data of everyday interaction in British English and Polish families. Jörg Zinken describes in his analyses the features of interactional context that people across cultures might be sensitive to in designing a request, as well as aspects of cultural diversity.
This study analyses the use of the Polish weź-V2 (take-V2) double imperative to request here-and-now actions. The analysis is based on a collection of approximately 40 take-V2 double imperatives, which was built from a corpus of 10 hours of video recordings of everyday interactions (preparing and having meals, playing with children, etc.) taking place in the homes of Polish families. A sequential analysis of these data shows that the take-V2 construction is commonly selected in situations where the request recipient could be expected to already be attending to the relevant business (e.g., because they committed to this earlier in the interaction), but isn’t. By selecting the take-V2 format, the request speaker reanimates the recipient’s responsibility for the matter at hand.
This paper introduces a method for computer-based analyses of metaphor in discourse, combining quantitative and qualitative elements. This method is illustrated with data from research on German newspaper discourse concerning the ongoing system transformations of the late 1980s and early 1990s. Methodological aspects of the research procedure are discussed and it is argued that quantitative elements can enhance comparability in cross-cultural and cross-lingual research. Some basic findings of the research are presented. The peculiarities of the German Wende discourse - especially the salience of a passive perspective on the ongoing political and social changes - are outlined.
Conduit metaphor
(2011)
Temporal frames of reference
(2010)
‘Linguistic relativity’ has become a major keyword in debates on the psychological significance of language diversity. In this context, the term ‘relativity’ was originally taken on loan from Einstein’s then-recent theories by Edward Sapir (1924) and Benjamin L. Whorf (1940). The present paper assesses how far the idea of linguistic relativity does analogically build on relevant insights in modern physics, and fails to find any substantial analogies. The term was used rhetorically by Sapir and Whorf, and has since been incorporated into a cognitivist research programme that seeks to answer whether ‘language influences thought’. Contemporary research on ‘linguistic relativity’ has developed into a distinct way of studying language diversity, which shares a lot with the universalistic cognitivist framework it opposes, but little with relational approaches in science.
Ethnolinguistic research has gained considerable popularity over the last two decades. The most important metaphorical formula defining the main object of this research is the “linguistic picture of the world”. Given that large-scale comparative projects are currently being set up, it may be worth considering what such a conception of ethnolinguistics leaves out. The visual metaphor of pictures implies that speakers are able to step outside the world and look at it (and name it) from the outside. The article discusses two consequences of this metaphor that may cause problems. First, isolating language from the world of human action, of which language is after all a part, leads to the adoption of a cognitivist model of meaning as a separate stream of communication. Such a model does not fit the everyday experience of the transparency of language. Second, isolating language from life favours the use of “timeless” methods and the study of words abstracted from the situation in which they were used (if not taken out of context). By adopting such metaphors and methods, we may lose sight of a considerable part of what matters for everyday language, the object of study of ethnoscience.
‘Can’ and ‘must’-type modal verbs in the direct sanctioning of misconduct across European languages
(2023)
Deontic meanings of obligation and permissibility have mostly been studied in relation to modal verbs, even though researchers are aware that such meanings can be conveyed in other ways (consider, for example, the contributions to Nuyts/van der Auwera (eds.) 2016). This presentation reports on an ongoing project that examines deontic meaning but takes as its starting point not a type of linguistic structure but a particular kind of social moment that presumably attracts deontic talk: the management of potentially ‘unacceptable’ or untoward actions (taking the last bread roll at breakfast, making a disallowed move during a board game, etc.). Data come from a multi-language parallel video corpus of everyday social interaction in English, German, Italian, and Polish. Here, we focus on moments in which one person sanctions another’s behavior as unacceptable. Using interactional-linguistic methods (Couper-Kuhlen/Selting 2018), we examine similarities and differences across these four languages in the use of modal verbs as part of such sanctioning attempts. First results suggest that modal verbs are not as common in the sanctioning of misconduct as one might expect. Across the four languages, only between 10% and 20% of relevant sequences involve a modal verb. Most of the time, in this context, speakers achieve deontic meaning in other ways (e.g., infinitives such as German nicht so schmatzen, ‘no smacking’). This raises the question of what exactly modal verbs, on those relatively rare occasions when they are used, contribute to the accomplishment of deontic meaning. The reported study pursues this question in two ways: 1) by considering similarities across languages in the ways that modal verbs interact with other (verbal) means in the sanctioning of misconduct; 2) by considering differences across languages in the use of modal verbs. Here, we find that the relevant modal verbs are used similarly in some activity contexts (enforcing rules during board games), but less so in other activity contexts (mundane situations with no codified rules). In sum, the presented study adds to cross-linguistically grounded knowledge about deontic meaning and its relationships to linguistic structures.
This article presents preliminary results indicating that speakers use a different pitch range when they speak a foreign language than when they speak their native language. To this end, a learner corpus with French and German speakers was analyzed. Results suggest that speakers indeed produce a smaller pitch range in the respective L2. This is true for both groups of native speakers. A possible explanation for this finding is that speakers are less confident in their productions; they therefore concentrate more on segments and words and do not realize a more native-like pitch range. For language teaching, the results suggest that learners should be trained extensively on the more pronounced use of pitch in the foreign language.
This study examines the pitch profiles of French learners of German and German learners of French, both in their native language (L1), and in their respective foreign language (L2). Results of the analysis of 84 speakers suggest that for short read sentences, French and German speakers do not show pitch range differences in their native production. Furthermore, analyses of mean f0 and pitch range indicate that range is not necessarily reduced in L2 productions. These results are different from results reported in prior research. Possible reasons for these differences are discussed.
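For readers unfamiliar with the measure, pitch range is often expressed in semitones so that speakers with different baseline f0 become comparable. The following is a minimal sketch under that assumption; the function name, the use of the minimum and maximum f0, and the convention that 0 Hz marks an unvoiced frame are illustrative choices and are not taken from the studies above.

```python
from math import log2


def pitch_range_semitones(f0_values_hz):
    """Return the span between lowest and highest f0 in semitones.

    Assumption: values <= 0 mark unvoiced frames and are ignored.
    """
    voiced = [f for f in f0_values_hz if f > 0]
    # 12 semitones correspond to a doubling of frequency (one octave).
    return 12 * log2(max(voiced) / min(voiced))


# Hypothetical f0 track in Hz (one octave between 100 Hz and 200 Hz).
print(pitch_range_semitones([100.0, 0.0, 150.0, 200.0]))
```

In practice, percentile-based limits are frequently preferred over the raw minimum and maximum because they are less sensitive to pitch-tracking errors.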
A constructicon, i.e., a structured inventory of constructions, essentially aims at documenting functions of lexical and grammatical constructions. Among other parameters, so-called constructional collo-profiles, as introduced by Herbst (2018, 2020), are conclusive for determining constructional meanings. They provide information on how relevant individual words are for construction slots, they hint at usage preferences of constructions and serve as a helpful indicator for semantic peculiarities of constructions. However, even though collo-profiles constitute an indispensable component of constructicon entries, they pose major challenges for constructicographers: for a constructicographic enterprise, it is not feasible to conduct collostructional analyses for hundreds or even thousands of constructions. In this article, we introduce a procedure based on the large language model BERT that makes it possible to predict collo-profiles without having to extensively annotate instances of constructions in a given corpus. Specifically, by discussing the constructions X macht Y ADJP (‘X makes Y ADJ’, e.g., he drives him crazy) and N1 PREP N1 (e.g., bumper to bumper, constructions over constructions), we show how the developed automated system generates collo-profiles based on a limited number of annotated instances. Finally, we place collo-profiles alongside other dimensions of constructional meanings included in the German Constructicon.
Much language-related research in cognitive robotics appeals to usage-based models of language as proposed in cognitive linguistics and developmental psychology [1, 2] that emphasise the significance of learning, embodiment and general cognitive development for human language acquisition. Over and above these issues, however, what takes centre stage in these theories are social-cognitive skills of “intention-reading” that are seen as “primary in the language acquisition process” [1] – and also as difficult to incorporate into computational models of language acquisition. The present paper addresses these concerns: we describe work in progress on a series of experiments that take steps towards closing the gap between ‘solipsistic’ symbol grounding in individual robotic agents and socially framed embodied language acquisition in learners that attend to common ground [3] with changing interlocutors.
Speakers’ linguistic experience is for the most part experience with language as used in conversational interaction. Though highly relevant for usage-based linguistics, the study of such data is as yet often left to other frameworks such as conversation analysis and interactional linguistics (Couper-Kuhlen and Selting 2001). On the basis of a case study of salient usage patterns of the two German motion verbs kommen and gehen in spontaneous conversation, the present paper argues for a methodological integration of quantitative corpus-linguistic methods with qualitative conversation analytic approaches to further the usage-based study of conversational interaction.
Novel formats of construction-based description hold great potential for phenomena that fall through the cracks in traditional kinds of linguistic reference works. Using the example of German verb argument structure constructions with a prepositional object, we demonstrate that a construction-based description of such phenomena is superior to existing lexicographic and grammaticographic treatments, but that it also poses a number of new problems. The most fundamental of these relates to the fact that construction-based analyses can be proposed on different levels of abstraction. We illustrate pertinent problems relating to the precise identification of constructional form and meaning and suggest a multi-layered descriptive format for web-based electronic reference constructica that can accommodate these challenges. Semantically, the proposed solution integrates both lumping and splitting perspectives on constructional grain size and permits users to flexibly zoom in and out on individual elements in the resource. Formally, it can capture variation in the number and marking of realised arguments as found in e.g. passives and transitivity alternations. Aspects of the theoretical controversy between Construction Grammar and Valency Theory are addressed where relevant, but our focus is on questions of description and the practical implementation of construction-based analyses in a suitable type of linguistic reference work.
Co-development of action, conceptualization and social interaction mutually scaffold and support each other within a virtuous feedback cycle in the development of human language in children. Within this framework, the purpose of this article is to bring together diverse but complementary accounts of research methods that jointly contribute to our understanding of cognitive development and in particular, language acquisition in robots. Thus, we include research pertaining to developmental robotics, cognitive science, psychology, linguistics and neuroscience, as well as practical computer science and engineering. The different studies are not at this stage all connected into a cohesive whole; rather, they are presented to illuminate the need for multiple different approaches that complement each other in the pursuit of understanding cognitive development in robots. Extensive experiments involving the humanoid robot iCub are reported, while human learning relevant to developmental robotics has also contributed useful results.
Disparate approaches are brought together via common underlying design principles. Without claiming to model human language acquisition directly, we are nonetheless inspired by analogous development in humans and consequently, our investigations include the parallel co-development of action, conceptualization and social interaction. Though these different approaches need to ultimately be integrated into a coherent, unified body of knowledge, progress is currently also being made by pursuing individual methods.
Within cognitive linguistics, there is an increasing awareness that the study of linguistic phenomena needs to be grounded in usage. Ideally, research in cognitive linguistics should be based on authentic language use, its results should be replicable, and its claims falsifiable. Consequently, more and more studies now turn to corpora as a source of data. While corpus-based methodologies have increased in sophistication, the use of corpus data is also associated with a number of unresolved problems. The study of cognition through off-line linguistic data is, arguably, indirect, even if such data fulfils desirable qualities such as being natural, representative and plentiful. Several topics in this context stand out as particularly pressing issues. This discussion note addresses (1) converging evidence from corpora and experimentation, (2) whether corpora mirror psychological reality, (3) the theoretical value of corpus linguistic studies of ‘alternations’, (4) the relation of corpus linguistics and grammaticality judgments, and, lastly, (5) the nature of explanations in cognitive corpus linguistics. We do not claim to resolve these issues nor to cover all possible angles; instead, we strongly encourage reactions and further discussion.
How (and when) do speakers generalise from memorised exemplars of a construction to a productive schema? The present paper presents a novel take on this issue by offering a corpus-based approach to semantic extension processes. Focusing on clusters of German ADJ N expressions involving the heavily polysemous adjective tief ‘deep’, it is shown that type frequency (a commonly used measure of productivity) needs to be relativised to distinct semantic classes within the overall usage spectrum of a given construction in order to predict the occurrence of novel types within a particular region of this spectrum. Some methodological and theoretical implications for usage-based linguistic model building are considered.
In spite of the obvious importance that is accorded to the notion grammatical construction in any approach that sees itself as a construction grammar (CxG), there is as yet no generally accepted definition of the term across different variants of the framework. In particular, there are different assumptions about which additional requirements a given structure has to meet in order to be recognized as a construction besides being a ‘form-meaning pair’. Since the choice of a particular definition will determine the range of both relevant phenomena and concrete observations to be considered in empirical research within the framework, the issue is not just a mere terminological quibble but has important methodological repercussions especially for quantitative research in areas such as corpus linguistics. The present study illustrates some problems in identifying and delimiting such patterns in naturally occurring text and presents arguments for a usage-based interpretation of the term grammatical construction.
Construction-based language models assume that grammar is meaningful and learnable from experience. Focusing on five of the most elementary argument structure constructions of English, a large-scale corpus study of child-directed speech (CDS) investigates exactly which meanings/functions are associated with these patterns in CDS, and whether they are indeed specially indicated to children by their caretakers (as suggested by previous research, cf. Goldberg, Casenhiser and Sethuraman 2004). Collostructional analysis (Stefanowitsch and Gries 2003) is employed to uncover significantly attracted verb-construction combinations, and attracted pairs are classified semantically in order to systematise the attested usage patterns of the target constructions. The results indicate that the structure of the input may aid learners in making the right generalisations about constructional usage patterns, but such scaffolding is not strictly necessary for construction learning: not all argument structure constructions are coherently semanticised to the same extent (in the sense that they designate a single schematic event type of the kind envisioned in Goldberg’s [1995] ‘scene encoding hypothesis’), and they also differ in the extent to which individual semantic subtypes predominate in learners’ input.
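To make the notion of an "attracted" verb-construction pair concrete: collostructional analysis typically cross-tabulates a verb against a construction in a 2x2 table and tests the association, commonly with a Fisher exact test. The sketch below assumes that setup; the function name, the one-sided formulation and the counts are illustrative and do not reproduce the study's data or exact procedure.

```python
from math import comb


def fisher_attraction_p(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher exact p-value for a 2x2 contingency table.

    a: target verb inside the construction
    b: target verb outside the construction
    c: other verbs inside the construction
    d: other verbs outside the construction
    Returns P(X >= a) under the hypergeometric null of no association.
    """
    n = a + b + c + d
    verb_total = a + b
    cxn_total = a + c
    upper = min(verb_total, cxn_total)
    favourable = sum(
        comb(verb_total, k) * comb(n - verb_total, cxn_total - k)
        for k in range(a, upper + 1)
    )
    return favourable / comb(n, cxn_total)


# Invented counts: a small p-value would indicate attraction.
print(fisher_attraction_p(3, 1, 1, 3))
```

A small p-value for a pair is then read as evidence that the verb occurs in the construction more often than expected by chance; ranking pairs by this value yields the list of attracted collexemes.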
Localism
(2017)
Research on syntactic ambiguity resolution in language comprehension has shown that subjects' processing decisions are influenced by a variety of heterogeneous factors such as syntactic complexity, semantic fit and the discourse frequency of the competing structures. The present paper investigates a further potentially relevant factor in such processes: effects of syntagmatic lexical chunking (or matching to a complex memorized prefab) whose occurrence would be predicted from usage-based assumptions about linguistic categorisation. Focusing on the widely studied so-called DO/SC-ambiguity in which a post-verbal NP is syntactically ambiguous between a direct object and the subject of an embedded clause, potentially biasing collocational chunks of the relevant type are identified in a number of corpus-linguistic pretests and then investigated in a self-paced reading experiment. The results show a significant increase in processing difficulty from a collocationally neutral over a lexically biasing to a strongly biasing condition. This suggests that syntagmatically complex and partially schematic templates of the kind envisioned in usage-based Construction Grammar may impinge on speakers' online processing decisions during sentence comprehension.
Introduction
(2008)
Localist hypothesis
(2017)
Smooth turn-taking in conversation depends in part on speakers being able to communicate their intention to hold or cede the floor. Both prosodic and gestural cues have been shown to be used in this context. We investigate the interplay of pitch movements and hand gestures at locations at which speaker change becomes relevant, comparing their use in German and Swedish. We find that there are some shared functions of prosody and gesture with regard to turn-taking in the two languages, but that these shared functions appear to be mediated by the different phonological demands on pitch in the two languages.
As a means of communication, gestures can serve conversational participants at several levels. As co-speech gestures, they can add information to the verbally expressed content and they can serve to manage turn-taking. In order to look closer at the interplay between these resources in face-to-face conversation, we annotated hand gestures, syntactic completion points and the related turn-organisation, and measured the timing of gesture strokes and their lexical/phrasal referent. In a case study on German, we observe the trend that speakers vary less in gesture-lexis on- and offsets when keeping the turn after syntactic completions than at speaker changes, backchannels or other locations of a conversation. This indicates that timing properties of non-verbal cues interact with verbal cues to manage turn-taking.
High word frequency and neighborhood density contribute to the accuracy and speed of word production in English adults (e.g., Vitevitch & Sommers 2003), and characterize early words in child English (e.g., Storkel 2004). The present study investigated a speech corpus of child German (ages 2;00-3;00) to further the understanding of the influence of frequency and density on production. Results for four children suggest that, contrary to English, words produced early are not from denser neighborhoods in an adult lexicon than later words. As in English, frequent words are produced before less frequent words. Implications for theory and methodology are discussed.
Linguistic corpora have been annotated by means of SGML-based markup languages for almost 20 years. We can, very roughly, differentiate between three distinct evolutionary stages of markup technologies. (1) Originally, single SGML tree-based document instances were deemed sufficient for the representation of linguistic structures. (2) Linguists began to realize that alternatives and extensions to the traditional model are needed. Formalisms such as, for example, NITE were proposed: the NITE Object Model (NOM) consists of multi-rooted trees. (3) We are now on the threshold of the third evolutionary stage: even NITE's very flexible approach is not suited for all linguistic purposes. As some structures, such as these, cannot be modeled by multi-rooted trees, an even more flexible approach is needed in order to provide a generic annotation format that is able to represent genuinely arbitrary linguistic data structures.
This paper presents EXMARaLDA, a system for the computer-assisted creation and analysis of spoken language corpora. The first part contains some general observations about technological and methodological requirements for doing corpus-based pragmatics. The second part explains the system's architecture and gives an overview of its most important software components: a transcription editor, a corpus management tool and a corpus query tool. The last part presents some corpora which have been or are currently being compiled with the help of EXMARaLDA.
Complement clauses in German can have a lexical complementizer when they are finite, but they must not have one when they are non-finite. I will argue that this distribution follows from the referential properties of the sentential complement. According to Grimshaw, only referential categories extend to functional projections. The status marker zu in German infinitival complements can be shown to block reference. Thus, non-finite complement clauses with zu do not project a left periphery and cannot host a complementizer.
Accentuation, Uncertainty and Exhaustivity - Towards a Model of Pragmatic Focus Interpretation
(2010)
This paper presents a model of pragmatic focus interpretation that is assumed to be part of a complete language comprehension model and that is inspired by Levelt's language processing model. The model is derived from our empirical data on the role of accentuation, prosodic indicators of uncertainty and context for pragmatic focus interpretation. In its present state, the model is restricted to these data, but nevertheless generates predictions.
Many studies on dictionary use presuppose that users do indeed consult lexicographic resources. However, little is known about what users actually do when they try to solve language problems on their own. We present an observation study where learners of German were allowed to browse the web freely while correcting erroneous German sentences. In this paper, we are focusing on the multi-methodological approach of the study, especially the interplay between quantitative and qualitative approaches. In one example study, we will show how the analysis of verbal protocols, the correction task and the screen recordings can reveal the effects of intuition, language (learning) awareness, and determination on the accuracy of the corrections. In another example study, we will show how preconceived hypotheses about the problem at hand might hinder participants from arriving at the correct solution.
Wiktionary is increasingly gaining influence in a wide variety of linguistic fields such as NLP and lexicography, and has great potential to become a serious competitor for publisher-based and academic dictionaries. However, little is known about the "crowd" that is responsible for the content of Wiktionary. In this article, we want to shed some light on selected questions concerning large-scale cooperative work in online dictionaries. To this end, we use quantitative analyses of the complete edit history files of the English and German Wiktionary language editions. Concerning the distribution of revisions over users, we show that — compared to the overall user base — only very few authors are responsible for the vast majority of revisions in the two Wiktionary editions. In the next step, we compare this distribution to the distribution of revisions over all the articles. The articles are subsequently analysed in terms of rigour and diversity, typical revision patterns through time, and novelty (the time since the last revision). We close with an examination of the relationship between corpus frequencies of headwords in articles, the number of article visits, and the number of revisions made to articles.
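The concentration of revisions on a few authors can be illustrated with a short computation. This is a minimal sketch only: it assumes the edit history has already been reduced to one author name per revision, and the function name and toy data are hypothetical rather than taken from the study.

```python
from collections import Counter


def top_share(revision_authors, top_n):
    """Return the fraction of all revisions made by the top_n most active authors."""
    # Revision counts per author, most active first.
    counts = sorted(Counter(revision_authors).values(), reverse=True)
    return sum(counts[:top_n]) / sum(counts)


# Toy edit history: one entry per revision, naming its author.
history = ["anna"] * 8 + ["ben", "cleo"]
print(top_share(history, top_n=1))
```

Plotting this share for increasing values of `top_n` gives a cumulative curve whose steepness shows how unevenly the editing work is distributed.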
Dictionaries have been part and parcel of literate societies for many centuries. They assist in communication, particularly across different languages, to aid in understanding, creating, and translating texts. Communication problems arise whenever a native speaker of one language comes into contact with a speaker of another language. At the same time, English has established itself as a lingua franca of international communication. This marked tendency gives lexicography of English a particular significance, as English dictionaries are used intensively and extensively by huge numbers of people worldwide.
We present ESDexplorer (https://owid.shinyapps.io/ESDexplorer), a browser application which allows the user to explore the data from a large European survey on dictionary use and culture. We built ESDexplorer with several target groups in mind: our cooperation partners, other researchers, and a more general public interested in the results. Also, we present in detail the architecture and technological realisation of the application and discuss some legal aspects of data protection that motivated some architectural choices.
The coronavirus pandemic may be the largest crisis the world has had to face since World War II. It does not come as a surprise that it is also having an impact on language as our primary communication tool. In this short paper, we present three inter-connected resources that are designed to capture and illustrate these effects on a subset of the German language: An RSS corpus of German-language newsfeeds (with freely available untruncated frequency lists), a continuously updated HTML page tracking the diversity of the vocabulary in the RSS corpus and a Shiny web application that enables other researchers and the broader public to explore the corpus in terms of basic frequencies.
We start by trying to answer a question that has already been asked by de Schryver et al. (2006): Do dictionary users (frequently) look up words that are frequent in a corpus? Contrary to their results, our results that are based on the analysis of log files from two different online dictionaries indicate that users indeed look up frequent words frequently. When combining frequency information from the Mannheim German Reference Corpus and information about the number of visits in the Digital Dictionary of the German Language as well as the German language edition of Wiktionary, a clear connection between corpus and look-up frequencies can be observed. In a follow-up study, we show that another important factor for the look-up frequency of a word is its temporal social relevance. To make this effect visible, we propose a de-trending method where we control both frequency effects and overall look-up trends.
We introduce DeReKoGram, a novel frequency dataset containing lemma and part-of-speech (POS) information for 1-, 2-, and 3-grams from the German Reference Corpus. The dataset contains information based on a corpus of 43.2 billion tokens and is divided into 16 parts based on 16 corpus folds. We describe how the dataset was created and structured. By evaluating the distribution over the 16 folds, we show that it is possible to work with a subset of the folds in many use cases (e.g., to save computational resources). In a case study, we investigate the growth of vocabulary (as well as the number of hapax legomena) as an increasing number of folds are included in the analysis. We cross-combine this with the various cleaning stages of the dataset. We also give some guidance in the form of Python, R, and Stata markdown scripts on how to work with the resource.
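The vocabulary-growth case study can be pictured as follows. This is a minimal sketch that assumes each fold has already been loaded as a mapping from word form to frequency; the function name and toy folds are hypothetical, and nothing here reflects the actual file format of DeReKoGram or the scripts that accompany it.

```python
from collections import Counter


def vocabulary_growth(folds):
    """Return (folds_included, vocabulary_size, hapax_count) for cumulative folds."""
    total = Counter()
    growth = []
    for i, fold in enumerate(folds, start=1):
        total.update(fold)  # add this fold's frequencies to the running totals
        hapax = sum(1 for freq in total.values() if freq == 1)
        growth.append((i, len(total), hapax))
    return growth


# Two toy folds with unigram frequencies.
toy_folds = [
    {"der": 3, "Haus": 1},
    {"der": 2, "Haus": 1, "Baum": 1},
]
print(vocabulary_growth(toy_folds))
```

Note that a word counts as a hapax legomenon only with respect to the folds included so far, so the hapax count can fall as well as rise when further folds are added.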
Neologisms, i.e., new words or meanings, are finding their way into everyday language use all the time. In the process, already existing elements of a language are recombined or linguistic material from other languages is borrowed. But are borrowed neologisms accepted similarly well by the speech community as neologisms that were formed from “native” material? We investigate this question based on neologisms in German. Building on the corresponding results of a corpus study, we test the hypothesis of whether “native” neologisms are more readily accepted than those borrowed from English. To do so, we use a psycholinguistic experimental paradigm that allows us to estimate the degree of uncertainty of the participants based on the mouse trajectories of their responses. Unexpectedly, our results suggest that the neologisms borrowed from English are accepted more frequently, more quickly, and more easily than the “native” ones. These effects, however, are restricted to people born after 1980, the so-called millennials. We propose potential explanations for this mismatch between corpus results and experimental data and argue, among other things, for a reinterpretation of previous corpus studies.
We present an empirical study addressing the question whether, and to which extent, lexicographic writing aids improve text revision results. German university students were asked to optimise two German texts using (1) no aids at all, (2) highlighted problems, or (3) highlighted problems accompanied by lexicographic resources that could be used to solve the specific problems. We found that participants from the third group corrected the largest number of problems and introduced the fewest semantic distortions during revision. Also, they reached the highest overall score and were most efficient (as measured in points per time). The second group with highlighted problems lies between the two other groups in almost every measure we analysed. We discuss these findings in the scope of intelligent writing environments, the effectiveness of writing aids in practical usage situations and teaching dictionary skills.
Reading corpora are text collections that are enriched with processing data. From a corpus linguist’s perspective, they can be seen as an extension of classical linguistic corpora with human language processing behavior. From a psycholinguist’s perspective, reading corpora make it possible to test psycholinguistic hypotheses on subsets of language and language processing as it is ‘in the wild’, in contrast to the strictly controlled language material in isolated sentences used in most psycholinguistic experiments. In this paper, we investigate a relevance-based account of language processing which states that linguistic structures that are embedded deeper syntactically are read faster because readers allocate less attention to these structures.
This paper presents a study using eye-tracking-while-reading data from participants reading German jurisdictional texts, with a particular focus on nominalisations. It can be shown that nominalisations are read significantly longer than other nouns and that this effect is quite strong. Furthermore, the results suggest that nouns are read faster in reformulated texts. In the reformulations, nominalisations were transformed into verbal structures. Reformulations did not lead to increased processing times of verbal constructions but reformulated texts were read faster overall. Where appropriate, results are compared to a previous study by Hansen et al. (2006) that used the same texts but a different methodology and statistical analysis.
This replication study aims to investigate a potential bias toward addition in the German language, building upon previous findings of Winter and colleagues who identified a similar bias in English. Our results confirm a bias in word frequencies and binomial expressions, aligning with these previous findings. However, the analysis of distributional semantics based on word vectors did not yield consistent results for German. Furthermore, our study emphasizes the crucial role of selecting appropriate translational equivalents, highlighting the significance of considering language-specific factors when testing for such biases for languages other than English.
Integrated Linguistic Annotation Models and Their Application in the Domain of Antecedent Detection
(2011)
Seamless integration of various, often heterogeneous linguistic resources in terms of their output formats and a combined analysis of the respective annotation layers are crucial tasks for linguistic research. After a decade of concentration on the development of formats to structure single annotations for specific linguistic issues, a variety of specifications for storing multiple annotations over the same primary data has been developed in recent years. The paper focuses on the integration of the knowledge resource logical document structure information into a text document to enhance the task of automatic anaphora resolution both for the task of candidate detection and antecedent selection. The paper investigates data structures necessary for knowledge integration and retrieval.
On the Lossless Transformation of Single-File, Multi-Layer Annotations into Multi-Rooted Trees
(2007)
The Generalised Architecture for Sustainability (GENAU) provides a framework for the transformation of single-file, multi-layer annotations into multi-rooted trees. By employing constraints expressed in XCONCUR-CL, this procedure can be performed losslessly, i.e., without losing information, especially with regard to the nesting of elements that belong to multiple annotation layers. This article describes how different types of linguistic corpora can be transformed using specialised tools, and how constraint rules can be applied to the resulting multi-rooted trees to add a further level of validation.
This article shows that the TEI tag set for feature structures can be adopted to represent a heterogeneous set of linguistic corpora. The majority of corpora are annotated using markup languages that are based on the Annotation Graph framework, the upcoming Linguistic Annotation Format ISO standard, or according to tag sets defined by or based upon the TEI guidelines. A unified representation comprises the separation of conceptually different annotation layers contained in the original corpus data (e.g. syntax, phonology, and semantics) into multiple XML files. These annotation layers are linked to each other implicitly by the identical textual content of all files. A suitable data structure for the representation of these annotations is a multi-rooted tree that again can be represented by the TEI and ISO tag set for feature structures. The mapping process and representational issues are discussed as well as the advantages and drawbacks associated with the use of the TEI tag set for feature structures as a storage and exchange format for linguistically annotated data.
We describe a general two-stage procedure for re-using a custom corpus for spoken language system development involving a transformation from character-based markup to XML, and DSSSL stylesheet-driven XML markup enhancement with multiple lexical tag trees. The procedure was used to generate a fully tagged corpus; alternatively with greater economy of computing resources, it can be employed as a parametrised ‘tagging on demand’ filter. The implementation will shortly be released as a public resource together with the corpus (German spoken dialogue, about 500k word form tokens) and lexicon (about 75k word form types).
The Leibniz-Institute for the German Language (IDS) was established in Mannheim in 1964. Since then, it has been at the forefront of innovation in German linguistics as a hub for digital language data. This chapter presents various lessons learnt from over five decades of work by the IDS, ranging from the importance of sustainability, through its strong technical base and FAIR principles, to the IDS’ role in national and international cooperation projects and its expertise on legal and ethical issues related to language resources and language technology.
This article introduces the topic of “Multilingual language resources and interoperability”. We start with a taxonomy and parameters for classifying language resources. We then provide examples of interoperability issues and of resource architectures that address them. Finally, we discuss aspects of linguistic formalisms and interoperability.
An approach to the unification of XML (Extensible Markup Language) documents with identical textual content and concurrent markup in the framework of XML-based multi-layer annotation is introduced. A Prolog program explores the possible relationships between element instances on two annotation layers that share PCDATA and computes a target node hierarchy for a well-formed, merged XML document. Special attention is paid to identity conflicts between element instances, for which a default solution is provided that takes into account metarelations holding between element types on the different annotation layers. In addition, a user can specify rules that prescribe how identity conflicts should be solved for certain element types.
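The kind of relationship that can hold between two element instances sharing the same text can be sketched as follows. This is an illustrative Python sketch, not the Prolog program described in the abstract; the representation of elements as half-open character spans and the relation names returned by the hypothetical function `classify_relation` are assumptions.

```python
def classify_relation(a, b):
    """Relate two half-open character spans (start, end) over the same text.

    The relation labels are illustrative and not taken from the paper.
    """
    (a_start, a_end), (b_start, b_end) = a, b
    if a == b:
        # Same extent on both layers: an identity conflict when merging,
        # because one element must be chosen as the parent of the other.
        return "identity"
    if a_end <= b_start or b_end <= a_start:
        return "disjoint"
    if a_start <= b_start and b_end <= a_end:
        return "a_includes_b"
    if b_start <= a_start and a_end <= b_end:
        return "b_includes_a"
    # Partial overlap: the two elements cannot nest in one well-formed tree.
    return "overlap"
```

For example, `classify_relation((0, 9), (2, 5))` reports that the first element includes the second, whereas `classify_relation((0, 5), (3, 9))` reports an overlap that a merged, well-formed document would have to resolve.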
The actual or anticipated impact of research projects can be documented in scientific publications and project reports. While project reports are available at varying levels of accessibility, they may rarely be used or shared outside of academia. Moreover, a connection between the outcomes of an actual research project and their potential secondary use might not be made explicit in a project report. This paper outlines two methods for classifying and extracting the impact of publicly funded research projects. The first method is concerned with identifying impact categories and having subject matter experts assign these categories to research projects and, by extension, to their reports, without considering the content of the reports. This process resulted in a classification schema that we describe in this paper. With the second method, which is still work in progress, impact categories are extracted from the actual text data.
In this paper, we present the Multiple Annotation approach, which solves two problems: the problem of annotating overlapping structures, and the problem that occurs when documents should be annotated according to different, possibly heterogeneous tag sets. This approach has many advantages: it is based on XML, the modeling of alternative annotations is possible, each level can be viewed separately, and new levels can be added at any time. The files can be regarded as an interrelated unit, with the text serving as the implicit link. Two representations of the information contained in the multiple files (one in Prolog and one in XML) are described. These representations serve as a base for several applications.
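The role of the text as the implicit link between separately annotated files can be illustrated with a small Python sketch. It is not one of the two representations described in the paper; the function name `element_spans`, the two sample layers and the use of character offsets are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


def element_spans(xml_string):
    """Return the plain text of a layer and (tag, start, end) for each element."""
    root = ET.fromstring(xml_string)
    spans = []

    def walk(elem, offset):
        start = offset
        offset += len(elem.text or "")
        for child in elem:
            offset = walk(child, offset)
            offset += len(child.tail or "")
        spans.append((elem.tag, start, offset))
        return offset

    walk(root, 0)
    # Sort outer elements before the inner elements that start at the same offset.
    return "".join(root.itertext()), sorted(spans, key=lambda s: (s[1], -s[2]))


# Two hypothetical annotation layers over the same primary text.
STRUCTURE_LAYER = "<text><item>Buy milk. Call</item><item> home.</item></text>"
SYNTAX_LAYER = "<text><s>Buy milk.</s> <s>Call home.</s></text>"

structure_text, structure_spans = element_spans(STRUCTURE_LAYER)
syntax_text, syntax_spans = element_spans(SYNTAX_LAYER)

# The layers can only be treated as one interrelated unit if their text matches.
layers_aligned = structure_text == syntax_text
```

Because both layers yield the same text, the offsets in `structure_spans` and `syntax_spans` refer to the same positions, so elements from either file can be compared even though the first `item` and the second `s` could not be nested in a single XML document.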
Overlap in markup occurs where some markup structures do not nest, such as where the structural division of the text into lists, sections, etc., differs from the syntactic division of the text into sentences and phrases. The Multiple Annotation solution to this problem (redundant encoding in multiple forms) has many advantages: it is based on XML, the modeling of alternative annotations is possible, each level can be viewed separately, and new levels can be added at any time. However, it has one significant disadvantage: the separate files are independent of one another. These multiply annotated files can nevertheless be regarded as an interrelated unit, with the text serving as the implicit link. Two representations of the information contained in the multiple files (one in Prolog and one in XML) can be programmatically derived and used together for editing, for inference, or for unification of the multiply annotated documents.
This paper describes work directed towards the development of a syllable prominence-based prosody generation functionality for a German unit selection speech synthesis system. A general concept for syllable prominence-based prosody generation in unit selection synthesis is proposed. As a first step towards its implementation, an automated syllable prominence annotation procedure based on acoustic analyses has been performed on the BOSS speech corpus. The prominence labeling has been evaluated against an existing annotation of lexical stress levels and manual prominence labeling on a subset of the corpus. We discuss methods and results and give an outlook on further implementation steps.