Refine
Year of publication
Document Type
- Conference Proceeding (387)
- Part of a Book (268)
- Article (204)
- Book (37)
- Working Paper (17)
- Doctoral Thesis (16)
- Other (14)
- Preprint (7)
- Part of Periodical (5)
- Report (2)
Language
- English (961) (remove)
Keywords
- Korpus <Linguistik> (279)
- Deutsch (209)
- Computerlinguistik (98)
- Annotation (70)
- Interaktion (65)
- Konversationsanalyse (60)
- Automatische Sprachanalyse (54)
- Gesprochene Sprache (53)
- Englisch (52)
- Wörterbuch (42)
Publicationstate
- Veröffentlichungsversion (961) (remove)
Reviewstate
Publisher
- IDS-Verlag (80)
- de Gruyter (55)
- Association for Computational Linguistics (44)
- European Language Resources Association (ELRA) (43)
- European Language Resources Association (24)
- Institut für Deutsche Sprache (24)
- Springer (19)
- Linköping University Electronic Press (15)
- The Association for Computational Linguistics (15)
- Leibniz-Institut für Deutsche Sprache (IDS) (14)
In the first part of this contribution, we will present, as a starting point for the following discussions, a simple formal language P containing one stative predicate. We will then discuss, on an intuitive level, how a treatment of predicates of change could be conceived, and how the progressive could be rendered in a formal language.
We will then give a formal definition of a language, TP1, based on P, and we will construct a semantics for TP1, which incorporates the ideas discussed.
This is a revised and translated version of my article "Die doppelte Wende - Zur Verbindung von Sprache, Sprachwissenschaft und zeitgebundener politischer Bewertung am Beispiel deutsch-deutscher Sprachdifferenzierung" which appeared in Politische Semantik - Bedeutungsanalytische und sprachkritische Beiträge zur politischen Sprachverwendung, ed. Josef Klein (Opladen: Westdeutscher Verlag, 1989), pp. 297-326. I am indepted to Colin Good, Norwich, England, for having translated the text into English.
This paper describes the lexical database tool LOLA (Linguistic-Oriented Lexical database Approach) which has been developed for the construction and maintenance of lexicons for the machine translation system LMT. First, the requirements such a tool should meet are discussed, then LMT and the lexical information it requires, and some issues concerning vocabulary acquisition are presented. Afterwards the architecture and the components of the LOLA system are described and it is shown how we tried to meet the requirements worked out earlier. Although LOLA originally has been designed and implemented for the German-English LMT prototype, it aimed from the beginning at a representation of lexical data that can be reused for other LMT or MT prototypes or even other NLP applications. A special point of discussion will therefore be the adaptability of the tool and its components as well as the reusability of the lexical data stored in the database for the lexicon development for LMT or for other applications.
In this paper we present a new approach to lexicographical design for the description of German speech act verbs. This approach is based on an action-theoretical semantic conception. The several conditions for linguistic action provide the basis for the elaboration of the central semantic features. The systematic relationship of these features is reflected in the organization of a lexical database which allows various possibilities of access to different types of lexical information.
In the following paper we shall give an outline of the semantic framework for describing speech act verbs, i. e. verbs of communication, with the practical goal of a semantical database for a (dictionary of) synonymy of German speech act verbs which enables the user not only to find a list of synonymous verbs but also enables him to gain an insight into the semantic relations between the words.
The semantic framework is based on
(i) a set of conditions for performing speech acts as the relevant domain of reference
(ii) the introduction of a notion of situation, or better type of situation
The performative as well as the descriptive use of the verbs can be reduced to their fundamental dependency on the situations in which they are used: on the one hand with regard to the possibility of the action itself, and on the other hand with regard to the possibility of their designation. For both ways of use the relevant aspects of the situation constitute the necessary conditions.
As can be shown for English data, the assimilation of the alveolar stop can result from an increased gestural overlap of the following oral closure gesture. Our experiment with German synthetic speech showed similar results. Further, it suggests that it is neccessary to complete the gestural specification of the glottal state. A voiced stop should be represented not only by an oral gesture, but by a glottal one as well.
Gaps in Word Formation
(1996)
New KARL (Knowledge Acquisition and Representation Language) allows to specify all parts of a problem-solving method (PSM). It is a formal language with a well-defined semantics and thus allows to represent PSMs precisely and unambiguously yet abstracting from implementation detail. In this paper it is shown how the language KARL has been modified and extended to New KARL to better meet the needs for the representation of PSMs. Based on a conceptual structure of PSMs new language primitives are introduced for KARL to specify such a conceptual structure and to support the configuration of methods. An important goal for this extension was to preserve three important properties of KARL: to be (i) a conceptual, (ii) a formal, and (iii) an executable language.
A topic in the field of knowledge acquisition is the reuse of components that are described at the knowledge level. Problems concern the description, indexing and retrieval of components. In our case there is the additional feature of integrating so called automated building blocks in a knowledge level description. This paper describes what knowledge level descriptions of components for reuse should look like, and proposes a way to describe assumptions and requirements that are to be made explicit. In the paper an extension of the “normal” knowledge acquisition setting is made in the direction of machine learning components.
In this paper, we analyze a dramatically aggravated conflict interaction taking place in the course of an association’s meeting in an urban community center. The interaction can be seen as the culmination point of a social conflict developing and increasing over a period of years. In this conflict, one of the crucial points of the sociocultural development in the city under study is to be seen in an exemplary way. Our analysis started with the question, why this conflict is unsolvable although the interest divergences of the opposing parties are not irreconcilable. Our analysis shows that the protagonists practice different communicative social styles. These stylistic differences however, are not the cause for misunderstandings, but the protagonists use stylistic differences and different cultural orientations as a resource for political action. Thereby a process of increasing hardening of perspective divergence emerges together with an interaction modality of drama and of the fundamental grounding of divergent views. Theoretically we are concerned with the explication of a sociolinguistic theory which includes as constitutive components the concepts of communicative social style, of perspectivation and of interaction modality. We want to show, that the analyzed type of sociocultural conflict can be explained by virtue of considering the interplay of features on these three levels.
In the last years a common notion of a Problem-Solving Method (PSM) emerged from different knowledge engineering frameworks. As a generic description of the dynamic behaviour of knowledge based systems PSMs are favored subjects of reuse. Up to now, most investigations on the reuse of PSMs focus on static features and methods as objects of reuse. By this, they ignore a lot of information of how the PSM was developed that is, in principle, entailed in the different parts of a conceptual model of a PSM.
In this paper the information of the different parts of PSMs is reconsidered from a reuse process point of view. A framework for generalized problem-solving methods is presented that describes the structure of a category of methods based on family resemblances. These generalized methods can be used to structure libraries of PSMs and - in the process of reuse - as a means to derive an incarnation, i.e. a member of its family of PSMs.
For illustrating the ideas, the approach is applied to the task rsp. problem type of parametric design.
In this study we investigate the intonational characteristics of the four utterance types statement, wh-question, yes/no-question and declarative question. Readings of two German scripted dialogues were examined to ascertain characteristic features of the F0 contour for each utterance type. Final boundary tone, nuclear pitch accent, F0 offset, F0 onset, F0 range, and the slopes of a topline and a bottomline were determined for each utterance and compared for the four utterance types. Results show that for an average speaker, the final boundary tone, the F0 range, and the slope of the topline can be used to distinguish between the four utterance types. However, speakers may deviate from this pattern and exploit other intonational means to distinguish certain utterance types or choose not to mark a syntactic difference at all.
A library of software components should be essentially more than just a juxtaposition of its items. For problem-solving methods the notion of a family is suggested as means to cluster the items and to provide partially a structure of the library. This paper especially investigates how the similar control flows of the members of such a family can be described in one framework.
This dissertation offers a qualitative analysis of verbal interactions in German television talk shows between 1989 and 1994. It investigates how Speakers of German formulate their own and others’ affiliation to national identities and social spaces. In particular, it examines classifications of place, person, and time that include group and place names as well as grammatically complex expressions, deictic pronouns and adverbs, and certain motion verbs. In addition, repair is discussed as a resource in re-formulating identities.
This work exploited coarticulation and loud speech as natural sources of perturbation in order to determine whether articulatory covariation (motor equivalent behavior) can be observed inspeech that is not artificially perturbed. Articulatory analyses of jaw and tongue movement in the production of alveolar consonants by German speakers were performed. The sibilant /s/ shows virtually no articulatory covariation under the influence of natural perturbations, whereas other alveolar consonants show more obvious compensatory behavior. Our conclusion is that an effect of natural sources of perturbation is noticable, but sounds are affected to different degrees.
The phonological word (henceforth pword) differs from lower units of the prosodic hierarchy (e.g. foot, syllable) in that its boundaries must align with morphological boundaries. While languages are claimed to differ w.r.t. the questions of whether and which word-internal constituents (e.g. stems, prefixes, suffixes, members of compounds) form a pword there is no consensus regarding the question of which diagnostics are relevant for determining pword structure. In this paper it is argued that systematic correlations between various suprasegmental properties (e.g. stress patterns, syllable structure) motivate the existence of word-internal pwords in German.
Identity effects in phonology are deviations from regular phonological form (i.e. canonical patterns) which are due to the relatedness between words. More specifically, identity effects are those deviations which have the function to enhance similarity in the surface phonological form of morphologically related words. In rule-based generative phonology the effects in question are described by means of the cycle. For example, the stress on the second syllable in cond[ɛ]nsation as opposed to the stresslessness of the second syllable in comp[ǝ]nsation is described by applying the stress rules initially to the sterns thereby yielding condénse and cómpensàte. Subsequently the stress rules are reapplied to the affixed words with the initial stress assignment (i.e. stress on the second syllable in condense, but not in compensate) leaving its mark in the output form (cf. Chomsky and Halle 1968). A second example are words like lie[p]los 'unloving' in German, which shows the effects of neutralization in coda position (i.e. only voiceless obstruents may occur in coda position) even though the obstruent should 'regularly' be syllabified in head position (i.e. bl is a wellformed syllable head in German). Here the stern is syllabified on an initial cycle, obstruent devoicing applies (i.e. lie[p]) and this structure is left intact when affixation applies (i.e. lie[p ]Ios ) (cf. Hall 1992). As a result the stern of lie[p]los is identical to the base lie[p].
We describe a general two-stage procedure for re-using a custom corpus for spoken language system development involving a transformation from character-based markup to XML, and DSSSL stylesheet-driven XML markup enhancement with multiple lexical tag trees. The procedure was used to generate a fully tagged corpus; alternatively with greater economy of computing resources, it can be employed as a parametrised ‘tagging on demand’ filter. The implementation will shortly be released as a public resource together with the corpus (German spoken dialogue, about 500k word form tokens) and lexicon (about 75k word form types).
In order to determine priorities for the improvement of timing in synthetic speech this study looks at the role of segmental duration prediction and the role of phonological symbolic representation in listeners' preferences. In perception experiments using German speech synthesis, two standard duration models (Klatt rules and CART) were tested. The input to these models consisted of symbolic strings which were either derived from a database or a text-to-speech system. Results of the perception experiments show that different duration models can only be distinguished when the symbolic string is appropriate. Considering the relative importance of the symbolic representation, "post-lexical" segmental rules were investigated with the outcome that listeners differ in their preferences regarding the degree of segmental reduction. As a conclusion, before fine-tuning the duration prediction, it is important to calculate an appropriate phonological symbolic representation in order to improve timing in synthetic speech.
The effects of different forms of predication have been insightfully (and almost exclusively) studied for 'simple' cases of predication, of which the 'presentational sentence' is maybe the paradigm instantiation. It is the aim of this paper to show that thc same kind of effects as well as in fact the same kind of structures are present at embedded levels in thematically and otherwise more complex structures. Beyond presentational sentences, 'unaccusative' experiencing constructions involving a dative subject, 'double object constructions' and - to a lesser extent - spraylload constructions are discussed. For all of these, it is argued that they comprise a predication encoding the ascription of a transient temporal property to a location. On this basis, a proposal is made as to how the scope asymmetry between the two arguments involved in the colistructions can be explained. Furthermore, a proposal is made as to how what has been called 'argument shift' is motivated.
Here we will present a graphical software tool called Morph Moulder (MoMo) for teaching the formal foundations of a language with a denotation in a domain of relational typed feature structures as used in Head-Driven Phrase Structure Grammar. With MoMo, students learn the properties of totally well-typed, sort resolved relational feature structures, the use of formal languages to describe typed feature structures and the notions of constraint satisfaction and models of grammars written in a formal language. MoMo was realized and conceived within the context of a set of courses in the format of web-based training, that focuses on the concept of typed feature structures in a curriculum in grammar formalisms and parsing. The formal language of MoMo amends the constraint language of TRALE (an implementation platform for HPSG grammars based on ALE) to accommodate the expressive power of HPSG.
Co-reference annotation and resources: a multilingual corpus of typologically diverse languages
(2002)
This article introduces a dialogue corpus containing data from two typologically different languages, Japanese and Kilivila. The corpus is annotated in accordance with language specific annotation schemes for co-referential and similar relations. The article describes the corpus data, the properties of language specific co-reference in the two languages and a methodology for its annotation. Examples from the corpus show how this methodology is used in the workflow of the annotation process.
According to a widespread conception, quantitative linguistics will eventually be able to explain empirical quantitative findings (such as Zipf’s Law) by deriving them from highly general stochastic linguistic ‘laws’ that are assumed to be part of a general theory of human language (cf. Best (1999) for a summary of possible theoretical positions). Due to their formal proximity to methods used in the so-called exact sciences, theoretical explanations of this kind are assumed to be superior to the supposedly descriptive-only approaches of linguistic structuralism and its successors. In this paper I shall try to argue that on close inspection such claims turn out to be highly problematic, both on linguistic and on science-theoretical grounds.
Online Access Tools for Spoken German: The Resources of the Deutsches Spracharchiv in a Database
(2002)
This paper shows some details of the modernization of the Deutsches Spracharchiv (DSAv). It explores some future possibilities of linguistical documentation and analysis using the Web. The Institut für Deutsche Sprache (IDS) in Mannheim is the central institution for linguistic research in Germany. The DSAv in the IDS is the center for documentation and research of spoken German. These archives include the largest collection of sound recordings of spoken German (dialects and colloquial speech, including e.g. lots of extinct dialects of former German territories in Eastern Europe) - altogether more than 15,000 sound recordings. The lacking clarification and accessibility of this data material has been felt as an essential deficit. The opportunity to edit the sound signal digitally offers a much easier access to spoken language. Through the integration of the already existing information about the corpora and the transcribed texts in an information- and full text databank, as well as the linking of the data with the acoustic signal (alignment), arises a data-pool with considerably better documentation of the materials and a fast direct grasp of the recorded sounds. Thus, the DSAv initiates totally new research questions for the work at the IDS, as well as for linguistics altogether.
The FrameNet lexical database yields information about collocations and multiword expressions in various ways. In some cases phrasal units have been entered from the start as lexical entries (write down). In other cases headword + preposition pairs can be recognized as special collocations Where the preposition in question is a necessary and lexically specified marker of an argument of the headword + fond of, hostile to). Nominal compounds are annotated with respect to noun or (pertinative) adjective modifiers, some of which are analyzable but also entrenched (wheel chair, fiscal year). Nouns that name aggregates, portions, types, etc., sometimes hold lexically specified relations to their dependents (flock of geese). And event nouns frequently Select the support verbs which permit them to enter into predications (file an objection, enter a plea). A subproject aims at extracting, as structured clusters of lexical items, the minimal semantically central kernel dependency graphs from the set of annotations. Such research will yield not only commonplace groupings (eat: dog, bone) but will also yield hitherto unnoticed collocations within such graphs (answer: you, door) where certain dependency links within them are idiomatic or otherwise lexically special, here answer > door. Collocational information can also be retrieved by various types of queries within our MySQL search tool
The classification of verbs in Levin's (1993) English Verb Classes and Alternations: A preliminary Investigation, on the basis of both intuitive semantic grouping and their participation in valence alternations, is often used by the NLP community as evidence of the semantic similarity of verbs (Jing & McKeown 1998; Lapata & Brew 1999; Kohl et al. 1998). In this paper, we compare the Levin classification with the work of the FrameNet project (Fillmore & Baker 2001), where words (not just verbs) are grouped according to the conceptual structures (frames) that underlie them and their combinatorial patterns are inductively derived from corpus evidence. This means that verbs grouped together in FrameNet (FN) might be semantically similar but have different (or no) alternations, and that verbs which share the same alternation might be represented in two different semantic frames.
In this paper, we investigate the practical applicability of Co-Training for the task of building a classifier for reference resolution. We are concerned with the question if Co-Training can significantly reduce the amount of manual labeling work and still produce a classifier with an acceptable performance.
We describe a simple and efficient Java object model and application programming interface (API) for (possibly multi-modal) annotated natural language corpora. Corpora are represented as elements like Sentences, Turns, Utterances, Words, Gestures and Markables. The API allows linguists to access corpora in terms of these discourse-level elements, i.e. at a conceptual level they are familiar with, with the flexibility offered by a general purpose programming language. It is also a contribution to corpus standardization efforts because it is based on a straightforward and easily extensible data model which can serve as a target for conversion of different corpus formats.
The principal claim of this dissertation is that there is a unique structural core shared by Double Object, Dative Experiencer and Existential/Presentational constructions. This core is argued to take the form of a Cipient Predication structure, `cipient covering traditional notions like (affected) source/goal, recipient, indirect object or dative experiencer. Central questions arising in defining Cipient Predication are: How are cipients thematically licensed, and what is the role of there in argument-structural terms? What is the structural locus of cipients/there? What is the role and nature of dative case? How can the possessive interpretation, the blocking and definiteness effects associated with the above-mentioned constructions be explained? Cipients are presented as external arguments and logical subjects (location individuals) of predicates derived from a propositional meaning embedded in the VP, the predicate formed by a lower tense head `little t that is overtly realized as there. Little t is argued to encode a distinction at the reference time level, structural dative hinging on a tense property like structural nominative. The cipient relates as a whole to a part to a VP-internal location argument that together with the theme furnishes the propositional meaning (`possession ). As logical subjects, cipients anchor the predicate to the utterance context, forcing its interpretation in extralinguistic terms (`blocking effects ). It is proposed that lacking structurally encoded subjects, Existential/Presentational constructions are not saturated expressions in syntax, precluding the interpretation of certain quantifiers (most/every, vide `definiteness effects ). Cipient Predication, couched in terms of the Minimalist Program (in particular, Chomsky 1999) and a semantics relying on tense and the ontological distinction of locations as well as scalar and part-whole structure, should be of interest to scholars working on datives, argument structure, and the syntax/semantics/pragmatics interface more generally.
In Articulatory Phonology the jaw is not controlled individually but serves as an additional articulator to achieve the primary constriction. In this study the timing of jaw and tongue tip gestures for the coronal consonants /s, , t, d, n, l/ is analysed by means of EMMA. The findings suggest that the tasks of the jaw for the fricatives are to provide a second noise source and to stabilise the tongue position (more pronounced for /s/). For the voiceless stop, the speakers seem to aim at a high jaw position for producing a prominent burst. For /l/ a low jaw position is essential for avoiding lateral contact and for the apical articulation of this sound.
This paper deals with the problem of how to interrelate theory-specific treebanks and how to transform one treebank format to another. Currently, two approaches to achieve these goals can be differentiated. The first creates a mapping algorithm between treebank formats. Categories of a source format are transformed into a target format via a given set of general or language-specific mapping rules. The second relates treebanks via a transformation to a general model of linguistic categories, for example based on the EAGLES recommendations for syntactic annotations of corpora, or relying on the HPSG framework. This paper proposes a new methodology as a solution for these desiderata.
The following analysis explores the nature of everyday activities of people in economic leadership positions. It inquires as to the institutions and people they are in contact with, and the way they communicate with them. First, I will present existing studies in this field and the ethnographic procedure of this study. I will then describe the communicative activities of high-level personnel and their communication networks based on observations. These are then linked to their communicative tasks and types of interaction. Finally, I will discuss some characteristics of the communicative style.
We present an approach on how to investigate what kind of semantic information is regularly associated with the structural markup of scientific articles. This approach addresses the need for an explicit formal description of the semantics of text-oriented XML-documents. The domain of our investigation is a corpus of scientific articles from psychology and linguistics from both English and German online available journals. For our analyses, we provide XML-markup representing two kinds of semantic levels: the thematic level (i.e. topics in the text world that the article is about) and the functional or rhetorical level. Our hypothesis is that these semantic levels correlate with the articles’ document structure also represented in XML. Articles have been annotated with the appropriate information. Each of the three informational levels is modelled in a separate XML document, since in our domain, the different description levels might conflict so that it is impossible to model them within a single XML document. For comparing and mining the resulting multi-layered XML annotations of one article, a Prolog-based approach is used. It focusses on the comparison of XML markup that is distributed among different documents. Prolog predicates have been defined for inferring relations between levels of information that are modelled in separate XML documents. We demonstrate how the Prolog tool is applied in our corpus analyses.
We present a light-weight tool for the annotation of linguistic data on multiple levels. It is based on the simplification of annotations to sets of markables having attributes and standing in certain relations to each other. We describe the main features of the tool, emphasizing its simplicity, customizability and versatility
We apply a decision tree based approach to pronoun resolution in spoken dialogue. Our system deals with pronouns with NP- and non-NP-antecedents. We present a set of features designed for pronoun resolution in spoken dialogue and determine the most promising features. We evaluate the system on twenty Switchboard dialogues and show that it compares well to Byron’s (2002) manually tuned system.