In the project SemDok (Generic document structures in linearly organised texts), funded by the German Research Foundation (DFG), a discourse parser for a complex type (scientific articles, for example) is being developed. Discourse parsing (henceforth DP) according to the Rhetorical Structure Theory (RST) (Mann and Taboada, 2005; Marcu, 2000) deals with automatically assigning a text a tree structure in which discourse segments and rhetorical relations between them, such as Concession, are marked. For identifying the combinable segments, declarative rules are employed, which describe linguistic and structural cues and constraints about possible combinations by referring to different XML annotation layers of the input text, and external knowledge bases such as a discourse marker lexicon, a lexico-semantic ontology (later to be combined with a domain ontology), and an ontology of rhetorical relations. In our text-technological environment, the obvious choice of formalism to represent such ontologies is OWL (Smith et al., 2004). In this paper, we describe two OWL ontologies and how they are consulted from the discourse parser to solve certain tasks within DP. The first ontology is a taxonomy of rhetorical relations which was developed in the project. The second one is an OWL version of GermaNet, the model of which we designed together with our project partners.
The metadata management system for speech corpora “memasysco” has been developed at the Institut für Deutsche Sprache (IDS) and is applied for the first time to document the speech corpus “German Today”. memasysco is based on a data model for the documentation of speech corpora and contains two generic XML schemas that drive data capture, XML native database storage, dynamic publishing, and information retrieval. The development of memasysco’s information architecture was mainly based on the ISLE MetaData Initiative (IMDI) guidelines for publishing metadata of linguistic resources. However, since we also have to support the corpus management process in research projects at the IDS, we need a finer atomic granularity for some documentation components as well as more restrictive categories to ensure data integrity. The XML metadata of different speech corpus projects are centrally validated and natively stored in an Oracle XML database. The extension of the system to the management of annotations of audio and video signals (e.g. orthographic and phonetic transcriptions) is planned for the near future.
Europe is a continent of many languages. We all know that, but normally when we think about this fact, we focus on national languages, the type of language that shapes our political and our linguistic geography. But as natural as it may seem today, the idea of a language being closely interrelated with one's identity does not have a very long tradition. In fact, it is only since the late 18th century that we have thought there is some type of intimate connection between the language spoken and the identity of a person as belonging to a nation. And even if the stabilization of European nation states was closely connected with this type of reasoning, European language communities differ considerably in their way of dealing with natural variation within their national language. For some of them, it is only the standardized national language that is relevant in this respect; for others, a certain amount of variation is a central part of their linguistic identity.
In spring 2002, we celebrated the inauguration of the first German-Russian-Jewish kindergarten in Berlin. Nowadays, there are seven bilingual German-Russian kindergartens with 460 places and 78 bilingual kindergartens with other combinations of languages [SENBWF]. Maybe it is not enough, taking into account the large proportion of immigrants in the population of Berlin. And yet, much progress has been achieved, endorsing the fact that German society has begun to change its attitude towards other languages on its territory. The initial request for German monolingualism first changed into societal tolerance of multilingualism and eventually to the recognition of the value of multilingualism. This process is a very slow one, and it is not yet complete. In my article, I would like to look, on the one hand, at the development in the last few years of the political framework that has made possible the opening of bilingual kindergartens in Berlin, and, on the other hand, to consider what has hampered this process until now. I would like to emphasise the three most important political spheres: linguistic, educational and integrational.
Between 1884 and 1900, Germany established protectorates in large areas of the South Pacific. The authorities assumed that the linguistically extremely diverse areas would pose communication problems. Thus the question arose whether German should become the lingua franca in the South Pacific. After a controversial discussion, the German government implemented language policies to promote the German language in the colonies. This chapter shows why, on the one hand, German language policies were doomed to failure and why, on the other, they unintentionally supported other linguistic developments such as the introduction of borrowing from German into indigenous languages, the development of German settler varieties, and the spread of pidgin languages.
In our study, we use the experimental framework of priming to manipulate our subjects’ expectations of syllable prominence in sentences with a well-defined syntactic and phonological structure. The study shows that it is possible to prime prominence patterns and that priming leads to significant differences in the judgment of syllable prominence.
The paper reports on experiments with acoustic recordings of a self-built replica of the historic speaking machine of Wolfgang von Kempelen. Several possibilities of the reed as the glottal excitation mechanism were tested. Perception tests with naïve listeners revealed that the machine-generated words 'mama' and 'papa' were partially recognised as an authentic child voice, as was also the case in von Kempelen's demonstrations in the late 18th century.
This is a study of how aspects of information structure can be captured within a formal grammar of Spanish, couched in the framework of Head-Driven Phrase Structure Grammar (HPSG, Pollard and Sag 1994). While a large number of morphological, syntactic and semantic aspects in a variety of languages have been successfully analysed in this theory, information structure has not been paid the same attention in the HPSG literature. However, as a theory of signs, HPSG should include all levels of description without which the structural descriptions offered by the grammar would ultimately remain incomplete. Languages often explicitly mark the information-structural partitioning of utterances. Depending on the particular language, linguistic resources used for this purpose include prosody (stress/intonation), syntax (e.g. constituent order, special syntactic constructions) and morphology (e.g. special affixes). In HPSG, phonological, syntactic, semantic and pragmatic information is represented in parallel, which would seem to be a well-suited architecture for modelling the sort of interfaces called for.
The multiple gradations of German strong verbs are but manifestations of a rather uncomplicated system. There is a small number of ways to make up ablaut forms; these types of formation are identifiable in formal terms and, what is more, they have definite functions as morphological markers. Using classifications of stem forms according to quality, complexity and quantity of vowels, three types of operations involved in ablaut formation are identified. Ablaut always includes a change of quality type or a change of complexity type, and in addition it may include a change of quantity type. Ablaut forms are clearly distinguished as against bases (and against each other): their vocalism meets a defined standard of dissimilarity. On this basis, gradations are collected into inflectional classes that are defined in strictly synchronic terms. These classes continue the historical seven classes known from reference grammars. For the majority of strong verbs, membership in these classes (and thus ablaut) is predictable.