This article describes a methodology for interrelating and analyzing language- and theory-specific corpus data from various languages. As an example phenomenon we use information structure (IS, see [3]) in treebanks from three languages: Spanish, Korean and Japanese. Korean and Japanese are typologically close, while both are typologically different from Spanish. The problem in annotating IS is therefore that languages use diverging formal linguistic means to realize IS functions (like “topicalization / contrast”) on various levels such as prosody, morphology and word order. Hence, it is necessary to describe the relations between language-specific formal means and functional views on IS, and to show how these relations can be operationalized for corpus analysis.
Das Bild von der 'Sprache der DDR' in der alten Bundesrepublik oder: Haben sie so gesprochen? [The image of the 'language of the GDR' in the old Federal Republic, or: Did they speak that way?]
(2004)
The goal of the MULI (MUltiLingual Information structure) project is to empirically analyse information structure in German and English newspaper texts. In contrast to other projects in which information structure is annotated and investigated (e.g. in the Prague Dependency Treebank, which mirrors the basic information about the topic-focus articulation of the sentence), we do not annotate theory-biased categories like topic-focus or theme-rheme. To be as theory-independent as possible, we annotate those features which are relevant to information structure and on the basis of which typical patterns, co-occurrences or correlations can be determined. We distinguish between three annotation levels: syntax, discourse and prosody. The data is based on the TIGER Corpus for German and the Penn Treebank for English, since the existing information on part-of-speech and syntactic structure can be re-used for our purposes. The actual annotation of an English example sequence illustrates our choice of categories on each level. Combining the levels makes it possible to investigate how information structure is realised and how it can be interpreted.
We present the annotation of information structure in the MULI project. To learn more about the means of structuring information in prosody, syntax and discourse, theory-independent features were defined for each level. We describe the features and illustrate them with an example sentence. To investigate the interplay of features, the representation has to allow all three layers to be inspected at the same time. This is realised by a stand-off XML mark-up with the word as the basic unit. The theory-neutral XML stand-off annotation allows this resource to be integrated with other linguistic resources such as the TIGER Treebank for German or the Penn Treebank for English.
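The idea of a stand-off mark-up with the word as the basic unit can be illustrated with a minimal sketch. It is not the MULI schema: the element names (`words`, `w`, `layer`, `mark`), the attributes (`id`, `from`, `to`, `type`), the feature labels and the example sentence are all assumptions made for illustration. The sketch only shows how separate syntax, discourse and prosody layers that point at shared word ids can be lined up for joint inspection.

```python
import xml.etree.ElementTree as ET

# Hypothetical base layer: one <w> element per word, each with a unique id.
WORDS_XML = """<words>
  <w id="w1">The</w>
  <w id="w2">minister</w>
  <w id="w3">resigned</w>
  <w id="w4">yesterday</w>
</words>"""

# Hypothetical stand-off layers: each <mark> refers to a span of word ids
# instead of wrapping the text itself. All labels are placeholders.
LAYERS_XML = {
    "syntax": '<layer><mark type="subject" from="w1" to="w2"/></layer>',
    "discourse": '<layer><mark type="first-mention" from="w1" to="w2"/></layer>',
    "prosody": '<layer><mark type="pitch-accent" from="w2" to="w2"/>'
               '<mark type="pitch-accent" from="w4" to="w4"/></layer>',
}


def align_layers(words_xml, layers_xml):
    """Return one row per word, listing the marks from every layer that cover it."""
    words = ET.fromstring(words_xml).findall("w")
    # Map each word id to its position so that spans can be resolved.
    position = {w.get("id"): i for i, w in enumerate(words)}
    table = [{"id": w.get("id"), "form": w.text} for w in words]
    for layer_name, xml_text in layers_xml.items():
        for mark in ET.fromstring(xml_text).findall("mark"):
            start = position[mark.get("from")]
            end = position[mark.get("to")]
            for row in table[start:end + 1]:
                row.setdefault(layer_name, []).append(mark.get("type"))
    return table


if __name__ == "__main__":
    for row in align_layers(WORDS_XML, LAYERS_XML):
        print(row)
```

Because every layer refers only to word ids, a further layer (for instance one derived from an existing treebank) could be added without touching the base text or the other layers, provided it uses the same ids.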