Refine
Year of publication
- 2007 (3) (remove)
Document Type
Language
- English (3)
Has Fulltext
- yes (3)
Is part of the Bibliography
- no (3)
Keywords
- Deutsch (3)
- Korpus <Linguistik> (3)
- Antonym (1)
- Archiv für Gesprochenes Deutsch (AGD) (1)
- Gesprochene Sprache (1)
- Homonym (1)
- Kollokation (1)
- Lexikographie (1)
- Metadaten (1)
- Semasiologie (1)
Publicationstate
- Veröffentlichungsversion (3) (remove)
Reviewstate
Publisher
- University of Birmingham (3) (remove)
We present a corpus-driven approach to the study of multi-word expressions, which constitute a significant part of. As a data basis, we use collocation profiles computed from DeReKo (Deutsches Referenzkorpus), the largest available collection of written German which has approximately two billion word tokens and is located at the Institute for the German Language (IDS). We employ a strongly usage-based approach to multi-word expressions, which we think of as conventionalised patterns in language use that manifest themselves in recurrent syntagmatic patterns of words. They are defined by their distinct function in language. To find multi-word expressions, we allow ourselves to be guided by corpus data and statistical evidence as much as possible, making interpretative steps carefully and in a monitored fashion. We develop a procedure of interpretation that leads us from the evidence of collocation profiles to a collection of recurrent word patterns and finally to multi-word expressions. When building up a collection of multi-word expressions in this fashion, it becomes clear that the expressions can be defined on different levels of generalisation and are interrelated in various ways. This will be reflected in the documentation and presentation of the findings. We are planning to add annotation in a way that allows grouping the multi-word expressions according to different features and to add links between them to reflect their relationships, thus constructing a network of multi-word expressions.
Incompatibility (or co-hyponymy) is the most general type of semantic relation between lexical items, the meaning of which entails exclusion. Such items fall under a superordinate term or concept and denote sets which have no members in common (e.g. animal: dog-cat-mouse-lion-sheep; example from Cruse 2004). Traditionally, these have been of interest to lexical semanticists for the description of the structure of the lexicon. However, incompatibility is not just a relation that signifies a difference of meaning. This paper is a critical corpus-assisted re-evaluation of the phenomenon of incompatibility which argues that the relation in question sometimes also functions as a discourse marker. Incompatibles indicate recurrent intertextual patterns. This holds particularly true for socially or politically controversial lexical items such as Flexibilität (flexibility), Mobilität (mobility) or Globalisierung (globalisation). Corpus investigations of such words have revealed that among other semantically related terms, incompatibles have a crucial discourse focussing function. For the German lexical item Globalisierung, I will show how its lexical usage can be studied through a corpus-driven analysis of corresponding incompatibles. Incompatible terms are not contingent co-words but often occur in close contextual proximity and participate in regular syntagmatic structures (e.g. Globalisierung und Rationalisierung; Globalisierung und Modernisierung; Neoliberalismus, Globalisierung und Kapitalismus). Hence, these are easily extracted by conducting a computational collocation analysis. Such significant collocates provide a good insight into the discursive and thematic contexts of the search word. Following Teubert (2004), I will demonstrate how the meaning of such lexical items is constituted in discourse and how the examination of these particular collocates reveals their sense-constructing function and their pragmatic-discursive force. I will provide a brief discussion of the methodology used for such analyses, and I will explain why the complex semantic-pragmatic and thematic-communicative patterns implied in sets of incompatibles should be given a stronger emphasis in lexicography.
We present an XML-based metadata standard for the documentation of speech and multimedia corpora that was developed at the Institute for German Language (IDS) in Mannheim, Germany. The IDS is one of the major institutions providing German speech and language corpora to researchers. These corpora stem from many different sources and were previously documented in a rather heterogeneous fashion using a variety of data models and formats. In order to unify the documentation for existing and future corpora, the IDS- internal Archive for Spoken German collaborated with several projects and developed a set of standardised XML metadata schemas. These XML schemas build on existing internal and external documentation schemas (such as IMDI) and take into account the workflow of speech corpus production. In order to minimise redundancy, separate schemas were designed for projects, speakers, recording sessions, and entire corpora. The resulting schemas are tested in ongoing speech and multi-media projects at the IDS and are regularly revised. They are accompanied by element definitions, guidelines, and examples. In addition, a mapping to IMDI will be provided.