Refine
Document Type
- Part of a Book (3)
- Article (1)
- Conference Proceeding (1)
- Working Paper (1)
Has Fulltext
- yes (6)
Keywords
- Deutsch (5)
- Korpus <Linguistik> (4)
- Phraseologismus (2)
- Wortverbindung (2)
- Annotation (1)
- Computerunterstützte Lexikographie (1)
- Gesprochene Sprache (1)
- Grammatik (1)
- Kollokation (1)
- Komposition <Wortbildung> (1)
Publicationstate
- Veröffentlichungsversion (6) (remove)
Reviewstate
- (Verlags)-Lektorat (6) (remove)
We present a corpus-driven approach to the study of multi-word expressions, which constitute a significant part of. As a data basis, we use collocation profiles computed from DeReKo (Deutsches Referenzkorpus), the largest available collection of written German which has approximately two billion word tokens and is located at the Institute for the German Language (IDS). We employ a strongly usage-based approach to multi-word expressions, which we think of as conventionalised patterns in language use that manifest themselves in recurrent syntagmatic patterns of words. They are defined by their distinct function in language. To find multi-word expressions, we allow ourselves to be guided by corpus data and statistical evidence as much as possible, making interpretative steps carefully and in a monitored fashion. We develop a procedure of interpretation that leads us from the evidence of collocation profiles to a collection of recurrent word patterns and finally to multi-word expressions. When building up a collection of multi-word expressions in this fashion, it becomes clear that the expressions can be defined on different levels of generalisation and are interrelated in various ways. This will be reflected in the documentation and presentation of the findings. We are planning to add annotation in a way that allows grouping the multi-word expressions according to different features and to add links between them to reflect their relationships, thus constructing a network of multi-word expressions.
In this paper we outline our corpus-driven approach to detecting, describing and presenting multi- word expressions (MWEs). Our goal is to treat MWEs in a way that gives credit to their flexible nature and their role in language use. The bases of our research are a very large corpus and a Statistical method of collocation analysis. The rich empirical data is interpreted linguistically in a structured way which captures the interrelations, patterns and types of variances of MWEs. Several levels of abstraction build on each other: surface patterns, lexical realizations (LRs), MWEs and MWE patterns. Generalizations are made in a controlled way and in adherence to corpus evidence. The results are published online in a hypertext format.