
Corpus-driven study of multi-word expressions based on collocations from a very large corpus

  • We present a corpus-driven approach to the study of multi-word expressions, which constitute a significant part of the lexicon. As our data basis, we use collocation profiles computed from DeReKo (Deutsches Referenzkorpus), the largest available collection of written German. It comprises approximately two billion word tokens and is hosted at the Institute for the German Language (IDS). We take a strongly usage-based view of multi-word expressions: we regard them as conventionalised patterns of language use that manifest themselves in recurrent syntagmatic patterns of words and are defined by their distinct function in language. To identify multi-word expressions, we let ourselves be guided by corpus data and statistical evidence as far as possible, taking interpretative steps carefully and in a monitored fashion. We develop an interpretation procedure that leads from the evidence of collocation profiles to a collection of recurrent word patterns and finally to multi-word expressions. Building up a collection of multi-word expressions in this way makes it clear that the expressions can be defined on different levels of generalisation and are interrelated in various ways. This will be reflected in the documentation and presentation of the findings. We plan to add annotation that allows the multi-word expressions to be grouped according to different features, and to add links between them that reflect their relationships, thus constructing a network of multi-word expressions.
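The abstract does not say how the collocation profiles are computed. As a rough illustration of what such a profile contains, here is a minimal Python sketch. It assumes a symmetric co-occurrence window within sentences and Dunning's log-likelihood ratio (G²) as the association measure. Both choices, the function names `cooccurrence_pairs`, `log_likelihood` and `collocation_profile`, and the toy sentences are hypothetical and are not taken from the paper or from the IDS tools.

```python
from collections import Counter
from math import log
from typing import Iterable, List, Tuple

# Hypothetical sketch: window-based co-occurrence counts scored with G^2.
# This is NOT the procedure used for DeReKo; it only illustrates the idea.


def cooccurrence_pairs(sentences: Iterable[List[str]], window: int) -> Counter:
    """Count ordered (word, context) pairs whose distance is 1..window tokens
    within the same sentence; left and right contexts are both included."""
    pairs: Counter = Counter()
    for tokens in sentences:
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    pairs[(word, tokens[j])] += 1
    return pairs


def log_likelihood(o11: int, r1: int, c1: int, n: int) -> float:
    """Dunning's G^2 for a 2x2 table, given the joint count o11, the row
    total r1, the column total c1 and the sample size n."""
    observed = [o11, r1 - o11, c1 - o11, n - r1 - c1 + o11]
    expected = [
        r1 * c1 / n,
        r1 * (n - c1) / n,
        (n - r1) * c1 / n,
        (n - r1) * (n - c1) / n,
    ]
    # Zero cells contribute nothing (the limit of o * log(o / e) as o -> 0).
    return 2.0 * sum(o * log(o / e) for o, e in zip(observed, expected) if o > 0)


def collocation_profile(
    sentences: Iterable[List[str]], node: str, window: int = 2, top: int = 10
) -> List[Tuple[str, float]]:
    """Return up to `top` collocates of `node`, ranked by descending G^2."""
    pairs = cooccurrence_pairs(sentences, window)
    n = sum(pairs.values())
    row_totals: Counter = Counter()
    col_totals: Counter = Counter()
    for (word, context), freq in pairs.items():
        row_totals[word] += freq
        col_totals[context] += freq
    r1 = row_totals[node]
    scored = []
    for (word, context), o11 in pairs.items():
        if word != node:
            continue
        c1 = col_totals[context]
        # Keep only attraction: observed co-occurrence above expectation.
        if o11 * n <= r1 * c1:
            continue
        scored.append((context, log_likelihood(o11, r1, c1, n)))
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:top]
```

For example, on a toy corpus that contains the sentence `ins Gras beißen` three times alongside a few unrelated sentences, `collocation_profile(corpus, "Gras", window=1)` returns `ins` and `beißen` as the only collocates of `Gras`. Moving from such ranked lists to recurrent word patterns and finally to multi-word expressions is the interpretative part of the procedure, which this sketch does not attempt to model.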
Metadata
Author: Annelen Brunner, Kathrin Steyer
URN: urn:nbn:de:bsz:mh39-41414
Parent Title (English): Proceedings of the 4th Corpus Linguistics conference, Birmingham
Publisher: University of Birmingham
Place of publication: Birmingham
Document Type: Conference Proceeding
Language: English
Year of first Publication: 2007
Date of Publication (online): 2015/09/16
Publication state: Published version
Review state: (Publisher's) editorial review
GND Keyword: German; Collocation; Corpus <Linguistics>; Language statistics
Number of pages: 12
Dewey Decimal Classification: 400 Language / 410 Linguistics
Leibniz-Classification: Language, Linguistics
Linguistics-Classification: Corpus linguistics
Open Access?: Yes
Licence (German): German Copyright Act (UrhG) applies