Corpus linguistics
Publisher: Institut für Deutsche Sprache
The IMS Open Corpus Workbench (CWB) software currently uses a simple tabular data model with proven limitations. We outline and justify the need for a new data model to underlie the next major version of CWB. This data model, dubbed Ziggurat, defines several types of data layer to represent different structures and relations within an annotated corpus; each layer may contain variables of different types. Ziggurat will allow us to gradually extend and enhance CWB’s existing CQP syntax for corpus queries, and will also make possible more radical departures, relative not only to the current version of CWB but also to other contemporary corpus-analysis software.
Complex linguistic phenomena, such as Clitic Climbing (CC) in Bosnian, Croatian and Serbian (BCS), are often described intuitively and only from the perspective of the main tendency. In this paper, we argue that web corpora currently offer the best source of empirical material for studying CC in BCS. They thus allow the most accurate description of this phenomenon, since less frequent constructions can be tracked only in large, well-annotated data sources. We compare the properties of web corpora for BCS with those of traditional sources and give examples of studies on CC based on web corpora. Furthermore, we discuss problems related to web corpora and suggest some improvements for the future.
Valenz und Kookkurrenz
(2015)