Corpus Linguistics
Publisher
- Institut für Deutsche Sprache (58)
COSMAS. Ein Computersystem für den Zugriff auf Textkorpora. Version R.1.3-1. Benutzerhandbuch [COSMAS: A computer system for accessing text corpora. Version R.1.3-1. User manual]
(1994)
Our paper describes an experiment that assesses lexical coverage in web corpora in comparison with traditional corpora for two closely related Slavic languages, from the lexicographer's perspective. The preliminary results show that web corpora should not be considered "inferior", but rather "different".
Bericht von der Dritten Internationalen Konferenz „Grammatik und Korpora“, Mannheim, 22.-24.9.2009 [Report on the Third International Conference "Grammar and Corpora", Mannheim, 22-24 September 2009]
(2009)
With an increasing amount of text data available, it is possible to automatically extract a variety of information about language. One way to obtain knowledge about subtle relations and analogies between words is to observe words that are used in the same context. Recently, Mikolov et al. proposed a method to efficiently compute Euclidean word representations that seem to capture subtle relations and analogies between words in the English language. We demonstrate that this method also captures analogies in the German language. Furthermore, we show that we can transfer information extracted from large non-annotated corpora into small annotated corpora, which are then, in turn, used for training NLP systems.
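The analogy property mentioned in this abstract is usually tested with vector arithmetic: for "a is to b as c is to d", the vector b - a + c should lie close to d. The sketch below illustrates only that arithmetic. The three-dimensional vectors in `TOY_VECTORS` and the helper names `cosine` and `analogy` are invented for illustration and are not taken from the paper; real word representations are learned from large corpora and have far more dimensions.

```python
import math

# Hypothetical toy vectors, hand-made for illustration only.
# Dimension 0: "is a person", dimension 1: "female", dimension 2: "royal".
TOY_VECTORS = {
    "Mann":    (1.0, 0.0, 0.0),
    "Frau":    (1.0, 1.0, 0.0),
    "König":   (1.0, 0.0, 1.0),
    "Königin": (1.0, 1.0, 1.0),
    "Apfel":   (0.0, 0.2, -1.0),
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


def analogy(a, b, c, vectors):
    """Return the word d that best completes 'a is to b as c is to d'.

    The target point is b - a + c; the three query words themselves
    are excluded from the candidates.
    """
    target = tuple(
        vb - va + vc
        for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])
    )
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(vectors[w], target))


if __name__ == "__main__":
    print(analogy("Mann", "Frau", "König", TOY_VECTORS))
```

With these toy vectors the target point coincides with the vector for "Königin", so that word is returned; with learned embeddings the match is only approximate and the nearest neighbour is taken instead.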
Many (modernist) works of literature can be understood through their associativeness, be it constructed or “free”. This network-like character of (modernist) literature has often been addressed by terms like “free association”, “connotation”, “context” or “intertext”. This paper proposes an experimental and exemplary approach to intraconnecting a literary corpus of the Austrian writer Ilse Aichinger with Semantic Web technologies to enable interactive exploration of word associations.
The IMS Open Corpus Workbench (CWB) software currently uses a simple tabular data model with proven limitations. We outline and justify the need for a new data model to underlie the next major version of CWB. This data model, dubbed Ziggurat, defines several types of data layer to represent different structures and relations within an annotated corpus; each such layer may contain variables of different types. Ziggurat will allow us to gradually extend and enhance CWB’s existing CQP syntax for corpus queries, and will also enable more radical departures, not only from the current version of CWB but also from other contemporary corpus-analysis software.
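To make the idea of layers that carry typed variables more concrete, here is a minimal sketch of a layered corpus model. It is not the Ziggurat specification and not part of the CWB API: the class names `Layer` and `SpanLayer`, the helper `span_values`, and the example sentence are all hypothetical, chosen only to show a token layer with per-token variables and a second layer of spans defined over it.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Tuple


@dataclass
class Layer:
    """Hypothetical data layer: `size` items, each with named variables."""
    name: str
    size: int
    variables: Dict[str, List[Any]] = field(default_factory=dict)

    def add_variable(self, var_name: str, values: List[Any]) -> None:
        # Every variable must supply exactly one value per item in the layer.
        if len(values) != self.size:
            raise ValueError(
                f"variable {var_name!r} has {len(values)} values, "
                f"but layer {self.name!r} has {self.size} items"
            )
        self.variables[var_name] = list(values)


@dataclass
class SpanLayer(Layer):
    """Hypothetical layer whose items are inclusive (start, end) ranges
    over the items of a base layer, e.g. sentences over tokens."""
    base: Optional[Layer] = None
    spans: List[Tuple[int, int]] = field(default_factory=list)


def span_values(layer: SpanLayer, index: int, var_name: str) -> List[Any]:
    """Values of a base-layer variable covered by one span."""
    start, end = layer.spans[index]
    return layer.base.variables[var_name][start:end + 1]


# Example: five tokens with a word form and an STTS part-of-speech tag.
tokens = Layer("token", size=5)
tokens.add_variable("word", ["Das", "ist", "ein", "Test", "."])
tokens.add_variable("pos", ["PDS", "VAFIN", "ART", "NN", "$."])

# One sentence span covering all five tokens, with its own variable.
sentences = SpanLayer("s", size=1, base=tokens, spans=[(0, 4)])
sentences.add_variable("id", ["s1"])
```

In this sketch a flat tabular model corresponds to the `tokens` layer alone; the separate `SpanLayer` shows how structure can be kept in its own layer with its own variables instead of being squeezed into token-level columns.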