In this paper, we deal with register-driven variation from a probabilistic perspective, as proposed in Schäfer, Bildhauer, Pankratz, and Müller (2022). We compare two approaches to analysing this variation within HPSG. On the one hand, we consider a multiple-grammar approach, combine it with the architecture proposed in the CoreGram project (Müller 2015), and discuss its advantages and disadvantages. On the other hand, we consider a single-grammar approach and argue that it appears to be superior due to its computational efficiency and cognitive plausibility.
In this paper, I present the COW14 tool chain, which comprises a web corpus creation tool called texrex, wrappers for existing linguistic annotation tools, and an online query software called Colibri2. Through detailed descriptions of the implementation and systematic evaluations of the software's performance on different types of systems, I show that the COW14 architecture is capable of handling the creation of corpora of up to at least 100 billion tokens. I also introduce our running demo system, which currently serves corpora of up to roughly 20 billion tokens in Dutch, English, French, German, Spanish, and Swedish.