The Manatee corpus management system on which the Sketch Engine is built is efficient, but unable to harness the power of today’s multiprocessor machines. We describe a new, compatible implementation of Manatee which we are developing in the Go language and report on the performance gains that we have obtained.
This article describes a series of ongoing efforts at the Stanford Literary Lab to manage a large collection of literary corpora (~40 billion words). This work is marked by a tension between two competing requirements – the corpora need to be merged together into higher-order collections that can be analyzed as units; but, at the same time, it’s also necessary to preserve granular access to the original metadata and relational organization of each individual corpus. We describe a set of data management practices that try to accommodate both of these requirements – Apache Spark is used to index data as Parquet tables on an HPC cluster at Stanford. Crucially, the approach distinguishes between what we call “canonical” and “combined” corpora, a variation on the well-established notion of a “virtual corpus” (Kupietz et al., 2014; Jakubíček et al., 2014; van Uytvanck, 2010).
This paper outlines the broad research context and rationale for a new international comparable corpus (ICC). The ICC is to be largely modelled on the text categories of the International Corpus of English and their quantities, with only a few changes. The corpus will begin with nine European languages, but others may join in due course. The paper reports on those and other agreements made at the inaugural planning meeting in Prague on 22-23 June 2017. It also sets out the project’s goals for its first two years.
Many (modernist) works of literature can be understood by their associativeness, be it constructed or “free”. This network-like character of (modernist) literature has often been addressed by terms like “free association”, “connotation”, “context” or “intertext”. This paper proposes an experimental and exemplary approach to intraconnect a literary corpus of the Austrian writer Ilse Aichinger with Semantic Web technologies to enable interactive explorations of word associations.
This study investigates the interrelations between bilingual development (German/Russian), immigration and integration in the host society. Participants are Russian-Germans, that is, ethnic Germans who have repatriated to Germany from the former Soviet Union. They were part of a longitudinal study dedicated to the integration of multi-generation Russian-German families in Germany. The paper focuses on eight Russian-Germans who moved to Germany between the ages of five and eight and are now young adults. The analysis is based on interviews conducted in German and Russian in the twentieth year of their life in Germany. A semi-structured questionnaire was used to elicit information on the main stages of integration, the use of the languages, the attitudes towards German and Russian, and an assessment of the current situation. The obtained data were used to make an initial assessment of the oral language competencies of the participants and as sources of information about the objective facts and subjective attitudes that determined linguistic and social integration.