OPUS 4 | Search

35 search hits

31 to 35

Accelerating corpus search using multiple cores (2017)

Rábara, Radoslav ; Rychlý, Pavel ; Herman, Ondřej ; Jakubíček, Miloš

The Manatee corpus management system, on which the Sketch Engine is built, is efficient but unable to harness the power of today’s multiprocessor machines. We describe a new, compatible implementation of Manatee developed in the Go language and report on the performance gains we obtained.

Organizing corpora at the Stanford Literary Lab. Balancing simplicity and flexibility in metadata management (2017)

McClure, David ; Algee-Hewitt, Mark ; Douris, Steele ; Fredner, Erik ; Walser, Hannah

This article describes a series of ongoing efforts at the Stanford Literary Lab to manage a large collection of literary corpora (~40 billion words). This work is marked by a tension between two competing requirements – the corpora need to be merged together into higher-order collections that can be analyzed as units; but, at the same time, it’s also necessary to preserve granular access to the original metadata and relational organization of each individual corpus. We describe a set of data management practices that try to accommodate both of these requirements – Apache Spark is used to index data as Parquet tables on an HPC cluster at Stanford. Crucially, the approach distinguishes between what we call “canonical” and “combined” corpora, a variation on the well-established notion of a “virtual corpus” (Kupietz et al., 2014; Jakubíček et al., 2014; van Uytvanck, 2010).

From ICE to ICC: The new International Comparable Corpus (2017)

Kirk, John ; Čermáková, Anna

This paper outlines the broad research context and rationale for a new international comparable corpus (ICC). The ICC is to be largely modelled on the text categories and their quantities of the International Corpus of English, with only a few changes. The corpus will begin with nine European languages, but others may join in due course. The paper reports on those and other agreements made at the inaugural planning meeting in Prague on 22-23 June 2017. It also sets out the project’s goals for its first two years.

Intra-connecting an exemplary literary corpus with semantic web technologies for exploratory literary studies (2017)

Dittrich, Andreas

Many (modernist) works of literature can be understood by their associativeness, be it constructed or “free”. This network-like character of (modernist) literature has often been addressed by terms like “free association”, “connotation”, “context” or “intertext”. This paper proposes an experimental and exemplary approach to intraconnect a literary corpus of the Austrian writer Ilse Aichinger with semantic web technologies to enable interactive explorations of word associations.

Young Russian-German adults 20 years after their repatriation to Germany (2017)

Meng, Katharina ; Protassova, Ekaterina

This study investigates the interrelations between bilingual development (German/Russian), immigration and integration in the host society. Participants are Russian-Germans, that is, ethnic Germans who have repatriated to Germany from the former Soviet Union. They were part of a longitudinal study dedicated to the integration of multi-generation Russian-German families in Germany. The paper focuses on eight Russian-Germans who moved to Germany between the ages of five and eight and are now young adults. The analysis is based on interviews conducted in German and Russian in the twentieth year of their life in Germany. A semi-structured questionnaire was used to elicit information on the main stages of integration, the use of the languages, the attitudes towards German and Russian, and an assessment of the current situation. The obtained data were used to make an initial assessment of the oral language competencies of the participants and as sources of information about the objective facts and subjective attitudes that determined linguistic and social integration.
