OPUS 4 | Search

101 search hits

1 to 10

Ungoliant: An optimized pipeline for the generation of a very large-scale multilingual web corpus (2021)

Abadji, Julien ; Ortiz Suárez, Pedro Javier ; Romary, Laurent ; Sagot, Benoît

Since the introduction of large language models in Natural Language Processing, large raw corpora have played a crucial role in Computational Linguistics. However, most of these large raw corpora are either available only for English or not available to the general public due to copyright issues. Nevertheless, there are some examples of freely available multilingual corpora for training Deep Learning NLP models, such as the OSCAR and Paracrawl corpora. However, they have quality issues, especially for low-resource languages. Moreover, recreating or updating these corpora is very complex. In this work, we try to reproduce and improve the goclassy pipeline used to create the OSCAR corpus. We propose a new pipeline that is faster, modular, parameterizable, and well documented. We use it to create a corpus similar to OSCAR but larger and based on recent data. Also, unlike OSCAR, the metadata information is at the document level. We release our pipeline under an open source license and publish the corpus under a research-only license.

Gibt es einen deutschen Neo-Standard und – wenn ja – wie verhält er sich zu den Entwicklungen der Standards anderer europäischer Sprachen? (2021)

Auer, Peter

In several European countries, sociolinguists have recently debated whether a new standard ("neo-standard") has emerged between the traditional standard language and the regional or substandard varieties; a standard that differs from the old one not only structurally but also in carrying a different kind of prestige: by comparison, it comes across as more informal, more subjective, more modern, more creative, etc. This contribution discusses some essential properties of neo-standards and describes their development as a consequence of the "demotization" (Mattheier) of the standard language. Alongside the potential neo-standard in Germany, developments in Denmark, Belgium, and Italy are also discussed.

Co-constructing presence between players and non-players in videogame interactions: Introduction to the Special Issue (2021)

Baldauf-Quilliatre, Heike ; Colón de Carvajal, Isabel

Playing videogames is a popular social activity; people play videogames in different places, on different media, in different situations, alone or with partners, online or offline. Unsurprisingly, they thereby share space (physically or virtually) with other playing or non-playing people. The special issue investigates through different contexts and settings how non-players become participants of the gaming interaction and how players and non-players co-construct presence. The introduction provides a problem-related context for the individual contributions and then briefly presents them.

Spectating: How non-players participate in videogaming (2021)

Baldauf-Quilliatre, Heike ; Colón de Carvajal, Isabel

This paper investigates situations in French videogame interactions where non-players who share the same physical space as players participate in the gaming activities as spectators. Through a detailed multimodal and sequential analysis, we show that being a spectator is a local achievement of all co-present participants - players and non-players.

Speech planning interferes with language comprehension: Evidence from semantic illusions in question-response sequences (2021)

Barthel, Mathias

In conversation, speakers need to plan and comprehend language in parallel in order to meet the tight timing constraints of turn taking. Given that language comprehension and speech production planning both require cognitive resources and engage overlapping neural circuits, these two tasks may interfere with one another in dialogue situations. Interference effects have been reported on a number of linguistic processing levels, including lexicosemantics. This paper reports a study on semantic processing efficiency during language comprehension in overlap with speech planning, where participants responded verbally to questions containing semantic illusions. Participants rejected a smaller proportion of the illusions when planning their response in overlap with the illusory word than when planning their response after the end of the question. The obtained results indicate that speech planning interferes with language comprehension in dialogue situations, leading to reduced semantic processing of the incoming turn. Potential explanatory processing accounts are discussed.

Bericht zur 22. Arbeitstagung zur Gesprächsforschung am Leibniz-Institut für Deutsche Sprache in Mannheim, 24.–26.03.2021 (2021)

Bauer, Nathalie ; Buck, Isabella

Coronaparty, Jo-jo-Lockdown und Mask-have – Wortschatzerweiterung während des Corona-Stillstands (2021)

Benter, Merle ; Dabóczi, Viktória

Integration von Textdaten aus der Community in bestehende Infrastrukturen (2021)

Boenig, Matthias ; Hug, Marius ; Sendler, Simon

Research projects index, record, and publish large amounts of digital data. Before publication, preliminary work or by-products of the intended result often arise (for example, transcriptions of individual texts or textual witnesses that form the basis of, e.g., an edition). CLARIAH-DE offers various options for integrating resources and content from the community, ensuring their longer-term visibility and reusability. This guide addresses which text resources can be archived where and in what way, and which criteria different kinds of data must meet in order to be suitable in principle for inclusion in the CLARIAH-DE, research data management, or NFDI context.

"Sie können Ihr Testament machen, was sonst?" Ärztliche Gesprächsführung zwischen Diagnose und Betroffenheit (2021)

Buß, Marlen

Zeit in der Sprache: historische Profilbildung und Archaisierung (2021)

Cherubim, Dieter
