Ungoliant: An optimized pipeline for the generation of a very large-scale multilingual web corpus
(2021)
Since the introduction of large language models in Natural Language Processing, large raw corpora have played a crucial role in Computational Linguistics. However, most of these large raw corpora are either available only for English or not available to the general public due to copyright issues. Nevertheless, there are some examples of freely available multilingual corpora for training Deep Learning NLP models, such as the OSCAR and Paracrawl corpora. However, they have quality issues, especially for low-resource languages. Moreover, recreating or updating these corpora is very complex. In this work, we try to reproduce and improve the goclassy pipeline used to create the OSCAR corpus. We propose a new pipeline that is faster, modular, parameterizable, and well documented. We use it to create a corpus similar to OSCAR but larger and based on recent data. Also, unlike OSCAR, the metadata information is at the document level. We release our pipeline under an open source license and publish the corpus under a research-only license.
In several European countries, sociolinguists have recently discussed whether a new standard ("neo-standard") has emerged between the traditional standard language and the regional or substandard varieties; a standard that differs from the old one not only structurally but also in its prestige: by comparison, it comes across as more informal, more subjective, more modern, more creative, etc. This article discusses some essential properties of neo-standards and describes their development as a consequence of the "demotization" (Mattheier) of the standard language. In addition to the potential neo-standard in Germany, developments in Denmark, Belgium, and Italy are also discussed.
Playing videogames is a popular social activity; people play videogames in different places, on different media, in different situations, alone or with partners, online or offline. Unsurprisingly, they thereby share space (physically or virtually) with other playing or non-playing people. The special issue investigates through different contexts and settings how non-players become participants of the gaming interaction and how players and non-players co-construct presence. The introduction provides a problem-related context for the individual contributions and then briefly presents them.
This paper investigates situations in French videogame interactions where non-players who share the same physical space as players participate in the gaming activities as spectators. Through a detailed multimodal and sequential analysis, we show that being a spectator is a local achievement of all co-present participants, players and non-players alike.
In conversation, speakers need to plan and comprehend language in parallel in order to meet the tight timing constraints of turn taking. Given that language comprehension and speech production planning both require cognitive resources and engage overlapping neural circuits, these two tasks may interfere with one another in dialogue situations. Interference effects have been reported on a number of linguistic processing levels, including lexicosemantics. This paper reports a study on semantic processing efficiency during language comprehension in overlap with speech planning, where participants responded verbally to questions containing semantic illusions. Participants rejected a smaller proportion of the illusions when planning their response in overlap with the illusory word than when planning their response after the end of the question. The obtained results indicate that speech planning interferes with language comprehension in dialogue situations, leading to reduced semantic processing of the incoming turn. Potential explanatory processing accounts are discussed.
Coronaparty, Jo-jo-Lockdown und Mask-have – Wortschatzerweiterung während des Corona-Stillstands [Coronaparty, Jo-jo-Lockdown, and Mask-have: vocabulary expansion during the corona standstill]
(2021)
Research projects index, record, and publish large amounts of digital data. Before publication, preliminary work or by-products of the intended result often arise (for example, transcriptions of individual texts or textual witnesses that form the basis for, e.g., an edition). CLARIAH-DE offers various ways of integrating services and content from the community, ensuring their longer-term visibility and reusability. This guide addresses which text resources can be archived where and how, and which criteria different kinds of data must meet to be suitable in principle for inclusion in the CLARIAH-DE, research data management, or NFDI context.