OPUS 4 | Search

Linguistic Landscape und Fremdsprachendidaktik. Perspektiven für die Sprach-, Kultur- und Literaturdidaktik (Linguistic landscape and foreign language didactics. Perspectives for language, cultural and literary didactics). Edited by Camilla Badstübner-Kizik & Věra Janíková. Berlin: Peter Lang, 2018, 359 pp. (Posener Beiträge zur Angewandten Linguistik 10). ISBN: 9783631773543. EUR 56,10/GBP 46,00/USD 67,95. [Review] (2019)

Marten, Heiko F.

Multilingualism in the Baltic States. Societal discourses and contact phenomena (2019)

This edited collection provides an overview of linguistic diversity, societal discourses and interaction between majorities and minorities in the Baltic States. It presents a wide range of methods and research paradigms including folk linguistics, discourse analysis, narrative analyses, code alternation, ethnographic observations, language learning motivation, languages in education and language acquisition. Grouped thematically, its chapters examine regional varieties and minority languages (Latgalian, Võro, urban dialects in Lithuania, Polish in Lithuania); the integration of the Russian language and its speakers; and the role of international languages like English in Baltic societies. The editors’ introductory and concluding chapters provide a comparative perspective that situates these issues within the particular history of the region and broader debates on language and nationalism at a time of both increased globalization and ethno-regionalism. This book will appeal in particular to students and scholars of multilingualism, sociolinguistics, language discourses and language policy, and provide a valuable resource for researchers focusing on Baltic States, Northern Europe and the post-Soviet world in the related fields of history, political science, sociology and anthropology.

Der Mann, wo ich gesehen habe - das relative wo (The man "wo" I saw: relative wo) (2019)

Mösch, Matthias

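Er hängte seinen Mantel an den Haken, und dort hing er den ganzen Tag - schwache und starke Flexion und Bedeutungsunterschiede (aus: Grammatik in Fragen und Antworten) (He hung his coat on the hook, and there it hung all day: weak and strong inflection and differences in meaning; from: Grammar in Questions and Answers) (2019)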

Kubczak, Jacqueline

Festakt zum 90. Geburtstag von Prof. Dr. Dr. h.c. mult. Ulrich Engel (Ceremony marking the 90th birthday of Prof. Dr. Dr. h.c. mult. Ulrich Engel) (2019)

Taborek, Janusz

Laudatio auf Christian Fandrych (Laudation for Christian Fandrych) (2019)

Nübling, Damaris

Thilo Weber. 2017. Die TUN-Periphrase im Niederdeutschen. Funktionale und formale Aspekte (The tun periphrasis in Low German. Functional and formal aspects) (Studien zur deutschen Grammatik 94). Tübingen: Stauffenburg. 418 pp. [Review] (2019)

Berg, Kristian

Complex Lexical Units. Compounds and Multi-Word Expressions (2019)

Both compounds and multi-word expressions are complex lexical units, made up of at least two constituents. The most basic difference is that the former are morphological objects and the latter result from syntactic processes. However, the exact demarcation between compounds and multi-word expressions differs greatly from language to language and is often a matter of debate in and across languages. Similarly debated is whether and how these two different kinds of units complement or compete with each other. The volume presents an overview of compounds and multi-word expressions in a variety of European languages. Central questions that are discussed for each language concern the formal distinction between compounds and multi-word expressions, their formation and their status in lexicon and grammar. The volume contains chapters on German, English, Dutch, French, Italian, Spanish, Greek, Russian, Polish, Finnish, and Hungarian as well as a contrastive overview with a focus on German. It brings together insights from word-formation theory, phraseology and theory of grammar and aims to contribute to the understanding of the lexicon, both from a language-specific and cross-linguistic perspective.

Asynchronous pipelines for processing huge corpora on medium to low resource infrastructures (2019)

Ortiz Suárez, Pedro Javier ; Sagot, Benoît ; Romary, Laurent

Common Crawl is a considerably large, heterogeneous multilingual corpus composed of crawled documents from the internet, surpassing 20TB of data and distributed as a set of more than 50 thousand plain text files, each of which contains many documents written in a wide variety of languages. Even though each document has a metadata block associated with it, this data lacks any information about the language in which each document is written, making it extremely difficult to use Common Crawl for monolingual applications. We propose a general, highly parallel, multithreaded pipeline to clean and classify Common Crawl by language; we specifically design it so that it runs efficiently on medium to low resource infrastructures where I/O speeds are the main constraint. We develop the pipeline so that it can be easily reapplied to any kind of heterogeneous corpus and so that it can be parameterised to a wide range of infrastructures. We also distribute a 6.3TB version of Common Crawl, filtered, classified by language, shuffled at line level in order to avoid copyright issues, and ready to be used for NLP applications.

Modelling large parallel corpora. The Zurich Parallel Corpus Collection (2019)

Graën, Johannes ; Kew, Tannon ; Shaitarova, Anastassia ; Volk, Martin

Text corpora come in many different shapes and sizes and carry heterogeneous annotations, depending on their purpose and design. The true benefit of corpora is rooted in their annotation and the method by which this data is encoded is an important factor in their interoperability. We have accumulated a large collection of multilingual and parallel corpora and encoded it in a unified format which is compatible with a broad range of NLP tools and corpus linguistic applications. In this paper, we present our corpus collection and describe a data model and the extensions to the popular CoNLL-U format that enable us to encode it.

The Vast and the Focused: On the need for domain-focused web corpora (2019)

Barbaresi, Adrien

As the Web ought to be considered as a series of sources rather than as a source in itself, a problem facing corpus construction resides in meta-information and categorization. In addition, we need focused data to shed light on particular subfields of the digital public sphere. Blogs are relevant to that end, especially if the resulting web texts can be extracted along with metadata and made available in coherent and clearly describable collections.

Deduplication in large web corpora (2019)

Benko, Vladimír

Our paper tries to answer some questions related to the deduplication process in large-scale web-crawled corpora. An experiment based on eight corpora from the Aranea family is introduced, and first results are presented.

Knape, Joachim/Kramer, Olaf/Till, Dietmar (eds.) (2019): Populisten – rhetorische Profile (Populists: rhetorical profiles). Tübingen: Narr Francke Attempto (DIALOGE). 106 pp. 14,99 € ISBN 978-3-89308-454-8. [Review] (2019)

Weidacher, Georg

The best of both worlds: Multi-billion word “dynamic” corpora (2019)

Davies, Mark

Nearly all of the very large corpora of English are “static”, which allows a wide range of one-time, pre-processed data, such as collocates. The challenge comes with large “dynamic” corpora, which are updated regularly, and where preprocessing is much more difficult. This paper provides an overview of the NOW corpus (News on the Web), which is currently 8.2 billion words in size, and which grows by about 170 million words each month. We discuss the architecture of NOW, and provide many examples that show how data from NOW can (uniquely) be extracted to look at a wide range of ongoing changes in English.

Die Graphematik der Morpheme im Deutschen und Englischen (The graphematics of morphemes in German and English) (2019)

Berg, Kristian

How are words written in German and in English? Where are the similarities, and where are the differences? These questions are addressed from a morphological-graphematic perspective. The focus is therefore not on relations between writing and sound form (traditionally often the focus of graphematics), but on correspondences between writing and morphology. This concerns, first, the structure of morphemes. What constraints can be formulated for the sequence of letters? What are minimal and what are prototypical stems and affixes? Second, it concerns questions of uniformity (How uniformly is a morpheme represented in writing?) and of uniqueness (How distinctly does a spelling point to a morpheme?). Overall, it emerges that English tends to encode affixes reliably (often uniquely and uniformly), whereas German frequently encodes stems uniformly. These are two fundamentally different strategies for facilitating reading.

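Sprechen im Umbruch. Zeitzeugen erzählen und argumentieren rund um den Fall der Mauer im Wendekorpus (Speaking in times of upheaval. Contemporary witnesses narrate and argue about the fall of the Wall in the Wendekorpus) (2019)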

Mobile Medienpraktiken im Spannungsfeld von Öffentlichkeit, Privatheit und Anonymität (Mobile media practices between publicness, privacy and anonymity) (2019)

König, Katharina ; Oloff, Florence

This contribution aims to describe privacy, publicness and anonymity as essential analytic dimensions for media linguistic research. The dimensions are not inherent in and predetermined by the technical features and forms of communication provided by mobile devices, but are used by the participants as an orientation grid for shaping their online and offline practices in and with mobile media. Considering both mobile device use in the public realm and the dissemination of increasingly private content in social media (which is said to lead to ‘blurred boundaries’ between the private and the public), the paper provides a brief overview of the main developments in mobile media research: Studies adopting various approaches – e.g. sociological-ethnographic, linguistic and media studies – illustrate how publicness, privacy and anonymity are actively shaped and brought about by mobile media users in face-to-face and remote social encounters. As this shows that publicness, privacy and anonymity are still relevant concepts for users, future media linguistics studies should focus on the dynamic multimodal practices by which they are contextualized and accomplished.

How much “tourism” is there in dictionary apps? An empirical study of lexicographical resources on mobile devices (German, Italian, Spanish) (2019)

Flinz, Carolina ; Egido Vicente, Maria

Konjunktiv I im gesprochenen Deutsch (Konjunktiv I in spoken German) (2019)

Antonioli, Giorgio

This paper investigates the use of the present subjunctive (Konjunktiv I) in everyday spoken German. The form is traditionally labelled a feature of standard written language and is therefore seen as typical of communication genres based on it, such as press texts and reporting. Through an analysis of corpus data, carried out according to the theory and methods of Interactional Linguistics and covering private, institutional and public interactional domains, the paper shows how this verb form expresses different epistemic stances depending on its syntactic embedding.

Towards a gold standard corpus for detecting valencies of Zulu verbs (2019)

Faaß, Gertrud ; Bosch, Sonja

We report on a new project building a Natural Language Processing resource for Zulu by making use of resources already available. Combining tagging results with the results of morphological analysis semi-automatically, we expect to reduce the amount of manual work when generating a finely-grained gold standard corpus usable for training a tagger. From the tagged corpus, we plan to extract verb-argument pairs with the aim of compiling a verb valency lexicon for Zulu.

137 search hits