To build a comparable Wikipedia corpus of German, French, Italian, Norwegian, Polish and Hungarian for contrastive grammar research, we used a set of XSLT stylesheets to transform the MediaWiki annotations to XML. Furthermore, the data has been annotated with word class information using different taggers. The outcome is a corpus with rich metadata and linguistic annotation that can be used for multilingual research on various linguistic topics.
This report presents a corpus of articulations recorded with Schlieren photography, a recording technique that visualizes aeroflow dynamics. The corpus serves two purposes: first, to investigate aerodynamic processes during speech production without any obstruction of the lips and the nose; second, to provide material for lecturers of phonetics to illustrate these aerodynamic processes. Speech production was recorded at a frame rate of 10 kHz for statistical video analyses. Downsampled videos (500 Hz) were uploaded to a YouTube channel for illustrative purposes. Preliminary analyses demonstrate the potential of applying Schlieren photography in research.
In this paper, an exploratory data-driven method is presented that extracts from diachronic corpora the word types that have undergone the most pronounced change in frequency of occurrence in a given period of time. Combined with statistical methods from time series analysis, the method is able to find meaningful patterns and relationships in diachronic corpora, an idea that is still uncommon in linguistics. This indicates that the approach can facilitate an improved understanding of diachronic processes.
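The core idea of ranking word types by frequency change can be illustrated with a minimal sketch. It is not the method of the paper: the change measure (difference in relative frequency between two periods), the function names and the toy data are all assumptions made for illustration, and the time series analysis step is left out.

```python
from collections import Counter


def relative_frequencies(tokens):
    """Map each word type to its share of all tokens in one period."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def top_frequency_changes(tokens_by_period, start, end, n=3):
    """Rank word types by absolute change in relative frequency.

    Assumption: change is measured as the plain difference between the
    relative frequencies in the `end` and `start` periods.
    """
    freq_start = relative_frequencies(tokens_by_period[start])
    freq_end = relative_frequencies(tokens_by_period[end])
    vocabulary = set(freq_start) | set(freq_end)
    change = {
        word: freq_end.get(word, 0.0) - freq_start.get(word, 0.0)
        for word in vocabulary
    }
    # Largest absolute change first; ties are broken alphabetically.
    return sorted(change.items(), key=lambda item: (-abs(item[1]), item[0]))[:n]


# Hypothetical toy corpus with two periods.
toy_corpus = {
    1900: "the telegraph the horse the telegraph".split(),
    2000: "the internet the internet the car".split(),
}

for word, delta in top_frequency_changes(toy_corpus, 1900, 2000, n=2):
    print(f"{word}: {delta:+.3f}")
```

In this toy corpus, "internet" and "telegraph" show the largest changes, while "the" does not change at all.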
One of the fundamental questions about human language is whether all languages are equally complex. Here, we approach this question from an information-theoretic perspective. We present a large-scale quantitative cross-linguistic analysis of written language by training a language model on more than 6500 different documents as represented in 41 multilingual text collections consisting of ~3.5 billion words or ~9.0 billion characters and covering 2069 different languages that are spoken as a native language by more than 90% of the world population. We statistically infer the entropy of each language model as an index of what we call average prediction complexity. We compare complexity rankings across corpora and show that a language that tends to be more complex than another language in one corpus also tends to be more complex in another corpus. In addition, we show that speaker population size predicts entropy. We argue that both results constitute evidence against the equi-complexity hypothesis from an information-theoretic perspective.
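To make the notion of entropy as a measure of predictability concrete, here is a minimal sketch. It swaps in a character unigram model with a plug-in estimate, which is far simpler than the language model used in the study; the function name and the example strings are assumptions made for illustration.

```python
import math
from collections import Counter


def unigram_entropy(text):
    """Plug-in Shannon entropy in bits per character.

    Assumption: characters are treated as independent draws (a unigram
    model), so any context a real language model would use is ignored.
    """
    counts = Counter(text)
    total = sum(counts.values())
    return -sum(
        (count / total) * math.log2(count / total) for count in counts.values()
    )


# A fully predictable string carries no information per character,
# while two equally likely characters carry one bit each.
print(unigram_entropy("aaaa"))
print(unigram_entropy("abab"))
```

Higher values mean that the next character is harder to predict on average, which is the intuition behind using entropy as an index of prediction complexity.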
Classical null hypothesis significance tests are not appropriate in corpus linguistics, because the randomness assumption underlying these testing procedures is not fulfilled. Nevertheless, there are numerous scenarios where it would be beneficial to have some kind of test in order to judge the relevance of a result (e.g. a difference between two corpora) by answering the question whether the attribute of interest is pronounced enough to warrant the conclusion that it is substantial and not due to chance. In this paper, I outline such a test.
In this paper, we present an overview of freely available web applications providing online access to spoken language corpora. We explore and discuss various solutions with which the corpus providers and corpus platform developers address the needs of researchers who are working with spoken language. The paper aims to contribute to the long-overdue exchange and discussion of methods and best practices in the design of online access to spoken language corpora.
This article presents selected semantic and syntactic properties of AcI constructions with perception verbs in German, Italian and Hungarian on the basis of a corpus analysis. It focuses primarily on properties that have received little attention in previous research. The main aim is to uncover syntactic properties of the construction that differ from those of sentences with a less marked syntactic structure. In addition, the degree of grammaticalization of the construction in each of the languages compared is discussed.
This paper investigates the use of linking adverbs in adversative constructions in German and Italian. In Italian, these constructions are very frequently formulated with adverbs such as invece, while wordings without a lexical connective are more typical of German. Corpus data show that the syntactic and semantic conditions favouring the use of adversative adverbs are by and large the same in both languages. Lexical connectives can increase explicitness when the intended adversative interpretation is not obvious on other grounds. The higher frequency of adversative adverbs in Italian is shown to be a consequence of the more restrictive rules for the placement of prosodic accent.