The availability of electronic corpora of historical stages of languages has been welcomed as possibly attenuating the inherent problem of diachronic linguistics, i.e. that we only have access to what has chanced to come down to us - the problem which was memorably named by Labov (1992) as one of “Bad Data”. However, such corpora can only give us access to an increased amount of historical material and this can essentially still only be a partial and possibly distorted picture of the actual language at a particular period of history. Corpora can be improved by taking a more representative sample of extant texts if these are available (as they are in significant number for periods after the invention of printing). But, as examples from the recently compiled GerManC corpus of seventeenth and eighteenth century German show, the evidence from such corpora can still fail to yield definitive answers to our questions about earlier stages of a language. The data still require expert interpretation, and it is important to be realistic about what can legitimately be expected from an electronic historical corpus.
The IMS Open Corpus Workbench (CWB) software currently uses a simple tabular data model with proven limitations. We outline and justify the need for a new data model to underlie the next major version of CWB. This data model, dubbed Ziggurat, defines a series of types of data layer to represent different structures and relations within an annotated corpus; each such layer may contain variables of different types. Ziggurat will allow us to gradually extend and enhance CWB’s existing CQP-syntax for corpus queries, and also make possible more radical departures relative not only to the current version of CWB but also to other contemporary corpus-analysis software.
Optimality theory (henceforth OT) models natural language competence in terms of interactions of universal constraints, notably markedness and faithfulness constraints. This article illustrates some of the major advances in the understanding of word-formation phenomena originating from this theory, including the prosodic organization of morphologically complex words, neutralization patterns in derivational affixes, allomorphy, and infixation.
Recipient design is a key constituent of intersubjectivity in interaction. Recipient design of turns is informed by prior knowledge about and shared experience with recipients. Designing turns in order to be maximally effective for the particular recipient(s) is crucial for accomplishing intersubjectively coordinated action. This paper reports on a specific pragmatic structure of recipient design, i.e. counterfactual recipient design, and how it impinges on intersubjectivity in interaction. Based on an analysis of video-recorded data from driving school lessons in German, two kinds of counterfactual recipient design of instructors' requests are distinguished: pedagogic and egocentric turn-design. Counterfactual, pedagogic turn-design is used strategically to diagnose student skills and to create opportunities for corrective instructions. Egocentric turn-design rests on private, non-shared knowledge of the instructor. Egocentrically designed turns imply expectations of how to comply with requests which cannot be recovered by the student and which lead to a breakdown of intersubjective cooperation. This paper identifies practices, sources and interactional consequences of these two kinds of counterfactual recipient design. In addition, the study enhances our understanding of recipient design in at least three ways. It shows that recipient design does not only concern referential and descriptive practices, but also the indexing of intelligible projections of next actions; it highlights the productive, other-positioning effects of recipient design; it argues that recipient design should be analyzed in terms of temporally extended interactional trajectories, linking turn-constructional practices to interactional histories and consecutive trajectories of joint action.
In this paper, general problems with easily confused words within a language community are addressed. Serving as an example, the difficulties of semantic differentiation between the use of German sensibel and sensitiv are discussed. On the one hand, the question is raised as to how a speech community faces challenges of semantic shifts and how monolingual dictionaries document lexical items with similar semantic aspects. On the other hand, I will demonstrate the discrepancies of information on meaning as retrieved and interpreted from large corpus data. It will be shown how the semantics of words change and hence cause confusion among speakers. As a result, empirical evidence opens up several questions concerning the prescriptive vs. descriptive treatment of paronymic items such as sensibel/sensitiv and it demands different approaches to the lexicographic description of such words in future reference works.
Using the Google Ngram Corpora for six different languages (including two varieties of English), a large-scale time series analysis is conducted. It is demonstrated that diachronic changes of the parameters of the Zipf–Mandelbrot law (and the parameter of the Zipf law, all estimated by maximum likelihood) can be used to quantify and visualize important aspects of linguistic change (as represented in the Google Ngram Corpora). The analysis also reveals that there are important cross-linguistic differences. It is argued that the Zipf–Mandelbrot parameters can be used as a first indicator of diachronic linguistic change, but more thorough analyses should make use of the full spectrum of different lexical, syntactical and stylometric measures to fully understand the factors that actually drive those changes.
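As a rough illustration of what estimating Zipf–Mandelbrot parameters by maximum likelihood can look like, the following minimal sketch fits the two parameters to a rank-ordered list of word frequencies. It assumes a common parametrization, p(r) proportional to (r + b)^(-a) over ranks r = 1..N, which may differ from the one used in the study. The function names, the search bounds, and the nested golden-section search are all illustrative choices, not the authors' procedure, and the search assumes the likelihood is unimodal within the bounds. Fixing b = 0 gives the plain Zipf law with its single parameter a.

```python
import math


def zm_log_likelihood(freqs, a, b):
    """Log-likelihood of rank-ordered frequencies under Zipf-Mandelbrot.

    Assumed model: p(r) = (r + b)^(-a) / sum_k (k + b)^(-a), r = 1..N.
    """
    n = len(freqs)
    log_norm = math.log(sum((k + b) ** (-a) for k in range(1, n + 1)))
    return sum(f * (-a * math.log(r + b) - log_norm)
               for r, f in enumerate(freqs, start=1))


def golden_max(func, lo, hi, iters=60):
    """Golden-section search for the maximizer of a unimodal function."""
    inv_phi = (math.sqrt(5) - 1) / 2
    x1 = hi - inv_phi * (hi - lo)
    x2 = lo + inv_phi * (hi - lo)
    f1, f2 = func(x1), func(x2)
    for _ in range(iters):
        if f1 < f2:  # maximum lies in [x1, hi]
            lo, x1, f1 = x1, x2, f2
            x2 = lo + inv_phi * (hi - lo)
            f2 = func(x2)
        else:        # maximum lies in [lo, x2]
            hi, x2, f2 = x2, x1, f1
            x1 = hi - inv_phi * (hi - lo)
            f1 = func(x1)
    return (lo + hi) / 2


def fit_zipf_mandelbrot(freqs, a_bounds=(0.1, 3.0), b_bounds=(0.0, 20.0)):
    """Return (a, b) maximizing the likelihood within the given bounds.

    The bounds are illustrative assumptions, not values from the study.
    """
    freqs = sorted(freqs, reverse=True)  # rank 1 = most frequent type

    def best_b(a):
        # Inner search: best shift parameter b for a fixed exponent a.
        return golden_max(lambda b: zm_log_likelihood(freqs, a, b), *b_bounds)

    def profile(a):
        # Profile likelihood: likelihood at the best b for this a.
        return zm_log_likelihood(freqs, a, best_b(a))

    a_hat = golden_max(profile, *a_bounds)
    return a_hat, best_b(a_hat)


if __name__ == "__main__":
    # Hypothetical toy frequencies, not data from the Google Ngram Corpora.
    toy = [5000, 2600, 1800, 1300, 1000, 820, 700, 600, 530, 470]
    print(fit_zipf_mandelbrot(toy))
```

Applying such a fit separately to the frequency list of each year would yield one (a, b) pair per year, and it is the resulting time series of parameters that a diachronic analysis of this kind would track.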
Speakers’ linguistic experience is for the most part experience with language as used in conversational interaction. Though highly relevant for usage-based linguistics, the study of such data is as yet often left to other frameworks such as conversation analysis and interactional linguistics (Couper-Kuhlen and Selting 2001). On the basis of a case study of salient usage patterns of the two German motion verbs kommen and gehen in spontaneous conversation, the present paper argues for a methodological integration of quantitative corpus-linguistic methods with qualitative conversation analytic approaches to further the usage-based study of conversational interaction.
We analyze the linguistic evolution of selected scientific disciplines over a 30-year time span (1970s to 2000s). Our focus is on four highly specialized disciplines at the boundaries of computer science that emerged during that time: computational linguistics, bioinformatics, digital construction, and microelectronics. Our analysis is driven by the question whether these disciplines develop a distinctive language use—both individually and collectively—over the given time period. The data set is the English Scientific Text Corpus (scitex), which includes texts from the 1970s/1980s and early 2000s. Our theoretical basis is register theory. In terms of methods, we combine corpus-based methods of feature extraction (various aggregated features [part-of-speech based], n-grams, lexico-grammatical patterns) and automatic text classification. The results of our research are directly relevant to the study of linguistic variation and languages for specific purposes (LSP) and have implications for various natural language processing (NLP) tasks, for example, authorship attribution, text mining, or training NLP tools.
We investigated the effect of high-variability training (HVT) on the production and perception of French bilabial voiced and voiceless stops by German native speakers. Stop consonants in the two languages differ with respect to several articulatory and acoustic features. German learners of French (Experiment Group) trained their perception of word-initial bilabial stops spoken by six French native speakers using identification tests, whereas subjects of a Control Group received no training. Additional perception and production tests of French words including bilabial, alveolar, and velar stops in all word positions were performed to capture the impact of HVT. Subjects were found to be quite good at distinguishing voiced and voiceless stops. However, voiceless stops received lower correctness scores than voiced ones, and subjects of the Experiment Group were able to further increase their scores after training. Results for production are the mirror image: subjects of the Experiment Group successfully produced longer negative VOT values but did not show an improvement for voiceless stops.