Year of publication
- 2016 (8)
Document Type
- Article (8)
Has Fulltext
- yes (8)
Is part of the Bibliography
- no (8)
Keywords
- Korpus <Linguistik> (Corpus <Linguistics>) (8)
Review state
- Peer-Review (6)
- Peer-Reviewed (1)
Editorial
(2016)
This paper describes the workflow for the construction and dissemination of FOLK (Forschungs- und Lehrkorpus Gesprochenes Deutsch, Research and Teaching Corpus of Spoken German), a large corpus of authentic spoken interaction data recorded on audio and video. Section 2 describes in detail the tools used in the individual steps of transcription, anonymization, orthographic normalization, lemmatization and POS tagging of the data, as well as some utilities used for corpus management. Section 3 deals with the DGD (Datenbank für Gesprochenes Deutsch, Database of Spoken German) as a tool for distributing completed data sets and making them available for qualitative and quantitative analysis. Section 4 sketches some plans for further development.
After a brief general introduction, this paper presents the Database of Spoken German (DGD) and the Research and Teaching Corpus of Spoken German (FOLK) as instruments specifically for conversation-analytic work. Using the example of "sprich" as a discourse marker for reformulations, the resources and tools for systematic corpus- and database-driven searches are illustrated step by step: the uses of token, context, metadata and position search are shown, in each case in relation to and in interplay with qualitative case analyses, including the annotation of instances according to analytically relevant (structural and functional) categories. Finally, "das heißt" is drawn on as a further reformulation indicator for a comparative analysis. This paper is a more detailed elaboration of a shorter, more technical and didactic online guide on this topic (Kaiser/Schmidt 2016) and has a stronger content-analytic focus.
When foreign words become integrated into the German vocabulary, they show paradigmatic changes in orthography, grammar and semantics. In this context, German orthography is also strongly determined by orthographic codification, which continues to influence the development of spelling to the present day. This study compares digital, linguistically annotated corpora containing texts by professional and non-professional writers; these corpora contain several billion foreign words (of Greek, Latin and French origin and, in the second part of the study, of English/American and Italian origin), studied over the 20 years following the German orthographic reform of 1996. The results could help adapt the official regulations to the spelling practices observed, either by describing the rules more precisely or by proposing possible spelling variants or eliminating those that are not in common use. The study may also support correct lexicographic codification in dictionaries.
The paper presents practices in the compilation of FOLK, the Research and Teaching Corpus of Spoken German, a large collection of spontaneous verbal interaction from diverse discourse domains. After introducing the aims and organisational circumstances of the construction of FOLK, the paper discusses its central idea: good practices cannot be developed without considering methodological, technological and organisational aspects on an equal footing. Starting from this idea, it examines some actual practices in FOLK more closely, namely the handling of legal (especially privacy protection) issues, the decisions taken for the transcription and annotation workflow, and the question of how best to disseminate a corpus like FOLK. The final section sketches some possible future improvements to practices in FOLK.
Names in competition: A corpus-based quantitative investigation into the use of colonial place names
(2016)
Referentially equivalent toponyms occur very often in colonial and postcolonial contexts. These names are in competition, and this competition is reflected in language use and in changing frequencies of use in large corpora. The main theoretical and methodological assumption of this paper is that corpus frequencies of referentially equivalent toponyms change according to particular patterns, and that the Google Ngram corpora and the Google Ngram Viewer can be used to detect these patterns. The aims of this paper are twofold: first, a corpus-linguistic method for investigating the use of names is presented, applied and critically evaluated; second, it is shown that the correlation between patterns of frequency change and patterns of socio-historical colonial and postcolonial events gives rise to cross-linguistic generalizations. For example, an increase in public interest in a place strongly promotes one of the referentially equivalent names, or, in renaming scenarios, colonial toponyms remain in stronger use relative to the new toponyms in the language of the former colonial power than in the languages of other colonial powers.