The Perceptual Effect of L1 Prosody Transplantation on L2 Speech: The Case of French Accented German
(2016)
Research has shown that language learners are not only challenged by segmental differences between their native language (L1) and the second language (L2). They also have problems with the correct production of suprasegmental structures, like phone/syllable duration and the realization of pitch. These difficulties often lead to a perceptible foreign accent. This study investigates the influence of prosody transplantation on foreign accent ratings. Syllable duration and pitch contour were transferred from utterances of a male and female German native speaker to utterances of ten French native speakers speaking German. Acoustic measurements show that French learners spoke with a significantly lower speaking rate. As expected, results of a perception experiment judging the accentedness of 1) German native utterances, 2) unmanipulated and 3) manipulated utterances of French learners of German suggest that the transplantation of the prosodic features syllable duration and pitch leads to a decrease in accentedness rating. These findings confirm results found in similar studies investigating prosody transplantation with different L1 and L2 and provide a beneficial technique for (computer-assisted) pronunciation training.
The IFCASL corpus is a French-German bilingual phonetic learner corpus designed, recorded and annotated in a project on individualized feedback in computer-assisted spoken language learning. The motivation for setting up this corpus was that there is no phonetically annotated and segmented corpus for this language pair of comparable size and coverage. In contrast to most learner corpora, the IFCASL corpus incorporates data for a language pair in both directions, i.e. in our case French learners of German, and German learners of French. In addition, the corpus is complemented by two sub-corpora of native speech by the same speakers. The corpus provides spoken data by about 100 speakers with comparable productions, annotated and segmented on the word and the phone level, with more than 50% manually corrected data. The paper reports on inter-annotator agreement and the optimization of the acoustic models for forced speech-text alignment in exercises for computer-assisted pronunciation training. Example studies based on the corpus data with a phonetic focus include topics such as the realization of /h/ and glottal stop, final devoicing of obstruents, vowel quantity and quality, pitch range, and tempo.
The aim of this study is to select and formulate criteria for the assessment of tools and exercises that are using computer-assisted pronunciation training (CAPT). We examined ten different CAPT tools selected on the basis of an informal questionnaire among 10 colleagues working in a German-French CAPT project. Although the applied assessment must still be regarded as informal, and although the selected CAPT tools might not be an optimal sample for representing the state of the art, the results clearly show that there is a lot to improve regarding the clarity of instruction, the quality of exercises, the robustness of the diagnosis, the clarity and appropriateness of scoring, the diversity of feedback methods, the assumed benefit for various types of users as well as the usage of ASR. Despite various good approaches regarding graphics and game-like exercises there are obviously missing links between the pedagogical expertise in phonetic training on the one hand, and software development including usability engineering on the other.
Evaluation of Phonatory Behavior of German and French Speakers in Native and Non-native Speech
(2016)
Phonatory behavior of German speakers (GS) and French speakers (FS) in native (L1) and non-native (L2) speech was instrumentally examined. Vowel productions of the two groups were analyzed using a parametrization of phonatory behavior and phonatory quality properties in the acoustic signal. The behavior of GS is characterized by more strained adduction of the vocal folds whereas FS show more incomplete glottal closure. Furthermore, GS change their phonatory behavior in the foreign language (=French) by adopting phonatory strategies of FS, whereas FS do not show this tendency. In addition, German beginners (BEG) and partly German advanced learners (ADV) are already oriented towards the production characteristics of the L2. French BEG however retain their phonatory behavior in L2 (=German) by showing less vocal fold adduction in comparison to their L1. French ADV show the opposite behavior. Finally, ADV of the two speaker groups generally show more strained behavior in L2 productions than BEG. The results provide evidence that GS and FS apply different laryngeal phonatory settings and that they alter their settings in L2 differently. Perceptual evaluation of voice quality of the speech material and a correlation analysis between acoustic and perceptual results are suggested for future research.
The paper presents best practices and results from projects in four countries dedicated to the creation of corpora of computer-mediated communication and social media interactions (CMC). Even though there are still many open issues related to building and annotating corpora of that type, there already exists a range of accessible solutions which have been tested in projects and which may serve as a starting point for a more precise discussion of how future standards for CMC corpora may (and should) be shaped.
This study examines syntagmatic usage patterns in monolingual German dictionaries. Ten monolingual German dictionaries (including general defining dictionaries, learner's dictionaries, and specialized dictionaries focused on syntagms) and the syntagmatic usage patterns they contain were investigated. The question pursued was how monolingual German dictionaries take the syntagmatic context of a word into account and implement it lexicographically. The typographic features of each work examined, whether published in print or online, were worked out. This was examined systematically on the basis of syntagms from 30 dictionary entries assigned to the word classes noun, verb, and adjective.
Sentence and construction types generally have more than one pragmatic function. Impersonal deontic declaratives such as ‘it is necessary to X’ assert the existence of an obligation or necessity without tying it to any particular individual. This family of statements can accomplish a range of functions, including getting another person to act, explaining or justifying the speaker’s own behavior as he or she undertakes to do something, or even justifying the speaker’s behavior while simultaneously getting another person to help. How is an impersonal deontic declarative fit for these different functions? And how do people know which function it has in a given context? The authors address these questions using video recordings of everyday interactions among speakers of Italian and Polish.
When translating narrative texts from French into German, translators mostly choose the German simple tense “Präteritum” as an equivalent for French simple tenses and the German perfect tense “Plusquamperfekt” as an equivalent for French perfect tenses. There are common cases, however, in which the translator expresses anteriority where French is underspecified. On the other hand, the translator (or the editor) sometimes decides not to express anteriority by a verb tense even if there is a perfect tense in the French source text. This is the surprising result of this study, based on a small corpus of contemporary novel translations.
Menschenrechte für Wörter
(2016)
Lexikalisch-semantische Graduonymie. Eine empirisch basierte Arbeit zur lexikalischen Semantik
(2016)
This study deals with gradual meaning relations in language. It aims to distinguish words formed into paradigms on the basis of gradual opposition as a separate relation type of lexical semantics, to work it out theoretically, and to ground it empirically. By analogy with the terminological tradition of the "-nymy" relations, this relation is called graduonymy. Various empirical methods, such as a web-based speaker survey, corpus analyses, systematic tests, and contrast with Uzbek, are used to check the validity and stability of the data and thus to gain insights into the phenomenon of graduonymy. This forms the core of the investigation. Different aspects of graduonymy are considered and analysed. The comparison of methods opens up new perspectives on semantic relations, and the procedure has proved methodologically successful. The results yield insights not only into graduonymy itself but also add to the current state of lexical semantics, both theoretically and through the multi-method treatment of semantic relations.
In his writings devoted to the German language, "Unvorgreiffliche Gedancken" (1697) and "Ermahnung an die Teutsche" (1682), Gottfried Wilhelm Leibniz argues for the consistent development of German into both a national and an international language of science. One of his main arguments is that this would allow all social classes to take part in scientific discourse, in the interest of increasing general welfare. The essay examines Leibniz's arguments for the use, development, and spread of German as a language of science. It also draws a line to the present-day debate, a mirror image of the one Leibniz conducted, on German as a language of science and its role in international scientific discourse, and discusses consequences of the current development, both those already evident and those that are possible.
Wörterverzeichnis
(2016)
Vorwort
(2016)
Leibniz's interest in linguistic questions arises in different contexts. In his work on German, his concern is the possibility of bringing theoretical knowledge to the practice and the practitioners of an enlightened modern society. The same concern drives his work on developing a scientific universal language that is based on Latin, the classical language of science, but simplified and internationalized; here he also aims at an internationalization that goes beyond individual languages. In his more abstract reflections on universal language, he is guided by an interest in a possible universality of the relations to be expressed, as in a mathematical model, as well as in the question of possibly universal components of the view of the world as refracted ("monadically") through individual languages. With regard to both aspects of this third level, the Chinese language, as an old coding model and an alternative to the European language world, offered a suitable means of sharpening his own reflections and concepts.
When becoming integrated into the German vocabulary, foreign words reflect paradigmatic changes regarding orthography, grammar as well as semantics. In this context, German orthography is also highly determined by orthographic codification, which continues to influence the development of spelling to the present day. This study compares digital linguistically annotated corpora containing texts written by professional as well as non-professional writers; these corpora contain several billion foreign words (of Greek, Latin and French origin, and in the second part of the study of English/American and Italian origin), studied over a period of 20 years following the German orthographic reform of 1996. The results may potentially help the official regulations to adapt to the spelling practices observed – either by describing the rules more precisely or by proposing possible spelling variants or eliminating those which are not in common use. The study may also help to support correct lexicographic codification in dictionaries.
The present investigation targets the phenomenon commonly called control. Many languages including German and Polish employ non-finite clauses (besides finite clauses) as propositional complements. The subject of these complement clauses is left unexpressed and must generally be interpreted co-referentially with the subject or object of the matrix clause (subject or object control). However, there are also infinitive-selecting verbs that do not allow for a co-referential interpretation of the embedded subject; semantically, the embedded infinitives of these anti-control verbs are thus less dependent on or less unifiable with the matrix proposition. In Polish anti-control constructions, non-finite complements are overtly marked with the complementizer żeby, suggesting that they are structurally more complex (namely, containing a C-projection) than the non-finite complements in control constructions lacking żeby (modulo special contexts, viz. 'control switch'). In a comparative perspective, the paper brings corpus-linguistic and experimental evidence to bear on the question whether, surface appearances notwithstanding, the infinitival complements of anti-control verbs in German should similarly be analyzed as truly sentential, i.e., C-headed structures.
Status und Gebrauch des Niederdeutschen 2016. Erste Ergebnisse einer repräsentativen Erhebung
(2016)
Who understands Low German (Plattdeutsch) today, and who speaks it? Who uses Low German media and cultural offerings? What ideas do people in northern Germany associate with Low German, and how do they regard their regional language? This brochure addresses these and further questions using representative data obtained through a telephone survey of 1,632 people in total from eight federal states (Bremen, Hamburg, Mecklenburg-Vorpommern, Niedersachsen, Schleswig-Holstein as well as Brandenburg, Nordrhein-Westfalen, and Sachsen-Anhalt).
In psychotherapy, language is not only a means of communication but also a diagnostic and therapeutic instrument, and therapeutic questions are a central type of action. Which types of questions occur and which functions they have for diagnostic and therapeutic action is the subject of a linguistic, conversation-analytic study here. The research context is a cooperation between psychotherapists and linguists aimed at advancing the theory and practice of psychotherapeutic history-taking.
This article studies definitions in the context of instructions in driving-school lessons. The observations are based on a corpus of 70 hours of video-recorded lessons in Germany. The instructor uses definitions to introduce new technical expressions that are closely tied to the goals of learning to drive. The use of multimodal resources is fundamental to their production. Ostensive definition by pointing together with an existential assertion (ça/ici c'est X, 'this/here is X') is complemented by descriptive definitions and gestural demonstrations of how the objects are handled. The aim of the acts of definition here is not to deliver a definition of the expression as such, valid for all possible contexts, but to produce a definition that is effective in the practical context at hand. The definitions are therefore rather fragmented, indexical, and situated, and they are adapted to the interlocutor's prior knowledge.
Aktuelle Änderungen des Rats für deutsche Rechtschreibung 2016 - Hintergründe und Begründungen
(2016)
This paper attempts a critique of the notion of 'dialogue' in dialogue theory as espoused by Linell, Markova, and others building on Bakhtin’s writings. According to them, human communication, culture, language, and even cognition are dialogical in nature. This implies that these domains work by principles of other-orientation and interaction. In our paper, we reject accepting other-orientation as an a priori condition of every semiotic action. Instead, we claim that in order to be an empirically useful concept for the social sciences, it must be shown if and how observable action is other-oriented. This leads us to the following questions: how can we methodically account for other-orientation of semiotic action? Does other-orientation always imply interaction? Is every human expression oriented towards others? How does the other, as s/he is represented in semiotic action, relate to the properties which the other can be seen to exhibit as indexed by their observable behavior? We study these questions by asking how the orientation towards others becomes evident in different forms of communication. For this concern, we introduce ‘recipient design’, ‘positioning’ and ‘intersubjectivity’ as concepts which allow us to inquire how semiotic action both takes the other into account and, reflexively, shapes him/her as an addressee having certain properties. We then specifically focus on actions and situations in which other-orientation is particularly problematic, such as interactions with children, animals, machines, or communication with unknown recipients via mass media. These borderline cases are scrutinized in order to delineate both limits and constitutive properties of other-orientation. We show that there are varieties of meaningful actions which do not exhibit an orientation towards the other, which do not rest on (the possibility of) interaction with the other or which even disregard what their producer can be taken to know about the other. 
Available knowledge about the other may be ignored in order to reach interactional goals, e.g. in strategic interactions or for concerns of socialization. If semiotic action is other-oriented, its design depends on how the other is available to and matters for its producer. Other-orientation may build on shared biographical experiences with the other, knowledge about the other as an individual and close attention to their situated conduct. However, other-orientation may also rest on (stereo-)typification with respect to institutional roles or group membership. In any case, others as they are represented in semiotic action can never be just others-as-such, but only others-as-perceived-by-the-actor. We conclude that the strong emphasis which dialogue theories put on other-orientation obscures that other-orientation is not universal in semiotic action, that it must be distinguished from an interactive relationship, and that the ways in which the other figures in semiotic actions are not homogeneous in any of their most general properties. Instead, there is a huge variation in the ways in which the other can be taken into account. Therefore close scrutiny of how the other precisely figures in a certain kind of semiotic action is needed in order to lend the concept of ‘other-orientation’ empirical substance and a definite sense.
'Faction' im Fernsehen - Produktionsbeobachtung des Scripted Reality-Formats mieten, kaufen, wohnen
(2016)
It is widely assumed that there is a natural, prelinguistic conceptual domain of time whose linguistic organization is universally structured via metaphoric mapping from the lexicon and grammar of space and motion. We challenge this assumption on the basis of our research on the Amondawa (Tupi Kawahib) language and culture of Amazonia. Using both observational data and structured field linguistic tasks, we show that linguistic space-time mapping at the constructional level is not a feature of the Amondawa language, and is not employed by Amondawa speakers (when speaking Amondawa). Amondawa does not recruit its extensive inventory of terms and constructions for spatial motion and location to express temporal relations. Amondawa also lacks a numerically based calendric system. To account for these data, and in opposition to a Universal Space-Time Mapping Hypothesis, we propose a Mediated Mapping Hypothesis, which accords causal importance to the numerical and artefact-based construction of time-based (as opposed to event-based) time interval systems.
Preposition-noun combinations with a recurrent zero article in adverbial use, e.g. nach Belieben, auf Knopfdruck, ohne Ende, or bei Nacht, are a type that multi-word research has so far rather neglected. They are the object of the ongoing research project "Präpositionale Wortverbindungen kontrastiv" (participating institutions: IDS Mannheim, University of Santiago de Compostela, University of Trnava), into which our talk gives an insight. We outline how such word combinations, as well as more abstract prepositional word-combination patterns of the type [in + SUBX-Zeit(en)] (e.g. in Echtzeit, in Krisenzeiten), can be investigated on a corpus basis and described lexicographically from a contrastive perspective (German, Spanish, Slovak). Of great interest, especially for foreign-language learners, are the semantic-functional restrictions to which such entities are subject. Based on the theoretical and empirical assumptions of the model "Usuelle Wortverbindungen" developed at the IDS (cf. Steyer 2013), the project first determines collocation and co-text patterns for the binary German multi-word units inductively in very large corpora; these are then systematically compared with Spanish and Slovak. Methodologically, we draw on, among other things, co-occurrence profiles for the word combinations and slot analyses for defined search patterns in all three languages. One aim of the project is to develop a novel prototype for a multilingual presentation of the object of study (specifically for foreign-language learners).
The author presents a study using eye-tracking-while-reading data from participants reading German jurisdictional texts, with a particular focus on nominalisations. It can be shown that nominalisations are read significantly longer than other nouns and that this effect is quite strong. Furthermore, the results suggest that nouns are read faster in reformulated texts. In the reformulations, nominalisations were transformed into verbal structures. Reformulations did not lead to increased processing times of verbal constructions but reformulated texts were read faster overall. Where appropriate, results are compared to a previous study by Hansen et al. (2006) using the same texts but a different methodology and statistical analysis.
Current theories of the syntax-semantics interface associate aspects of meaning that cannot be traced to visible structure with empty projecting heads or constructions as wholes. We present an alternative compositional analysis of the hidden aspectual-temporal, modal or comparative meaning of inchoative, middle, excessive and directional complement constructions. Accordingly, the hidden meaning results from a repair mechanism that passes on a locally problematic meaning component to the next higher derivational cycle. The meaning component in question is one half of the logical form of Difference as contributed by certain functional elements or by syntactically transitive (nominative-accusative) configurations.
Co-occurrences (for example 'Beziehungen pflegen' or 'wirtschaftlich bankrott') are a central object of every corpus-analytic study. As word combinations, they are units that arise under particular contextual conditions and have important functions in the syntagm, sentence, or text. Co-occurrences provide systematic access to capturing meaning, functions, and conventionalized patterns. Their relevance is also increasingly recognized in fields oriented towards cultural studies, political science, and cognition.
This volume brings together specialist literature on central areas and topics in which corpus-analytic methods for investigating typical word combinations are the focus. Alongside survey literature and general introductions, it includes individual studies that work with various corpus approaches, as well as further links and collections of material. In particular, the volume reflects the thematic focuses that currently receive much attention.
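As a rough illustration of what counting co-occurrences can look like in practice, the following minimal sketch tallies ordered word pairs within a fixed window. It is not a method taken from the volume: the function name `cooccurrences`, the `window` parameter, and the plain whitespace tokenization are assumptions made for the example, and real corpus tools typically add lemmatization and statistical association measures.

```python
from collections import Counter


def cooccurrences(tokens, window=1):
    """Count ordered pairs (left, right) where `right` follows `left`
    within `window` tokens. Hypothetical helper for illustration only."""
    pairs = Counter()
    for i, left in enumerate(tokens):
        # Look at the next `window` tokens to the right of position i.
        for right in tokens[i + 1 : i + 1 + window]:
            pairs[(left, right)] += 1
    return pairs


if __name__ == "__main__":
    # Toy input; whitespace splitting stands in for real tokenization.
    text = "Beziehungen pflegen und Beziehungen pflegen"
    for pair, count in cooccurrences(text.split()).most_common():
        print(pair, count)
```

Raw counts like these only show which pairs are frequent; judging whether a pair is a typical combination would need a comparison against the frequencies of the individual words.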
Annotating Discourse Relations in Spoken Language: A Comparison of the PDTB and CCR Frameworks
(2016)
In discourse relation annotation, there is currently a variety of different frameworks being used, and most of them have been developed and employed mostly on written data. This raises a number of questions regarding interoperability of discourse relation annotation schemes, as well as regarding differences in discourse annotation for written vs. spoken domains. In this paper, we describe our experiments on annotating two spoken domains from the SPICE Ireland corpus (telephone conversations and broadcast interviews) according to two different discourse annotation schemes, PDTB 3.0 and CCR. We show that annotations in the two schemes can largely be mapped onto one another, and discuss differences in operationalisations of discourse relation schemes which present a challenge to automatic mapping. We also observe systematic differences in the prevalence of implicit discourse relations in spoken data compared to written texts, and find that there are also differences in the types of causal relations between the domains. Finally, we find that PDTB 3.0 addresses many shortcomings of PDTB 2.0 with respect to the annotation of spoken discourse, and suggest further extensions. The new corpus has roughly the size of the CoNLL 2015 Shared Task test set, and we hence hope that it will be a valuable resource for the evaluation of automatic discourse relation labellers.
This guide presents the Datenbank für Gesprochenes Deutsch (DGD) and in particular the Forschungs- und Lehrkorpus Gesprochenes Deutsch (FOLK) as instruments for conversation-analytic work. After a short introductory overview, the resources and tools for systematic corpus- and database-driven searches and analyses are presented and illustrated step by step, using the example of "sprich" as a discourse marker or reformulation indicator.
This guide presents the Datenbank für Gesprochenes Deutsch (DGD) and in particular the Forschungs- und Lehrkorpus Gesprochenes Deutsch (FOLK) as instruments for conversation-analytic work. After a short introductory overview, the resources and tools for systematic corpus- and database-driven searches and analyses are presented and illustrated step by step, using four different examples.
This guide presents the Datenbank für Gesprochenes Deutsch (DGD) and in particular the Forschungs- und Lehrkorpus Gesprochenes Deutsch (FOLK) as instruments for conversation-analytic work. After a short introductory overview, the resources and tools for systematic corpus- and database-driven searches and analyses are presented and illustrated step by step, using the example of metapragmatic modalizations with the adverbs "sozusagen" and "gewissermaßen" and with the formula "in Anführungszeichen/-strichen".
Wiktionary is increasingly gaining influence in a wide variety of linguistic fields such as NLP and lexicography, and has great potential to become a serious competitor for publisher-based and academic dictionaries. However, little is known about the "crowd" that is responsible for the content of Wiktionary. In this article, we want to shed some light on selected questions concerning large-scale cooperative work in online dictionaries. To this end, we use quantitative analyses of the complete edit history files of the English and German Wiktionary language editions. Concerning the distribution of revisions over users, we show that — compared to the overall user base — only very few authors are responsible for the vast majority of revisions in the two Wiktionary editions. In the next step, we compare this distribution to the distribution of revisions over all the articles. The articles are subsequently analysed in terms of rigour and diversity, typical revision patterns through time, and novelty (the time since the last revision). We close with an examination of the relationship between corpus frequencies of headwords in articles, the number of article visits, and the number of revisions made to articles.
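To make the idea of a skewed distribution of revisions over users concrete, here is a minimal sketch that computes what share of all revisions is made by the most active fraction of users. It does not reproduce the authors' analysis: the function name `top_share`, the `top_fraction` parameter, and the input format (one author name per revision) are assumptions for illustration, and the toy data is invented.

```python
import math
from collections import Counter


def top_share(revision_authors, top_fraction=0.01):
    """Share of all revisions made by the most active `top_fraction` of users.

    `revision_authors` holds one author identifier per revision
    (hypothetical input format chosen for this sketch).
    """
    counts = Counter(revision_authors)
    if not counts:
        return 0.0
    # Revision counts per user, most active first.
    ranked = sorted(counts.values(), reverse=True)
    # At least one user is always included in the "top" group.
    k = max(1, math.ceil(len(ranked) * top_fraction))
    return sum(ranked[:k]) / sum(ranked)


if __name__ == "__main__":
    # Invented toy history: four users, one of them dominating.
    history = ["a"] * 90 + ["b"] * 5 + ["c"] * 3 + ["d"] * 2
    print(top_share(history, top_fraction=0.25))
```

A value close to 1 for a small `top_fraction` would correspond to the pattern described above, where very few authors account for the vast majority of revisions.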
The paper reports the results of the curation project ChatCorpus2CLARIN. The goal of the project was to develop a workflow and resources for the integration of an existing chat corpus into the CLARIN-D research infrastructure for language resources and tools in the Humanities and the Social Sciences (http://clarin-d.de). The paper presents an overview of the resources and practices developed in the project, describes the added value of the resource after its integration and discusses, as an outlook, to what extent these practices can be considered best practices which may be useful for the annotation and representation of other CMC and social media corpora.
We introduce our pipeline to integrate CMC and SM corpora into the CLARIN-D corpus infrastructure. The pipeline was developed by transforming an existing CMC corpus, the Dortmund Chat Corpus, into a resource conforming to current technical and legal standards. We describe how the resource has been prepared and restructured in terms of TEI encoding, linguistic annotations, and anonymisation. The output is a CLARIN-conformant resource integrated in the CLARIN-D research infrastructure.
Converting and Representing Social Media Corpora into TEI: Schema and best practices from CLARIN-D
(2016)
The paper presents results from a curation project within CLARIN-D, in which an existing 1M word corpus of German chat communication has been integrated into the DEREKO and DWDS corpus infrastructures of the CLARIN-D centres at the Institute for the German Language (IDS, Mannheim) and at the Berlin-Brandenburg Academy of Sciences (BBAW, Berlin). The focus is on the solutions developed for converting and representing the corpus in a TEI format.
Overview of the IGGSA 2016 Shared Task on Source and Target Extraction from Political Speeches
(2016)
We present the second iteration of IGGSA’s Shared Task on Sentiment Analysis for German. It resumes the STEPS task of IGGSA’s 2014 evaluation campaign: Source, Subjective Expression and Target Extraction from Political Speeches. As before, the task is focused on fine-grained sentiment analysis, extracting sources and targets with their associated subjective expressions from a corpus of speeches given in the Swiss parliament. The second iteration exhibits some differences, however; mainly the use of an adjudicated gold standard and the availability of training data. The shared task had 2 participants submitting 7 runs for the full task and 3 runs for each of the subtasks. We evaluate the results and compare them to the baselines provided by the previous iteration. The shared task homepage can be found at http://iggsasharedtask2016.github.io/.
We examine different features and classifiers for the categorization of opinion words into actor and speaker view. To our knowledge, this is the first comprehensive work to address sentiment views on the word level taking into consideration opinion verbs, nouns and adjectives. We consider many high-level features requiring only few labeled training data. A detailed feature analysis produces linguistic insights into the nature of sentiment views. We also examine how far global constraints between different opinion words help to increase classification performance. Finally, we show that our (prior) word-level annotation correlates with contextual sentiment views.
We present an approach to the new task of opinion holder and target extraction on opinion compounds. Opinion compounds (e.g. user rating or victim support) are noun compounds whose head is an opinion noun. We do not only examine features known to be effective for noun compound analysis, such as paraphrases and semantic classes of heads and modifiers, but also propose novel features tailored to this new task. Among them, we examine paraphrases that jointly consider holders and targets, a verb detour in which noun heads are replaced by related verbs, a global head constraint allowing inferencing between different compounds, and the categorization of the sentiment view that the head conveys.
The wdlpOst dictionary writing system to be presented in this paper has been developed for the specific purposes of a lexicographical project on German loanwords in the East Slavic languages Russian, Belarusian, and Ukrainian. The project’s main objectives are (i) to document those loanwords for which a cognate lexical borrowing from German is known in Polish and (ii) to establish possible borrowing pathways for these lexical items. In the first phase of the project, the collaborative client/server architecture of the wdlpOst system has been used for excerpting detailed lexicographical information from a large range of historical and contemporary East Slavic dictionaries, taking the entries in a large dictionary of German loanwords in Polish as a common frame of reference. For the project’s second phase, the wdlpOst system provides innovative tooling for compiling entries of the East Slavic loanwords. Most importantly, the numerous word sense definitions for a set of cognate loanwords, as excerpted from different lexicographical sources, are mapped onto a system of newly defined cross-language word senses; in a similar vein, the phonemic and graphemic variation in the loanwords and their derivatives is captured through a tool that abstracts from dictionary-specific idiosyncrasies.
Lexicography of Language Contact: An Internet Dictionary of Words of German Origin in Tok Pisin
(2016)
The paper presents an ongoing project in the domain of lexicography of language contact, namely, the “Internet Dictionary of Words of German Origin in Tok Pisin”. The German influence on the lexicon of the main pidgin language of Papua New Guinea has its roots in the German colonial empire, where Tok Pisin played an important role as a lingua franca in the colony of German New Guinea. Tok Pisin also served as an intermediate language for many borrowing processes; that is, German loans entered many languages in the South Pacific via Tok Pisin. The Internet Dictionary of Words of German Origin in Tok Pisin is based on all available lexicographical sources from the early 20th century up to now. These sources are systematically evaluated within our project; the results will be documented in the dictionary. The microstructure of the dictionary will be presented with respect to its major features: documentation of sources, examples of word usage, audio files, and lexicographic comment.
The Online Bibliography of Electronic Lexicography (OBELEXmeta) is a bibliographic database which is developed for researchers working in the field of dictionary research. The platform is hosted at the Institute for the German Language (Institut für Deutsche Sprache, IDS) in Mannheim. The poster presentation aims at presenting the current status of the ongoing project.
The Shared Task on Source and Target Extraction from Political Speeches (STEPS) first ran in 2014 and is organized by the Interest Group on German Sentiment Analysis (IGGSA). This volume presents the proceedings of the workshop of the second iteration of the shared task. The workshop was held at KONVENS 2016 at Ruhr-University Bochum on September 22, 2016.
There is increasing interest in recognizing opinion inferences in addition to expressions of explicit sentiment. While different formalisms for representing inferential mechanisms are being developed and lexical resources are being built alongside, we here address the need for deeper investigation of the robustness of various aspects of opinion inference, performing crowdsourcing experiments with constructed stimuli as well as a corpus study of attested data.
Sentiment analysis has so far focused on the detection of explicit opinions. However, of late implicit opinions have received broader attention, the key idea being that the evaluation of an event type by a speaker depends on how the participants in the event are valued and how the event itself affects the participants. We present an annotation scheme for adding relevant information, couched in terms of so-called effect functors, to German lexical items. Our scheme synthesizes and extends previous proposals. We report on an inter-annotator agreement study. We also present results of a crowdsourcing experiment to test the utility of some known and some new functors for opinion inference where, unlike in previous work, subjects are asked to reason from event evaluation to participant evaluation.
"Kaum [...] da, wird' ich gedisst!" Funktionale Aspekte des Banter-Prinzips auf dem Online-Prüfstand
(2016)
This article attempts to enrich the theoretical approach of the Banter Principle (Leech 1983) with an online perspective. Examples from Teamspeak conversations and from comments on the social network site Facebook reveal different user practices for making banter identifiable: in written language, nonverbal elements or emoticons ensure that banter is understood correctly; in oral communication, speakers cope with assigned roles that depend on dynamic group-internal hierarchies. Nevertheless, one question remains: why should one disguise a cordial message as a rude one? My analysis shows two functions of online banter: first, to maximise the entertainment value of a conversation and, second, to establish an accepted online identity.
We present an empirical study addressing the question of whether, and to what extent, lexicographic writing aids improve text revision results. German university students were asked to optimise two German texts using (1) no aids at all, (2) highlighted problems, or (3) highlighted problems accompanied by lexicographic resources that could be used to solve the specific problems. We found that participants from the third group corrected the largest number of problems and introduced the fewest semantic distortions during revision. They also reached the highest overall score and were the most efficient (as measured in points per time). The second group, with highlighted problems, lies between the two other groups in almost every measure we analysed. We discuss these findings in the context of intelligent writing environments, the effectiveness of writing aids in practical usage situations, and the teaching of dictionary skills.
In sociology and linguistics, the notion of "genre" (Gattung) is used as a cover term for solidified, (linguistically) similar patterns that recur frequently and serve to solve related communicative problems (e.g. the various moral genres, cf. Bergmann/Luckmann (eds.) 1999). Little attention has so far been paid to the commonalities and differences, that is, to the possibilities for delimiting prototypical from less prototypical members of individual genre families. In this paper, we use authentic data to describe so-called "Gassigespräche" (dog-walking conversations) as spontaneous everyday communication among dog owners. Outside linguistics, these are primarily subsumed as a hyponym under the hypernym "small talk". From a genre-analytic perspective, we first attempt to group the obligatory and optional units around a prototypical centre of small talk, insofar as such a centre exists at all. On the basis of a paradigmatic case, we describe commonalities and differences with respect to other genres located within the spectrum of everyday conversation, or beyond it. In the discussion, we argue for viewing genre families as more or less solidified patterns with partly recurring features that can share their properties in form and function.
From a linguistic perspective, human-animal interaction has so far mainly been located in the domain of phatic talk. It is mostly ascribed functions of controlling the dog or securing its attention (Mitchell 2001). As a social practice within spontaneous everyday short conversations between dog owners, however, human-dog interaction offers a repertoire within the communicative budget with which specific, recurrent communicative tasks can be solved. This paper examines this functional spectrum from a conversation-analytic perspective, with a particular focus on addressing behaviour. First, the current state of research on addressing behaviour in natural conversations and on human-animal interaction is reviewed. Then, concrete interaction sequences within dog-walking conversations (Gassigespräche) are analysed in order to work out which interactive functions talking to the animal can have.
When the substandard German varieties brought along by first-generation Aussiedler (ethnic German resettlers) from German language islands in the former Soviet Union come into contact with the standard language and with the regional varieties spoken within Germany, specific kinds of changes arise that do not occur among local dialect speakers in the German-speaking area when convergence results from standard/dialect variation. Over the course of their stay in Germany, speakers who come from a language island activate their variation patterns on the basis of their prior dialectal knowledge of German and extend their repertoire into the standard and, in part, also the regional range of German. The present publication is devoted to this process and its consequences.
Image macros, also known as memes, are popular internet phenomena that, in the course of the comprehensive multimodalisation of media communication, are shared and commented on as entertainment on Facebook. This paper examines these multimodal communicative artefacts, which consist of a combination of image and text, from a genre-analytic and conversation-analytic perspective, since image macros appear to exhibit solidified patterns both in their formal and semantic design and in their interactive reception in the form of comments and replies. In this medially mediated interaction, various communicative patterns have emerged both at the structural level of interaction sequences and within individual interaction units analysed at the sequence-external and sequence-internal level. Within these, social processes such as face-work and identity construction influence the interactive negotiation of the communicative artefact.
The English language has taken advantage of the Digital Revolution to establish itself as the global language; however, only 28.6% of Internet users speak English as their native language. Machine Translation (MT) is a powerful technology that can bridge this gap. In development since the mid-20th century, MT has become available to every Internet user in the last decade, due to free online MT services. This paper aims to discuss the implications that these tools may have for the privacy of their users and how they are addressed by EU data protection law. It examines the data-flows in respect of the initial processing (both from the perspective of the user and the MT service provider) and potential further processing that may be undertaken by the MT service provider.
Verbformen
(2016)
Von Gastarbeitern zu Transmigranten. Sprachliche Variation in deutsch-türkischen Lebenswelten
(2016)
Wörterbuch der Pfälzer am Niederrhein. Einschließlich einer Sammlung von Redensarten uff pälzersch
(2016)
This thesis investigates temporal and aspectual reference in the typologically unrelated African languages Hausa (Chadic, Afro–Asiatic) and Medumba (Grassfields Bantu). It argues that Hausa is a genuinely tenseless language and compares the interpretation of temporally unmarked sentences in Hausa to that of morphologically tenseless sentences in Medumba, where tense marking is optional and graded. The empirical behavior of the optional temporal morphemes in Medumba motivates an analysis as existential quantifiers over times and thus provides new evidence suggesting that languages vary in whether their (past) tense is pronominal or quantificational (see also Sharvit 2014). The thesis proposes for both Hausa and Medumba that the alleged future tense marker is a modal element that obligatorily combines with a prospective future shifter (which is covert in Medumba). Cross-linguistic variation in whether or not a future marker is compatible with non-future interpretation is proposed to be predictable from the aspectual architecture of the given language.
TripleA is a workshop series founded by linguists from the University of Tübingen and the University of Potsdam. Its aim is to provide a forum for semanticists doing fieldwork on understudied languages, and its focus is on languages from Africa, Asia, Australia and Oceania. The second TripleA workshop was held at the University of Potsdam, June 3-5, 2015.