In this article I argue that a grammar of spoken language is needed. It would have the same functions as the grammar of written language: describing and explaining the fundamental units of spoken language and their features, and describing how those units are composed and combined. The basic units of a grammar of spoken language are the sound, the word, the functional unit, the conversational turn and the conversation itself. In addition, the central characteristics of spoken language and their impact on grammar have to be taken into account: interactivity, multimodality, processability and great variability. After presenting my own approach, I discuss three alternative concepts of a grammar of spoken language: online syntax, construction grammar and multimodal grammar. The article concludes by discussing the role of spoken language grammar in language and foreign language teaching.
In this contribution, we report on an effort to annotate German data with information relevant to opinion inference. Such information has previously been referred to as effect or couched in terms of event-evaluation functors. We extend the theory and present an extensive scheme that combines both approaches and thus extends the set of inference-relevant predicates. Using these guidelines to annotate 726 German synsets, we achieve good inter-annotator agreement.
Some 25 years ago, a large-scale repatriation of Russian Germans began. As a result, more than 2.5 million people who grew up in the USSR, Russia, or other post-Soviet states became German citizens with native or near-native command of the Russian language. The uncomfortable differences they exhibited in comparison to those who were supposed to accept them as equals, yet failed to do so, compelled them to search for self-designations that would accommodate their new identity and to bond together to form a new minority. The authors examine the attempts of Soviet/Russian Germans to redefine their ethnic identity in terms of not just blood but also language and culture, focusing on two particular cases: the use of the name Rusak in the internet forums of the repatriated immigrants, and the linguistic-cultural practices of the older generation of immigrants.
Feedback utterances are among the most frequent in dialogue. Feedback is also a crucial aspect of all linguistic theories that take social interaction involving language into account. However, determining communicative functions is a notoriously difficult task both for human interpreters and systems. It involves an interpretative process that integrates various sources of information. Existing work on communicative function classification comes either from dialogue act tagging, where it is generally coarse-grained concerning the feedback phenomena, or it is token-based and does not address the variety of forms that feedback utterances can take. This paper introduces an annotation framework, the dataset and the related annotation campaign (involving 7 raters to annotate nearly 6000 utterances). We present its evaluation not merely in terms of inter-rater agreement but also in terms of usability of the resulting reference dataset both from a linguistic research perspective and from a more applicative viewpoint.
We investigate whether non-configurational languages, which display more word order variation than configurational ones, require more training data for a phenomenon to be parsed successfully. We perform a tightly controlled study comparing the dative alternation for English (a configurational language), German, and Russian (both non-configurational). More specifically, we compare the performance of a dependency parser when only canonical word order is present with its performance on data sets in which all word orders are present. Our results show that, for all languages, canonical data is easier to parse, and that there is no direct correspondence between the size of training sets containing free(er) word order variation and performance.
Prosodic constructions used to compete for the speaking turn in conversation have been widely studied (French & Local (1983), Kurtić et al. (2013)). Usually, turn competition arises in overlapping talk between at least two speakers. Coordination between participants in their prosodic design of talk (Szczepek-Reed, 2006) and social action (Gorisch et al. 2012), as well as entrainment in more general terms (Levitan et al. 2011), is well established in the literature. Nevertheless, previous studies on turn competition and overlap do not investigate the prosodic design of turn competitive incomings in reference to the orientation of the speakers to each other. Rather, they assume that prosodic constructions are used for turn competition regardless of the co-participants’ design of the turn. In this paper, we ask whether the prosodic design of turn competitive talk is co-constructed between two participants talking in overlap. More specifically, we investigate whether the prosodic design of one participant’s in-overlap talk is developed with respect to the interlocutor’s prosodic features during the same portion of overlapped talk, and whether this prosodic matching can discriminate between the overlaps that are competitive and those that are not. Our analyses are based on two-speaker overlaps drawn from a corpus of multi-party face-to-face conversation between four friends recorded in British English (Kurtic et al. 2012). 3407 instances of two-speaker overlaps have been extracted from 4 hours of talk. Two independent conversation analysts performed the interactional categorisation of overlaps into competitive and non-competitive for all these two-speaker overlap instances and achieved a good agreement of alpha=0.807 (Krippendorff 2004) as measured on a subset of 808 overlaps selected for our initial analysis.
For the analysis of prosodic features we focus on F0-related features: mean, slope, span and contour, all of which have previously been shown to be used by each overlapping speaker separately for turn competition (Kurtic et al. 2009; Oertel et al. 2012). We investigate the similarity in F0 mean, slope and span by correlating these features across the two participants. For F0 contour, a similarity coefficient is computed using the dynamic programming method described in Gorisch et al. (2012). We consider the difference in F0 contour similarity in competitive and non-competitive overlaps as an indication of intonational matching being a turn competitive resource. We conduct these analyses for overlaps that are clearly competitive or non-competitive as indicated by inter-annotator agreement. In addition, we qualitatively explore those cases that annotators disagree on in order to investigate whether they reveal further important interactional or prosodic features of in-overlap talk. Our preliminary results suggest that conversational participants attend and adapt to the interlocutor during overlap depending on whether they return competition or not. We explain our findings in relation to previous work on turn competition in overlap, discuss the quantitative method employed and also address the possible consequences of our results for the study of prosodic realization of other social actions in conversation.
Scales and Scores. An evaluation of methods to determine the intensity of subjective expressions
(2015)
In this contribution, we present a survey of several methods that have been applied to the ordering of various types of subjective expressions (e.g. good < great), in particular adjectives and adverbs. Some of these methods use linguistic regularities that can be observed in large text corpora while others rely on external grounding in metadata, in particular the star ratings associated with product reviews. We discuss why these methods do not work uniformly across all types of expressions. We also present the first application of some of these methods to the intensity ordering of nouns (e.g. moron < dummy).
Precise multimodal studies require precise synchronisation between audio and video signals. However, raw audio and audio from video recordings can be out of sync for several reasons. In order to re-synchronise them, a dynamic programming (DP) approach is presented here. Traditionally, DP is performed on the rectangular distance matrix comparing each value in signal A with each value in signal B. Previous work limited the search space using, for example, the Sakoe-Chiba band (Sakoe and Chiba, 1978). However, the overall space of the distance matrix remains identical. Here, a tunnel matrix and its corresponding DP algorithm are presented. The matrix contains only the distances between the two signals that fall within a pre-specified bandwidth, and the computational cost is reduced accordingly. An example implementation demonstrates the functionality on artificial data and on data from real audio and video recordings.
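The idea of storing only the cells inside a band can be illustrated with a minimal sketch. The abstract does not specify the layout of the tunnel matrix or the distance measure, so everything below is an assumption made for illustration: the function name `banded_dtw`, the absolute-difference distance, the three-way step pattern, and the compact indexing (row i, column j - i + bandwidth) are hypothetical and are not taken from the paper.

```python
import math


def banded_dtw(a, b, bandwidth):
    """Accumulated DTW cost of aligning a and b with |i - j| <= bandwidth.

    Assumed compact layout (not taken from the paper): each row i stores only
    2 * bandwidth + 1 cells, and compact column k stands for j = i - bandwidth + k.
    """
    n, m = len(a), len(b)
    if n == 0 or m == 0:
        raise ValueError("both sequences must be non-empty")
    if abs(n - m) > bandwidth:
        raise ValueError("bandwidth too small to reach the final cell")
    width = 2 * bandwidth + 1
    cost = [[math.inf] * width for _ in range(n)]

    def get(i, j):
        # Accumulated cost at (i, j); infinity outside the signals or the band.
        if i < 0 or j < 0 or j >= m:
            return math.inf
        k = j - i + bandwidth
        if k < 0 or k >= width:
            return math.inf
        return cost[i][k]

    for i in range(n):
        for k in range(width):
            j = i - bandwidth + k
            if j < 0 or j >= m:
                continue  # this compact cell has no counterpart in signal b
            d = abs(a[i] - b[j])  # assumed local distance
            if i == 0 and j == 0:
                cost[i][k] = d
            else:
                cost[i][k] = d + min(get(i - 1, j), get(i, j - 1), get(i - 1, j - 1))
    return get(n - 1, m - 1)
```

With this layout, memory grows with the signal length times the bandwidth rather than with the product of both signal lengths. Setting `bandwidth` to 0 reduces the alignment to a plain sample-by-sample comparison.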
Feedback utterances are among the most frequent in dialogue. Feedback is also a crucial aspect of linguistic theories that take social interaction involving language into account. This paper introduces the corpora and datasets of a project scrutinizing this kind of feedback utterance in French. We present the genesis of the corpora (for a total of about 16 hours of transcribed and phone force-aligned speech) involved in the project. We introduce the resulting datasets and discuss how they are being used in ongoing work with a focus on the form-function relationship of conversational feedback. All the corpora created and the datasets produced in the framework of this project will be made available for research purposes.
The present study introduces articulography, the measurement of the position of tongue and lips during speech, as a promising method for the study of dialect variation. By using generalized additive modeling to analyze articulatory trajectories, we are able to reliably detect aggregate group differences, while simultaneously taking into account the individual variation across dozens of speakers. Our results on the basis of Dutch dialect data show clear differences between the southern and the northern dialect with respect to tongue position, with a more frontal tongue position in the dialect from Ubbergen (in the southern half of the Netherlands) than in the dialect of Ter Apel (in the northern half of the Netherlands). Thus articulography appears to be a suitable tool to investigate structural differences in pronunciation at the dialect level.
This paper presents newly developed guidelines for prosodic annotation of German as a consensus system agreed upon by German intonologists. The DIMA system is rooted in the framework of autosegmental-metrical phonology. One important goal of the consensus is to make exchanging data between groups easier, since German intonation is currently annotated according to different models. To this end, we aim to provide guidelines that are easy to learn. The guidelines were evaluated by running an inter-annotator reliability study on three different speech styles (read speech, monologue and dialogue). The overall high κ between 0.76 and 0.89 (depending on the speech style) shows that the DIMA conventions can be applied successfully.
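For readers unfamiliar with κ, the following minimal sketch shows how a chance-corrected agreement score of this kind can be computed. The abstract does not say which κ variant or how many annotators were used, so the sketch assumes Cohen's κ for two annotators. The function name `cohens_kappa` and the example labels are hypothetical and are not taken from the DIMA study.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: share of items on which both annotators agree.
    observed = sum(x == y for x, y in zip(labels_a, labels_b)) / n
    # Expected chance agreement from each annotator's label distribution.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


# Invented placeholder labels, for illustration only.
annotator_1 = ["H", "L", "H", "L"]
annotator_2 = ["H", "L", "L", "L"]
kappa = cohens_kappa(annotator_1, annotator_2)
```

Here the two annotators agree on three of four items (observed agreement 0.75) and chance agreement is 0.5, so κ comes out at 0.5. Values in the range reported above would correspond to much closer agreement.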
Ph@ttSessionz and Deutsch heute are two large German speech databases. They were created for different purposes: Ph@ttSessionz to test Internet-based recordings and to adapt speech recognizers to the voices of adolescent speakers, Deutsch heute to document regional variation of German. The databases differ in their recording technique, the selection of recording locations and speakers, elicitation mode, and data processing.
In this paper, we outline how the recordings were performed, how the data was processed and annotated, and how the two databases were imported into a single relational database system. We present acoustical measurements on the digit items of both databases. Our results confirm that the elicitation technique affects the speech produced, that f0 is quite comparable despite different recording procedures, and that large speech technology databases with suitable metadata may well be used for the analysis of regional variation of speech.
The effect of manipulation of a speaker’s voice as well as exposure to a native speaker’s utterance was investigated regarding the pronunciation of stops by German learners of French. Three subject groups, a Control (CG), a Manipulation (MG), and a Native Speaker (NG) Group, were recorded on two subsequent days. The MG was presented with a manipulation of their voice on the second day and the NG listened to a native French speaker, while the CG did not receive any feedback. Results show that speakers of the MG and NG were able to extract useful information from the respective feedback and successfully adapted to it. Participants were able to reduce their voice onset time values, although speakers of the NG reduced them to a greater extent.
Based on specific linguistic landmarks in the speech signal, this study investigates pitch level and pitch span differences in English, German, Bulgarian and Polish. The analysis is based on 22 speakers per language (11 males and 11 females). Linear mixed models were computed that include various linguistic measures of pitch level and span, revealing characteristic differences across languages and between language groups. Pitch level appeared to have significantly higher values for the female speakers in the Slavic than the Germanic group. The male speakers showed slightly different results, with only the Polish speakers displaying significantly higher mean values for pitch level than the German males. Overall, the results show that the Slavic speakers tend to have a wider pitch span than the German speakers. However, for one linguistic measure, namely the span between the initial peaks and the non-prominent valleys, we find a difference only between Polish and German speakers. We found a flatter intonation contour in German than in Polish, Bulgarian and English male and female speakers, as well as differences in the frequency of the landmarks between languages. Concerning “speaker liveliness”, we found that the speakers from the Slavic group are significantly livelier than the speakers from the Germanic group.
This study examines the pitch profiles of French learners of German and German learners of French, both in their native language (L1), and in their respective foreign language (L2). Results of the analysis of 84 speakers suggest that for short read sentences, French and German speakers do not show pitch range differences in their native production. Furthermore, analyses of mean f0 and pitch range indicate that range is not necessarily reduced in L2 productions. These results are different from results reported in prior research. Possible reasons for these differences are discussed.
We investigated the effect of high-variability training (HVT) on the production and perception of French bilabial voiced and voiceless stops by German native speakers. Stop consonants in the two languages differ with respect to several articulatory and acoustic features. German learners of French (Experiment Group) trained the perception of word-initial bilabial stops spoken by six French native speakers using identification tests, whereas subjects of a Control Group did not perform a training. Additional perception and production tests of French words including bilabial, alveolar, and velar stops in all word positions were performed to capture the impact of HVT. Subjects were found to be quite good at distinguishing voiced and voiceless stops. However, voiceless stops received lower correctness scores than voiced ones and subjects of the Experiment group were able to further increase their scores after training. Results for production are mirror-inverted showing that subjects of the Experiment Group successfully produced longer negative VOT values but did not show an improvement for voiceless stops.
We present an approach for opinion role induction for verbal predicates. Our model rests on the assumption that opinion verbs can be divided into three different types where each type is associated with a characteristic mapping between semantic roles and opinion holders and targets. In several experiments, we demonstrate the relevance of those three categories for the task. We show that verbs can easily be categorized with semi-supervised graph-based clustering and some appropriate similarity metric. The seeds are obtained through linguistic diagnostics. We evaluate our approach against a new manually-compiled opinion role lexicon and perform in-context classification.
There is an increasing number of dictionary types and lexical search-tools designed to respond to an ever-growing array of user needs. The quest for innovation, however, is not over and this is what this book shall shed light on: the identification of dictionary types that have never been developed for certain languages or for a given lexical domain, as well as typological and linguistic problems that may compromise the development of lexicographic projects.
Cyberbullying is the deliberate attempt to deconstruct another person's face online. About one third of all adolescents have been confronted with this problem at least once. It reached a temporary peak with the appearance of the website Isharegossip.com (ISG), which very quickly developed into a veritable bullying platform. Here perpetrators found particularly drastic verbal means of compromising their victims. So far there has been no qualitative analysis of how victims and so-called virtual bystanders react to these verbal attacks. The aim of this paper is to use a typical discourse to identify six defence strategies that are employed by victims, and also by so-called virtual bystanders, to reconstruct and stabilise the victim's face.
This contribution deals with the lexicographic information in five learner's dictionaries of German as a foreign language (DaF) and focuses in particular on the combinatorial potential of verbs. The analysis places particular emphasis on the grammatical syntagmatics of verbs, which is described by means of eight analysis parameters. The results are discussed in detail and summarised schematically in a table. The analysis reveals information gaps in various areas, from which new challenges for monolingual DaF learner lexicography are derived.
This paper presents some theoretical and methodological foundations of the research project DICONALE, which concerns the development of an online dictionary of verbal lexemes with a special conceptual-onomasiological access and a paradigmatic structure. The project responds to studies which have shown that conventional dictionaries (both monolingual and bilingual) do not satisfy the specific needs of users involved in the production of texts in a foreign language.
To design future learner's dictionaries for German as a foreign language (DaF), it is necessary to know the needs and look-up habits of potential users. Since the shift to digital media, research into dictionary use has received important new impetus. For DaF in particular, however, only isolated up-to-date empirical data are available so far on the differing usage habits of learners that could be evaluated and taken into account for future lexicographic consultation systems. For this reason, a survey was designed within the research project DICONALE, which aims to create a conceptually and onomasiologically oriented bilingual, bidirectional online production learner's dictionary for verbs and deverbal word classes of German and Spanish. The survey was answered by learners of German as a foreign language in Spain, Portugal and Germany. It aims both to learn about the usage habits of DaF learners at different proficiency levels inside and outside the university setting, and to investigate the reasons for possible failed look-ups and to interpret indications of learners' wishes and needs accordingly. The aim of this contribution is therefore, on the one hand, to present the most important results of the survey and, on the other, both to draw general conclusions for the design of future learner's dictionaries for DaF and to work out concrete requirements for DICONALE.
In recent years, theoretical and computational linguistics has paid much attention to linguistic items that form scales. In NLP, much research has focused on ordering adjectives by intensity (tiny < small). Here, we address the task of automatically ordering English adverbs by their intensifying or diminishing effect on adjectives (e.g. extremely small < very small). We experiment with 4 different methods: 1) using the association strength between adverbs and adjectives; 2) exploiting scalar patterns (such as not only X but Y); 3) using the metadata of product reviews; 4) clustering. The method that performs best is based on the use of metadata and ranks adverbs by their scaling factor relative to unmodified adjectives.
Pogled u e-leksikografiju
(2015)
The paper gives an overview of basic concepts and classifications in the field of e-lexicography. It presents a classification of e-dictionaries, describes the lexicographic process of compiling an e-dictionary, and surveys the most widely used dictionary writing systems (DWS) and corpus query systems (CQS). As an example of good practice, the online dictionary elexiko (Institute for the German Language in Mannheim) is described in more detail: its aims and purpose, theoretical and methodological foundations, modules and possibilities of use are presented. As a possible basis for compiling a corpus-based e-dictionary of Croatian that would be in line with the most recent lexicographic (and, more generally, linguistic) theories and practices, the paper presents work on an online lexical-semantic repository of Croatian (a database of semantic frames, image schemas, cognitive primitives and lexical units) within the project Repozitorij metafora hrvatskoga jezika.
The article analyses data from a corpus of email correspondence and chat protocols that document the initial steps of romantic contacts. It shows that different types of silence are used strategically as people get to know each other. Five silence strategies within conversations are described and their functions are illustrated by typical examples.
This paper analyses the positioning practices that the makers of a talk show use to present their guests to the television audience as relevant interlocutors on the topic of "tax evasion by celebrities". It examines how the makers of the talk show succeed, already at the guests' first introduction, in presenting them as "prototypical representatives" and positioning them relative to one another through the interplay of a voice-over and the camera work. Of the talk show's five participants, two of these first introductions are analysed in detail. They are the presentations of two guests who stand in a clearly antagonistic relationship to each other and who are introduced immediately one after the other. On the basis of all five guest presentations, which we have reconstructed in detail but cannot also present here for reasons of space, a structured web of positionings becomes apparent. At its centre is the thematic and personal "opposition" that we have reconstructed. On the periphery, a total of four representatives of relevant social positions on the topic of the talk show are then added: representatives of the judiciary, of politics, of everyday morality, and of psychology and theology. In theoretical terms, the analyses are based on multimodal conceptions of positioning and recipient design. In terms of method and methodology, the analysis follows multimodal interaction analysis.
Zur Ko-Konstruktion einer amüsanten Unterbrechung während einer argumentativen Auseinandersetzung
(2015)
This article is concerned with the choice of a corpus to be used as the empirical basis of a bilingual, bidirectional and conceptual learner dictionary of German and Spanish. Several standard corpora as well as web corpora for German and Spanish will be compared with respect to their size, the variety of genres they contain, the time span and geographical areas covered and what kind of search facilities they allow (e.g. word queries based on lemmata rather than on word forms). It will be argued that, when standard corpora fail to meet a particular requirement, web data may provide a useful alternative for lexicographical purposes provided they are both linguistically (i.e. morpho-syntactically) and meta-linguistically tagged.
An integrated database, search and tagging tool (IDaSTo) is presented that is particularly suited to variable analyses, parallel texts and diachronic studies. Relevant categories or variables can be defined individually, tags can be set freely in the text and in various ways, and their frequencies can be retrieved directly from the linked statistics.
This paper deals with the spread of the lexeme Nerd in the German language. The DeReKo database was examined with regard to the frequency of the word and its co-textual environments. These data were compared with a corpus of possible translations of the lexeme, compiled from US television series ('Scrubs', 'The Big Bang Theory', 'Family Guy' and 'American Dad'). From the synopsis of the findings and the historical-linguistic analysis of the lexeme, it can be concluded that dubbed versions reflect contemporary language use and are therefore also a constant source of language change. With regard to the lexeme Nerd, the conclusion is that it has attained the status of an assimilated foreign word and that only its adjectival derivation is not yet fully integrated. A translation using German lexemes does not appear sensible in this context.
The analysis of existing lexicographic works in Spanish and German for L2 confirms the need to develop a pedagogical dictionary with a new conception, from which users can obtain more information adapted to their needs. Of particular relevance is the treatment of specific information that can improve text production processes in L2 by means of a procedure that, taking the diversity of use into account, makes it possible to select a particular lexeme from the range of possible lexicalisations. This contribution presents the theoretical and methodological aspects underpinning the DICONALE-online project. The four pillars of the project, concerning the type of users, the conceptual and onomasiological approach, the empirical basis of the data and the descriptive model linked to a contrastive point of view, pose new challenges for the development of pedagogical lexicography, which are set out in this paper on the basis of some examples.
Einleitung
(2015)