Unserdeutsch (Rabaul Creole German) is, as far as is currently known, the only German-based creole language in the world. It emerged at the beginning of the 20th century at a Catholic mission station in what was then the colony of German New Guinea in the Melanesian Pacific. The language itself and the circumstances of its emergence are remarkable in several respects. Nevertheless, the chance to document and study Unserdeutsch was almost missed: only about 100 speakers, all of advanced age, live today scattered across eastern Australia and Papua New Guinea.
The present Blickpunkt provides information on the emergence, significance, research and current relevance of the language "Unserdeutsch" in the Pacific.
The article presents the development of deviatological ideas in Ukraine. Interdisciplinary and specialized approaches are identified, and prospects for the development of deviatological research are proposed. The interdisciplinary approach is reflected in deviatological studies in philosophy, psychology, pedagogy, intercultural communication and journalism, and the specialized approach in system-theoretical, cognitive, discourse-textual, communicative, didactic and contrastive lines of research. This tendency represents an attempt at a comprehensive communicative-functional approach to the phenomenon of deviations.
The paper addresses communicative deviations in interviews in Ukrainian and German. The deviations are examined both in press interviews and in the most popular video interviews on YouTube. They are classified according to the position of the addresser, the addressee and the viewer. Attention is drawn to the linguistic and communicative competence of the communicants as the main cause of deviations in interviews. The deviations are defined as one of the preconditions of successful communication.
Tollpatschig interviewen oder interviewt werden – Kurzvideos im ukrainischen und deutschen Fernsehen
(2016)
Short interviews on television are an instructive object of study not only for contrastive media linguistics but also for conversation analysis, text-type linguistics and pragmatics, especially where communicative deviations are concerned. The paper presents a classification of the deviations in television interviews with respect to communication and language. Communicative deviations are considered from the standpoint of the addresser, the communication process, mutual understanding and the addressee, alongside linguistic deviations. The paper identifies shared and differing features of deviations in Ukrainian and German short television interviews, which helps in developing a model of deviations and in a deeper contrastive study of the two languages.
Whether the issue is the spelling reform, Anglicisms in German, or the treatment of migrant or minority languages, debates and opinions about language(s) and language forms are part of our everyday life. That language is also an object of politics, that is, that language and the relationship between languages in society are steered consciously or unconsciously, is by contrast rarely discussed in German-speaking contexts. This introduction gives an overview of approaches, practices, theories and perspectives on important areas of language policy. The first part explains the theoretical background; the second part presents a number of countries that exemplify important approaches in language-policy practice but were also selected for their significance for the largest philological disciplines (German, English and Romance studies). This makes it the first systematic German-language introduction to a topic that has long received a great deal of attention internationally. It is aimed at students and teachers of linguistic subjects and neighbouring disciplines as well as at practitioners of language policy.
This paper summarizes the main statements and results of a workshop that brought together seven perspectives on the study of the role of German in public space. Some of the studies presented followed the ‚Linguistic Landscapes‘ approach, which has rapidly gained popularity since the early 2000s. Other contributions focused on practical considerations for finding examples of the German language in order to use them in the context of German as a foreign language (DaF) and German studies abroad, as well as in promoting the German language. The aim of the workshop was to identify commonalities and perspectives between these studies, grouped under the label ‚Spot German‘, and the Linguistic Landscape tradition. The countries from which studies were presented were Estonia, Latvia, Denmark, the Czech Republic, Germany, Cyprus and Malta.
This book closes a gap in connector research by investigating the use of connectors in spoken German. The research question brings together elements of the traditional grammatical approach and of pragmatically based research on spoken language. Following the method of Interactional Linguistics, the author analyses the use of the conjunctions «und» and «aber» and the adverbial connectors «also» and «dann» in two corpora of autobiographical interviews. The study shows how connectors can be used to accomplish various kinds of communicative tasks, to establish intersubjectivity and to organize conversation.
Zum Geleit
(2016)
This volume is exceptional in several respects. On the one hand, it is this year's and thus the 21st issue of the yearbook Triangulum, published since 1994, and so stands in the tradition of giving German studies in the Baltic region a voice. Unlike in earlier years, however, this volume is considerably more: as the documentation of the 10th Nordisch-Baltisches Germanistentreffen (NBGT, Nordic-Baltic Germanists' Meeting), hosted by the German studies department of Tallinn University from 10 to 13 June 2015, it brings together a large number of the papers presented at the conference.
In conversation, interlocutors rarely leave long gaps between turns, suggesting that next speakers begin to plan their turns while listening to the previous speaker. The present experiment used analyses of speech onset latencies and eye-movements in a task-oriented dialogue paradigm to investigate when speakers start planning their responses. German speakers heard a confederate describe sets of objects in utterances that either ended in a noun [e.g., Ich habe eine Tür und ein Fahrrad (“I have a door and a bicycle”)] or a verb form [e.g., Ich habe eine Tür und ein Fahrrad besorgt (“I have gotten a door and a bicycle”)], while the presence or absence of the final verb either was or was not predictable from the preceding sentence structure. In response, participants had to name any unnamed objects they could see in their own displays with utterances such as Ich habe ein Ei (“I have an egg”). The results show that speakers begin to plan their turns as soon as sufficient information is available to do so, irrespective of further incoming words.
Comparaison de deux marqueurs d’affirmation dans des séquences de co-construction: voilà et genau
(2016)
This contribution investigates the German response particle genau and the French response particle voilà within collaborative turn sequences in videotaped ordinary conversations. Adopting a conversation analytic approach to cross-linguistic comparison, I will show that the basic epistemic value of both particles allows them to be used in similar sequential environments. When a co-participant formulates a candidate conclusion in environments where it can be easily inferred from previous talk, first speakers may confirm the adequacy of the pre-emptive completion by voilà or genau. These particles may then also be followed by self- or other-repeats. The analyses aim to illustrate that participants rely on a variety of practices in order to positively assess a pre-emptive completion, and to refute a supposed binary opposition of refusal vs. acceptance in the receipt slot.
The Component MetaData Infrastructure (CMDI) is a framework for the creation and usage of metadata formats to describe all kinds of resources in the CLARIN world. To better connect to the library world, and to allow librarians to enter metadata for linguistic resources into their catalogues, a crosswalk from CMDI-based formats to bibliographic standards is required. The general and rather fluid nature of CMDI, however, makes it hard to map arbitrary CMDI schemas to metadata standards such as Dublin Core (DC) or MARC 21, which have a mature, well-defined and fixed set of field descriptors. In this paper, we address the issue and propose crosswalks between CMDI-based profiles originating from the NaLiDa project and DC and MARC 21, respectively.
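To make the idea of a crosswalk concrete, here is a minimal sketch in Python. It is not the mapping proposed in the paper: the CMDI field names on the left of the table are hypothetical placeholders, and only the Dublin Core element names on the right (title, creator, date, language) are real DC elements. The sketch also shows why the "fluid" nature of CMDI is a problem: any field without a fixed counterpart ends up unmapped.

```python
# Illustrative crosswalk table. The CMDI field names (keys) are hypothetical;
# the values are element names from the Dublin Core element set.
CROSSWALK = {
    "ResourceName": "title",
    "Creator": "creator",
    "PublicationDate": "date",
    "Language": "language",
}


def to_dublin_core(cmdi_record):
    """Map a flat CMDI-like dict to Dublin Core elements.

    Returns the DC record (element -> list of values) and a sorted list
    of source fields for which the crosswalk has no target element.
    """
    dc = {}
    unmapped = []
    for field, value in cmdi_record.items():
        target = CROSSWALK.get(field)
        if target is None:
            unmapped.append(field)
        else:
            # DC elements are repeatable, so values are collected in lists.
            dc.setdefault(target, []).append(value)
    return dc, sorted(unmapped)
```

Calling `to_dublin_core` on a record that contains a profile-specific field (for instance a hypothetical `AnnotationLayer`) returns that field in the unmapped list, which is where a real crosswalk would need a per-profile decision.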
The Component MetaData Infrastructure (CMDI) is the dominant framework for describing language resources according to ISO 24622 (ISO/TC 37/SC 4, 2015). Within the CLARIN world, CMDI has become a huge success. The Virtual Language Observatory (VLO) now holds over 800,000 resources, all described with CMDI-based metadata. With the metadata being harvested from about thirty centres, there is a considerable amount of heterogeneity in the data. In part, controlled vocabularies are used to keep data heterogeneity in check, for example when describing the type of a resource or the country it originates from. However, when CMDI data refers to the names of persons or organisations, strings are used in a rather uncontrolled manner. Here, the CMDI community can learn from libraries and archives, which maintain standardised lists for all kinds of names. In this paper, we advocate the use of freely available authority files that support the unique identification of persons, organisations, and more. The systematic use of authority records enhances the quality of the metadata, thereby improving the faceted browsing experience in the VLO, and also prepares the sharing of CMDI-based metadata with the data in library catalogues.
The Component MetaData Infrastructure (CMDI) provides a lego-brick framework for the creation, use and re-use of self-defined metadata formats. The design of CMDI can be a force for good, but history shows that it has often been misunderstood or badly executed. Consequently, it has led the community towards the dark ages of metadata clutter rather than the bright side of semantic interoperability. In this abstract, we report on the condition of CMDI but also outline an agenda to make the CMDI world a better place to use, share and profit from metadata.
This thesis investigates temporal and aspectual reference in the typologically unrelated African languages Hausa (Chadic, Afro–Asiatic) and Medumba (Grassfields Bantu).
It argues that Hausa is a genuinely tenseless language and compares the interpretation of temporally unmarked sentences in Hausa to that of morphologically tenseless sentences in Medumba, where tense marking is optional and graded.
The empirical behavior of the optional temporal morphemes in Medumba motivates an analysis as existential quantifiers over times and thus provides new evidence suggesting that languages vary in whether their (past) tense is pronominal or quantificational (see also Sharvit 2014).
The thesis proposes for both Hausa and Medumba that the alleged future tense marker is a modal element that obligatorily combines with a prospective future shifter (which is covert in Medumba). Cross-linguistic variation in whether or not a future marker is compatible with non-future interpretation is proposed to be predictable from the aspectual architecture of the given language.
TripleA is a workshop series founded by linguists from the University of Tübingen and the University of Potsdam. Its aim is to provide a forum for semanticists doing fieldwork on understudied languages, and its focus is on languages from Africa, Asia, Australia and Oceania. The second TripleA workshop was held at the University of Potsdam, June 3-5, 2015.
Stress that spills over into one's intimate relationship (Repetti, 1989) can increase negative behavior between partners (Repetti, 1989; Schulz et al., 2004), which in turn can negatively affect relationship outcomes, such as satisfaction (Karney and Bradbury, 1995; Randall and Bodenmann, 2016). This negative stress spillover process may, however, be mitigated if couples help each other cope with the experienced stress (i.e., dyadic coping). Although theoretical assumptions, such as the systemic-transactional model of stress and dyadic coping (Bodenmann, 2005), suggest that the association between coping behavior and relationship satisfaction is determined by cultural influences (e.g., gender roles), findings from a recent meta-analysis show that this association is stable across nations and gender (Falconier et al., 2015). Despite these significant findings, the samples used in the meta-analysis relied almost exclusively on couples living in Western cultures (Falconier et al., 2015), which leaves open the question of how culture may affect the association between dyadic coping and relationship satisfaction. The goal of the current paper was to examine the cultural influence on dyadic coping processes based on 7973 married individuals across 35 nations.
The Social Perception of Heroes and Murderers: Effects of Gender-Inclusive Language in Media Reports
(2016)
The way media depict women and men can reinforce or diminish gender stereotyping. Which part does language play in this context? Are roles perceived as more gender-balanced when feminine role nouns are used in addition to masculine ones? Research on gender-inclusive language shows that the use of feminine-masculine word pairs tends to increase the visibility of women in various social roles. For example, when speakers of German were asked to name their favorite “heroine or hero in a novel,” they listed more female characters than when asked to name their favorite “hero in a novel.” The research reported in this article examines how the use of gender-inclusive language in news reports affects readers’ own usage of such forms as well as their mental representation of women and men in the respective roles. In the main experiment, German participants (N = 256) read short reports about heroes or murderers which contained either masculine generics or gender-inclusive forms (feminine-masculine word pairs). Gender-inclusive forms enhanced participants’ own usage of gender-inclusive language and this resulted in more gender-balanced mental representations of these roles. Reading about “heroines and heroes” made participants assume a higher percentage of women among persons performing heroic acts than reading about “heroes” only, but there was no such effect for murderers. A post-test suggested that this might be due to a higher accessibility of female exemplars in the category heroes than in the category murderers. Importantly, the influence of gender-inclusive language on the perceived percentage of women in a role was mediated by speakers’ own usage of inclusive forms. This suggests that people who encounter gender-inclusive forms and are given an opportunity to use them, use them more themselves and in turn have more gender-balanced mental representations of social roles.
Status und Gebrauch des Niederdeutschen 2016. Erste Ergebnisse einer repräsentativen Erhebung
(2016)
Who understands Low German (Plattdeutsch) today, and who speaks it? Who uses Low German media and cultural offerings? What ideas do people in northern Germany associate with Low German, and what is their attitude towards their regional language?
The present brochure addresses these and other questions using representative data obtained through a telephone survey of a total of 1,632 people from eight federal states (Bremen, Hamburg, Mecklenburg-Western Pomerania, Lower Saxony, Schleswig-Holstein as well as Brandenburg, North Rhine-Westphalia and Saxony-Anhalt).
Smiling individuals are usually perceived more favorably than non-smiling ones: they are judged as happier, more attractive, competent, and friendly. These seemingly clear and obvious consequences of smiling are assumed to be culturally universal; however, most of the psychological research is carried out in WEIRD societies (Western, Educated, Industrialized, Rich, and Democratic), and the influence of culture on social perception of nonverbal behavior is still understudied. Here we show that a smiling individual may be judged as less intelligent than the same non-smiling individual in cultures low on the GLOBE’s uncertainty avoidance dimension. Furthermore, we show that corruption at the societal level may undermine the prosocial perception of smiling: in societies with high corruption indicators, trust toward smiling individuals is reduced. This research fosters understanding of the cultural framework surrounding nonverbal communication processes and reveals that in some cultures smiling may lead to negative attributions.
In 1906, at the request of the German government, the Prussian Academy of Sciences in Berlin assumed responsibility for the work of completing the Deutsches Wörterbuch of Jacob Grimm and Wilhelm Grimm. In 1929/30 it founded the Berlin office (Arbeitsstelle). After the Second World War, this fundamental lexicographic work was brought to completion during the decades of Germany's division, yet in close collaboration between a Berlin and a Göttingen office. As early as the 1950s, the academies in Berlin and Göttingen decided to undertake, "initially", a complete revision of the oldest parts of the work, which the Brothers Grimm had still compiled themselves between 1852 and 1863. This revision is now almost complete. All the more clearly, however, it is now apparent that the remaining parts also urgently require revision. The Brothers Grimm's work of a century, their most important joint linguistic achievement, used daily on the internet by thousands all over the world and the foundation of all modern research on German vocabulary, can fulfil its purpose only if it is not admired as a museum piece but continued in thoroughly renewed form as an up-to-date reference work. In this situation, the closure of the Berlin office in December 2012 was the wrong signal.
Konnexion in argumentativen Texten. Gebrauchsunterschiede in Deutsch als L2 vs. Deutsch als L1
(2016)
For encoding interpropositional semantic relations such as additivity, adversativity, causality, etc., German, like many other languages, has a rich inventory of connectors of different syntactic categories. Some semantic relations, however, need not be encoded explicitly, since they can be inferred from the context on the basis of cross-linguistic expectations about "normal" relations between states of affairs. Whether writers nevertheless spell out these relations differs from language to language. Against this background, the paper examines the encoding of interpropositional relations by learners of German as a foreign language. The analysis of a learner corpus of essays by advanced learners of German from Sweden, China and Belarus (KobaltDaF corpus) and of a native-speaker control corpus shows that learners deviate quantitatively and qualitatively from the patterns of native speakers. The paper describes these deviations and discusses possible explanations.
The article investigates the ways in which organic-medical metaphors were used to set the boundary of discourse between the economy and politics. The successful establishment of organic-medical metaphors for the economy is mainly explained by their connectivity to different political views. Concepts such as ‘Wirtschaftsleben’ or perceptions of the economy as an ‘organism’ laid the foundation for diagnosing sick or healthy conditions. From the end of the 19th to beyond the mid-20th century typical statements illustrate that the use of such metaphors supported the naturalization and stabilization of the boundary-setting discourse, insofar as it seemed natural that the relation between the two spheres should be formulated in terms of health and disease. Within liberal economic discourse in particular, politics was on the one hand targeted as a potential cause for economic disease, while on the other, it was claimed that politics had the task of keeping economic forces healthy.
This paper aims to demonstrate that a grammar from above („Grammatik von oben“), i.e. a top-down grammar, is better suited to contrastive linguistic description than the opposite approach, i.e. a bottom-up grammar. Furthermore, it will be argued that sentences should be understood and explained from a textual point of view.
The present paper reports the first results of the compilation and annotation of a blog corpus for German. The main aim of the project is the representation of the blog discourse structure and the relations between its elements (blog posts, comments) and participants (bloggers, commentators). The data included in the corpus were manually collected from the scientific blog portal SciLogs. The feature catalogue for the corpus annotation includes three types of information that are directly or indirectly provided in the blog or can be derived by means of statistical analysis or computational tools. At this point, only directly available information (e.g. title of the blog post, name of the blogger etc.) has been annotated. We believe our blog corpus can be of interest for the general study of blog structure or related research questions as well as for the development of NLP methods and techniques (e.g. for authorship detection).
Mediality and sociality are fundamental categories of a media-linguistic perspective on language and communication, and in what follows they serve as the starting points for an examination of the operativity of digital written characters. After a brief introduction, the concept of operativity is explained and then exemplified using a posting on the microblog Twitter.
Having found their way onto computer screens, comics soon branched into webcomics. These kept many of the characteristics of print comic books, but gradually adopted new, unexplored modes of representation. Three relatively new ‘enhancements’ to the medium of comics are presented in this article: webcomics enhanced through the use of the infinite canvas, as proposed by Scott McCloud, those enhanced with videos and/or sound, and lastly those enhanced with interactive and ludic elements. All of these push the medium of comics into new waters, and by doing so they add new layers of meaning and modify their structure based on the make-up of the implemented features. The infinite canvas manages to lift some limitations of print comics without changing the overall feel too drastically, while animated and voiced webcomics, as well as interactive or game comics, have a much higher inclination to transgress into domains of other media and transform themselves in order to accommodate and integrate these novel foreign features.
Co-development of action, conceptualization and social interaction mutually scaffold and support each other within a virtuous feedback cycle in the development of human language in children. Within this framework, the purpose of this article is to bring together diverse but complementary accounts of research methods that jointly contribute to our understanding of cognitive development and in particular, language acquisition in robots. Thus, we include research pertaining to developmental robotics, cognitive science, psychology, linguistics and neuroscience, as well as practical computer science and engineering. The different studies are not at this stage all connected into a cohesive whole; rather, they are presented to illuminate the need for multiple different approaches that complement each other in the pursuit of understanding cognitive development in robots. Extensive experiments involving the humanoid robot iCub are reported, while human learning relevant to developmental robotics has also contributed useful results.
Disparate approaches are brought together via common underlying design principles. Without claiming to model human language acquisition directly, we are nonetheless inspired by analogous development in humans and consequently, our investigations include the parallel co-development of action, conceptualization and social interaction. Though these different approaches need to ultimately be integrated into a coherent, unified body of knowledge, progress is currently also being made by pursuing individual methods.
Editorial
(2016)
This paper is about the workflow for construction and dissemination of FOLK (Forschungs- und Lehrkorpus Gesprochenes Deutsch – Research and Teaching Corpus of Spoken German), a large corpus of authentic spoken interaction data, recorded on audio and video. Section 2 describes in detail the tools used in the individual steps of transcription, anonymization, orthographic normalization, lemmatization and POS tagging of the data, as well as some utilities used for corpus management. Section 3 deals with the DGD (Datenbank für Gesprochenes Deutsch – Database of Spoken German) as a tool for distributing completed data sets and making them available for qualitative and quantitative analysis. In section 4, some plans for further development are sketched.
The paper presents an interdisciplinary teaching and learning project as a best-practice example. The aim of the project, funded by the Lehrinnovationspool of the University of Passau, was to introduce students of linguistics and geography as well as pupils of the FOS/BOS to digital, independent and research-based learning in the thematic context of "language dynamics in the German-Austrian border region". The article shows how students take on different roles as learners, researchers and also teachers by supporting the pupils as learning mentors in planning, carrying out and evaluating joint research projects. One project for pupils is presented in more detail as an example. The paper also reflects on the teaching practice of the lecturers.
Bericht über die 19. Arbeitstagung zur Gesprächsforschung vom 16. bis 18. März 2016 in Mannheim
(2016)
We present the IUCL system, based on supervised learning, for the shared task on stance detection. Our official submission, the random forest model, reaches a score of 63.60, and is ranked 6th out of 19 teams. We also use gradient boosting decision trees and SVM and merge all classifiers into an ensemble method. Our analysis shows that random forest is good at retrieving minority classes and gradient boosting at retrieving majority classes. The strengths of the different classifiers with respect to precision and recall complement each other in the ensemble.
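The abstract does not say how the classifiers are merged, so the following Python sketch only illustrates one common option, plain majority voting over per-item predictions. The function name, the tie-breaking rule and the labels used in the usage note are assumptions for illustration and are not taken from the IUCL system.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine label predictions from several classifiers by majority vote.

    `predictions` is a list with one label sequence per classifier; all
    sequences must have the same length. Ties are broken in favour of the
    classifier listed first (an assumption made for this sketch).
    """
    ensemble = []
    for votes in zip(*predictions):
        counts = Counter(votes)
        top = max(counts.values())
        # First label, in classifier order, that reaches the top count.
        winner = next(label for label in votes if counts[label] == top)
        ensemble.append(winner)
    return ensemble
```

With three classifiers that output, for three items, `["FAVOR", "NONE", "AGAINST"]`, `["FAVOR", "AGAINST", "AGAINST"]` and `["NONE", "AGAINST", "FAVOR"]`, the vote yields `["FAVOR", "AGAINST", "AGAINST"]`: each item takes the label that at least two classifiers agree on.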
This study investigates high vowel laxing in the Louisiana French of the Lafourche Basin. Unlike Canadian French, in which the high vowels /i, y, u/ are traditionally described as undergoing laxing (to [ɪ, ʏ, ʊ]) in word-final syllables closed by any consonant other than a voiced fricative (see Poliquin 2006), Oukada (1977) states that in the Louisiana French of Lafourche Parish, any coda consonant will trigger high vowel laxing of /i/; he excludes both /y/ and /u/ from his discussion of high vowel laxing. The current study analyzes tokens of /i, y, u/ from pre-recorded interviews with three older male speakers from Terrebonne Parish. We measured the first and second formants and duration for high vowel tokens produced in four phonetic environments, crossing syllable type (open vs. closed) by consonant type (voiced fricative vs. any consonant other than a voiced fricative). Results of the acoustic analysis show optional laxing for /i/ and /y/ and corroborate the finding that high vowels undergo laxing in word-final closed syllables, regardless of consonant type. Data for /u/ show that the results vary widely by speaker, with the dominant pattern (shown by two out of three speakers) that of lowering and backing in the vowel space of closed syllable tokens. Duration data prove inconclusive, likely due to the effects of stress. The formant data published here constitute the first acoustic description of high vowels for any variety of Louisiana French and lay the groundwork for future study on these endangered varieties.
Many applications in Natural Language Processing require a semantic analysis of sentences in terms of truth-conditional representations, often with specific desiderata in terms of which information needs to be included in the semantic analysis. However, there are only very few tools that allow such an analysis. We investigate the representations of an automatic analysis pipeline of the C&C parser and Boxer to determine whether Boxer’s analyses in the form of Discourse Representation Structures can be successfully converted into a more surface-oriented event semantic representation, which will serve as input for a fusion algorithm for fusing hard and soft information. We use a data set of synthetic counterintelligence messages for our investigation. We provide a basic pipeline for conversion and subsequently discuss areas in which ambiguities and differences between the semantic representations present challenges in the conversion process.
Brown clustering has been used to help increase parsing performance for morphologically rich languages. However, much of the work has focused on using clustering techniques to replace terminal nodes or as a feature for parsing. Instead, we choose to examine how effective Brown clustering is for unlexicalized parsing by creating data-driven POS tagsets which are then used with the Berkeley parser. We investigate cluster sizes as well as which information (e.g. words vs. lemmas) to cluster on to yield the best parser performance. Our results approach the current state-of-the-art results for the German TüBa-D/Z treebank when using parser-internal tagging.
Weihnachten erzählen
(2016)
After a brief general introduction, this paper presents the Datenbank für Gesprochenes Deutsch (DGD, Database of Spoken German) and the Forschungs- und Lehrkorpus Gesprochenes Deutsch (FOLK, Research and Teaching Corpus of Spoken German) as instruments specifically for conversation-analytic work. Using the example of sprich as a discourse marker for reformulations, the resources and tools for systematic corpus- and database-driven searches are illustrated step by step: possible uses of the token, context, metadata and position searches are shown, each in relation to and in interplay with qualitative case analyses, including annotation of the attested instances according to analytically relevant (structural and functional) categories. Finally, das heißt is drawn on as a further reformulation indicator for a comparative analysis. This paper is a more detailed elaboration of a shorter, more technical and didactic online guide (Kaiser/Schmidt 2016) on this topic and has a stronger substantive and analytical focus.
In this paper, we present first results of training a classifier that assigns Russian texts to different levels of difficulty. For the classification into two levels of difficulty, we considered both surface-oriented features adopted from readability assessments and more linguistically informed, positional features. This text classification is the main focus of our Levelled Study Corpus of Russian (LeStCoR), in which we aim to build a corpus adapted for language learning purposes by selecting simpler texts for beginner second language learners and more complex texts for advanced learners. The most discriminative feature in our pilot study was a lexical feature that approximates the accessibility of the vocabulary to the second language learner in terms of the proportion of familiar words in the texts. The best feature setting achieved an accuracy of 0.91 on a pilot corpus of 209 texts.
The Perceptual Effect of L1 Prosody Transplantation on L2 Speech: The Case of French Accented German
(2016)
Research has shown that language learners are challenged not only by segmental differences between their native language (L1) and the second language (L2). They also have problems with the correct production of suprasegmental structures, such as phone/syllable duration and the realization of pitch. These difficulties often lead to a perceptible foreign accent. This study investigates the influence of prosody transplantation on foreign accent ratings. Syllable duration and pitch contour were transferred from utterances of a male and a female German native speaker to utterances of ten French native speakers speaking German. Acoustic measurements show that the French learners spoke with a significantly lower speaking rate. As expected, results of a perception experiment judging the accentedness of 1) German native utterances, 2) unmanipulated and 3) manipulated utterances of French learners of German suggest that the transplantation of the prosodic features syllable duration and pitch leads to a decrease in accentedness ratings. These findings confirm results found in similar studies investigating prosody transplantation with other L1 and L2 combinations and provide a beneficial technique for (computer-assisted) pronunciation training.
The IFCASL corpus is a French-German bilingual phonetic learner corpus designed, recorded and annotated in a project on individualized feedback in computer-assisted spoken language learning. The motivation for setting up this corpus was that there is no phonetically annotated and segmented corpus for this language pair of comparable size and coverage. In contrast to most learner corpora, the IFCASL corpus incorporates data for a language pair in both directions, i.e. in our case French learners of German and German learners of French. In addition, the corpus is complemented by two sub-corpora of native speech by the same speakers. The corpus provides spoken data from about 100 speakers with comparable productions, annotated and segmented at the word and the phone level, with more than 50% manually corrected data. The paper reports on inter-annotator agreement and the optimization of the acoustic models for forced speech-text alignment in exercises for computer-assisted pronunciation training. Example studies based on the corpus data with a phonetic focus include topics such as the realization of /h/ and glottal stop, final devoicing of obstruents, vowel quantity and quality, pitch range, and tempo.
The aim of this study is to select and formulate criteria for the assessment of tools and exercises that use computer-assisted pronunciation training (CAPT). We examined ten different CAPT tools selected on the basis of an informal questionnaire among ten colleagues working in a German-French CAPT project. Although the applied assessment must still be regarded as informal, and although the selected CAPT tools might not be an optimal sample for representing the state of the art, the results clearly show that there is much to improve regarding the clarity of instruction, the quality of exercises, the robustness of the diagnosis, the clarity and appropriateness of scoring, the diversity of feedback methods, the assumed benefit for various types of users, and the usage of ASR. Despite various good approaches regarding graphics and game-like exercises, links are evidently missing between pedagogical expertise in phonetic training on the one hand and software development, including usability engineering, on the other.
Evaluation of Phonatory Behavior of German and French Speakers in Native and Non-native Speech
(2016)
Phonatory behavior of German speakers (GS) and French speakers (FS) in native (L1) and non-native (L2) speech was instrumentally examined. Vowel productions of the two groups were analyzed using a parametrization of phonatory behavior and phonatory quality properties in the acoustic signal. The behavior of GS is characterized by more strained adduction of the vocal folds, whereas FS show more incomplete glottal closure. Furthermore, GS change their phonatory behavior in the foreign language (=French) by adopting phonatory strategies of FS, whereas FS do not show this tendency. In addition, German beginners (BEG) and, in part, German advanced learners (ADV) are already oriented towards production characteristics of the L2. French BEG, however, retain their phonatory behavior in L2 (=German) by showing less vocal fold adduction in comparison to their L1. French ADV show the opposite behavior. Finally, ADV of the two speaker groups generally show more strained behavior in L2 productions than BEG. The results provide evidence that GS and FS apply different laryngeal phonatory settings and that they alter their settings in L2 differently. Perceptual evaluation of the voice quality of the speech material and a correlation analysis between acoustic and perceptual results are suggested for future research.
The paper presents best practices and results from projects in four countries dedicated to the creation of corpora of computer-mediated communication and social media interactions (CMC). Even though there are still many open issues related to building and annotating corpora of that type, there already exists a range of accessible solutions which have been tested in projects and which may serve as a starting point for a more precise discussion of how future standards for CMC corpora may (and should) be shaped.