Refine
Year of publication
- 2018 (90) (remove)
Document Type
- Part of a Book (39)
- Article (27)
- Conference Proceeding (15)
- Book (4)
- Review (3)
- Periodical (1)
- Part of Periodical (1)
Keywords
- Deutsch (27)
- Korpus <Linguistik> (19)
- Grammatik (9)
- Multimodalität (9)
- Computerlinguistik (8)
- Gesprochene Sprache (8)
- Interaktion (8)
- Interaktionsanalyse (8)
- Konversationsanalyse (8)
- Annotation (6)
Publicationstate
- Veröffentlichungsversion (90) (remove)
Reviewstate
- Peer-Review (90) (remove)
Publisher
- European language resources association (ELRA) (13)
- Verlag für Gesprächsforschung (7)
- Znanstvena založba Filozofske fakultete Univerze v Ljubljani / Ljubljana University Press, Faculty of Arts (7)
- de Gruyter (6)
- Heidelberg University Publishing (5)
- Association for Computational Linguistics (4)
- Institut für Deutsche Sprache (4)
- Leibniz-Zentrum allgemeine Sprachwissenschaft (ZAS); Humboldt-Universität zu Berlin (3)
- The Association for Computational Linguistics (3)
- Austrian Academy of Sciences (2)
Am Beispiel der polyfunktionalen Mehrworteinheit <was weiß ich> wird das Zusammenspiel von pragmatischer und phonetischer Ausdifferenzierung in Pragmatikalisierungsprozessen untersucht. Hierzu werden spontan-sprachliche Belege aus dem Korpus „Deutsch heute“ analysiert. Die beobachtete phonetische Variationsbreite deutet auf eine komplexe Beziehung zu den jeweiligen pragmatischen Funktionen hin.
We present a testsuite for POS tagging German web data. Our testsuite provides the original raw text as well as the gold tokenisations and is annotated for parts-of-speech. The testsuite includes a new dataset for German tweets, with a current size of 3,940 tokens. To increase the size of the data, we harmonised the annotations in already existing web corpora, based on the Stuttgart-Tübingen Tag Set. The current version of the corpus has an overall size of 48,344 tokens of web data, around half of it from Twitter. We also present experiments, showing how different experimental setups (training set size, additional out-of-domain training data, self-training) influence the accuracy of the taggers. All resources and models will be made publicly available to the research community.
A syntax-based scheme for the annotation and segmentation of German spoken language interactions
(2018)
Unlike corpora of written language where segmentation can mainly be derived from orthographic punctuation marks, the basis for segmenting spoken language corpora is not predetermined by the primary data, but rather has to be established by the corpus compilers. This impedes consistent querying and visualization of such data. Several ways of segmenting have been proposed,
some of which are based on syntax. In this study, we developed and evaluated annotation and segmentation guidelines in reference to the topological field model for German. We can show that these guidelines are used consistently across annotators. We also investigated the influence of various interactional settings with a rather simple measure, the word-count per segment and unit-type. We observed that the word count and the distribution of each unit type differ in varying interactional settings and that our developed segmentation and annotation guidelines are used consistently across annotators. In conclusion, our syntax-based segmentations reflect interactional properties that are intrinsic to the social interactions that participants are involved in. This can be used for further analysis of social interaction and opens the possibility for automatic segmentation of transcripts.
This paper aims to describe different patterns of syntactic extensions of turns-at-talk in mundane conversations in Czech. Within interactional linguistics, same-speaker continuations of possibly complete syntactic structures have been described for typologically diverse languages, but have not yet been investigated for Slavic languages. Based on previously established descriptions of various types of extensions (Vorreiter 2003; Couper-Kuhlen & Ono 2007), our initial description shall therefore contribute to the cross-linguistic exploration of this phenomenon. While all previously described forms for continuing a turn-constructional unit seem to exist in Czech, some grammatical features of this language (especially free word order and strong case morphology) may lead to problems in distinguishing specific types of syntactic extensions. Consequently, this type of language allows for critically evaluating the cross-linguistic validity of the different categories and underlines the necessity of analysing syntactic phenomena within their specific action contexts.
The grammatical information system grammis combines descriptive texts on German grammar with dictionaries of specific word classes and grammatical terminology. In this paper, we describe the first attempts at analyzing user behavior for an online grammar of the German language and the implementation of an analysis and data extraction tool based on Matomo, a web analytics tool. We focus on the analysis of the keywords the users search for, either within grammis or via an external search platform like Google, and the analysis of the interaction between the text components within grammis and the integrated dictionaries. The overall results show that about 50% of the searches are for grammatical terms, and that the users shift from texts to dictionaries, mainly by using the integrated links to the dictionary of terminology within the texts. Based on these findings, we aim to improve grammis by extending its integrated dictionaries.
Das hier zu besprechende Buch, das Ergebnisse einer gleichnamigen Tagung zusammenfasst, die im Juni 2013 in Zürich stattfand, macht eines offenkundig: Wer in jenem Sommer nicht dabei war, hat etwas verpasst. Umso glücklicher darf man sein, dass Angelika Linke und Juliane Schröter die Arbeit, die mit der Herausgabe eines Sammelbandes verbunden ist, auf sich genommen haben. Mehr noch: In einem programmatischen ersten Kapitel geben sie einen systematischen Einblick in das tragfähige Forschungsfeld „Sprachliche Relationalität“ (vgl. S. 1–6), das ganz im Sinne der emotiven Wende in der Sprachwissenschaft konkrete theoretische Anschlussfähigkeit signalisiert, wo bislang eine „fast unübersehbare Menge an Veröffentlichungen“ (Schwarz-Friesel 2013: 16) zwar zeigte, wie attraktiv die Thematik ist, aber auch wie unstrukturiert sich die Zuwendung dazu gestaltet. Dass der Band nun weitere „exemplarische Besetzungen“ (S. 21) des Forschungsfeldes zur Diskussion stellt, wird hier keinesfalls als Nachteil angesehen, sondern als methodisch folgerichtiger empirischer Zugang zur Erschließung eines Forschungsfeldes unter den zielsetzenden Leitfragen „Wie werden im Medium von Sprachgebrauch und Sprache Konzeptualisierungen, Kategorisierungen und Differenzierungen menschlicher Beziehungen ausgebildet, verfestigt und auch wieder verändert?“ und „Welche sprachgeformten Beziehungskonzepte, -kategorien und -unterschiede sind typisch für bestimmte historische Epochen bzw. für bestimmte soziale Gruppierungen?“
The workshop presents ATHEN 1 (Annotation and Text Highlighting Environment), an extensible desktop-based annotation environment which supports more than just regular annotation. Besides being a general purpose annotation environment, ATHEN supports indexing and querying support of your data as well as the ability to automatically preprocess your data with Meta information. It is especially suited for those who want to extend existing general purpose annotation tools by implementing their own custom features, which cannot be fulfilled by other available annotation environments. On the according gitlab, we provide online tutorials, which demonstrate the use of specific features of ATHEN
Bisherige linguistische Studien zum mündlichen Erzählen beziehen sich vornehmlich auf die Beschreibung verbaler und vokaler Verfahren. Erzählen findet jedoch häufig unter den Bedingungen der zeitlich-räumlichen Ko-Präsenz der SprecherInnen statt, die den Gebrauch von körperlichen und materiellen Ressourcen ermöglicht. Der vorliegende einleitende Beitrag des Themenheftes modelliert Erzählen daher als körpergebundene und verkörperlichte Praktik, die es im Rahmen von interaktionalen und sequenzorientierten Analyseansätzen zu beschreiben gilt. Im Anschluss an die Darstellung von Entwicklungslinien der soziolinguistischen und interaktional-gesprächsanalytischen Untersuchung konversationellen Erzählens wird ein Überblick über bisherige Befunde zur multimodalen Ausgestaltung des Erzählens in der face-to-face-Interaktion gegeben. Abschließend werden grundlegende Fragestellungen skizziert, deren Beantwortung im Rahmen einer multimodalen Erzählanalyse die tatsächliche Alltagspraxis des Erzählens umfassender zu erschließen vermag.
This paper presents the results of a survey on dictionary use in Europe, the largest survey of dictionary use to date with nearly 10,000 participants in nearly thirty countries. The paper focuses on the comparison of the results of the Slovenian participants with the results of the participants from other European countries. The comparisons are made both with the European averages, and with the results from individual countries, in order to determine in which aspects Slovenian participants share similarities with other dictionary users (and non-users) around Europe, and in which aspects they differ. The findings show that in many ways the Slovenian users are similar to their European counterparts, with some noticeable exceptions, including (much) stronger preference for digital dictionaries over print ones, above-average reliance on other people when dictionary does not contain the relevant information, and the largest difference between the price of a dictionary and the amount willing to spend on it.
In this paper we use methods for creating a large lexicon of verbal polarity shifters and apply them to German. Polarity shifters are content words that can move the polarity of a phrase towards its opposite, such as the verb “abandon” in “abandon all hope”. This is similar to how negation words like “not” can influence polarity. Both shifters and negation are required for high precision sentiment analysis. Lists of negation words are available for many languages, but the only language for which a sizable lexicon of verbal polarity shifters exists is English. This lexicon was created by bootstrapping a sample of annotated verbs with a supervised classifier that uses a set of data- and resource-driven features. We reproduce and adapt this approach to create a German lexicon of verbal polarity shifters. Thereby, we confirm that the approach works for multiple languages. We further improve classification by leveraging cross-lingual information from the English shifter lexicon. Using this improved approach, we bootstrap a large number of German verbal polarity shifters, reducing the annotation effort drastically. The resulting German lexicon of verbal polarity shifters is made publicly available.
Die „21. Arbeitstagung zur Gesprächsforschung“ mit dem Rahmenthema „Vergleichende Gesprächsforschung“ fand vom 21.-23. März 2018 am Institut für Deutsche Sprache in Mannheim statt. Das Ziel der Tagung war es, Forscherinnen und Forscher zusammenzubringen, die authentische Interaktionsdaten aus vergleichender Perspektive untersuchen. Das Rahmenthema der Tagung ergab sich aus dem steigenden Interesse an vergleichenden Fragestellungen innerhalb konversations- und gesprächsanalytischer Untersuchungen. Die Tagung nahm gezielt Vorgehensweisen und Methoden bei der Durchführung vergleichender Untersuchungen in den Blick. Die Vorträge1, Projektpräsentationen und Datensitzungen erörterten 1. das Vergleichen als analytische Grundoperation der Konversations- und Gesprächsanalyse, 2. Vergleiche alternativer Ressourcen und Praktiken für spezifische Handlungen und Aktivitäten in der Interaktion sowie 3. methodologische Herausforderungen einer vergleichenden Gesprächsforschung.
Der Beitrag beschäftigt sich mit der Interaktion zwischen blinden und sehenden Personen bei der kooperativen Anfertigung einer Audiodeskription. Eine Audio-deskription ist die verbale Beschreibung visueller Inhalte für Sehbeeinträchtigte und stellt eine Sonderform der Translation dar. Auf der Basis von Videodaten wird die Kooperation eines Dreierteams mit den Verfahren der multimodalen Interaktionsanalyse untersucht. Ein Charakteristikum dieser Kooperation besteht darin, dass eines der Teammitglieder blind ist und die beiden anderen sehen können. Das Erkenntnisinteresse richtet sich besonders auf die professionelle Beteiligung des blinden Teammitglieds an der Interaktion. Die Analyse zeigt, wie Blindheit als Ressource für die kooperative Herstellung der Audiodeskription genutzt wird und wie die Beteiligten in einer visuell asymmetrischen Situation interagieren. Der Beitrag ist eine der seltenen Untersuchungen, die sich mit professioneller Interaktion zwischen Blinden und Sehenden beschäftigen. Er diskutiert Aspekte von genereller Relevanz für die weitere Entwicklung der empirischen Interaktionsforschung, vor allem in Bezug auf eine Erweiterung von Beteiligungsperspektiven in Richtung Inklusion.
The present submission reports on a pilot project conducted at the Institute for the German Language (IDS), aiming at strengthening the connection between ISO TC37SC4 “Language Resource Management” and the CLARIN infrastructure. In terminology management, attempts have recently been made to use graph-theoretical analyses to get a better understanding of the structure of terminology resources. The project described here aims at applying some of these methods to potentially incomplete concept fields produced over years by numerous researchers serving as experts and editors of ISO standards. The main results of the project are twofold. On the one hand, they comprise concept networks dynamically generated from a relational database and browsable by the user. On the other, the project has yielded significant qualitative feedback that will be offered to ISO. We provide the institutional context of this endeavour, its theoretical background, and an overview of data preparation and tools used. Finally, we discuss the results and illustrate some of them.
German is a language with complex morphological processes. Its long and often ambiguous word forms present a bottleneck problem in natural language processing. As a step towards morphological analyses of high quality, this paper introduces a morphological treebank for German. It is derived from the linguistic database CELEX which is a standard resource for German morphology. We build on its refurbished, modernized and partially revised version. The derivation of the morphological trees is not trivial, especially for such cases of conversions which are morpho-semantically opaque and merely of diachronic interest. We develop solutions and present exemplary analyses. The resulting database comprises about 40,000 morphological trees of a German base vocabulary whose format and grade of detail can be chosen according to the requirements of the applications. The Perl scripts for the generation of the treebank are publicly available on github. In our discussion, we show some future directions for morphological treebanks. In particular, we aim at the combination with other reliable lexical resources such as GermaNet.
Data Management is one of the core activities of all CLARIN centres providing data and services for the academia. In PARTHENOS, European initiatives and projects in the area of the humanities and social sciences assembled to compare policies and procedures. One of the areas of interest is data management. The data management landscape shows a lot of proliferation, for which an abstraction level is introduced to help centres, such as CLARIN centres, in the process of providing the best possible services to users with data management needs.
Many studies on dictionary use presuppose that users do indeed consult lexicographic resources. However, little is known about what users actually do when they try to solve language problems on their own. We present an observation study where learners of German were allowed to browse the web freely while correcting erroneous German sentences. In this paper, we are focusing on the multi-methodological approach of the study, especially the interplay between quantitative and qualitative approaches. In one example study, we will show how the analysis of verbal protocols, the correction task and the screen recordings can reveal the effects of intuition, language (learning) awareness, and determination on the accuracy of the corrections. In another example study, we will show how preconceived hypotheses about the problem at hand might hinder participants from arriving at the correct solution.
This paper discusses changes in lexicographic traditions with respect to contrastive dictionary entries and dynamic, on-demand e-lexicographic descriptions. The new German online dictionary Paronyme - Dyna- misch im Kontrast is concerned with easily confused words (paronyms), such as effektivtefficient and sensibel/ sensitiv. New approaches to the empirical analysis and lexicographic presentation of words such as these are required, and this dictionary is committed to overcoming the discrepancy between traditional practice and insights from language use. As a corpus-guided reference work, it strives to adequately reflect not only authentic use in situations of actual communication, but also cognitive ideas such as conceptual structure, categorization and knowledge. Looking up easily confused lexical items requires contrastive entries where users can instantly compare meaning, contexts and reference. Adaptable access to lexicographic details and variable search options offer different foci and perspectives on linguistic information, and authentic examples reflect prototypical structures. These are essential in order to meet all the different interests of users. This paper will illustrate the contrastive structure of the new e-dictionary and demonstrate which information can be compared. It also focusses on various dynamic modes of dictionary consultation, which enable users to shift perspectives on paronyms accordingly.
In the past two decades, more and more dictionary usage studies have been published, but most of them deal with questions related to what users appreciate about dictionaries, which dictionaries they use and what type of information they need in specific situations — presupposing that users actually consult lexicographic resources. However, language teachers and lecturers in linguistics often have the impression that students do not use enough high-quality dictionaries in their everyday work. With this in mind, we launched an international cooperation project to collect empirical data to evaluate what it is that students actually do while attempting to solve language problems. To this end, we applied a new methodological setting: screen recording in conjunction with a thinking-aloud task. The collected empirical data offers a broad insight into what users really do while they attempt to solve language-related tasks online.
Except for some recent advances in spoken language lexicography (cf. Verdonik & Sepesy Maučec 2017, Hansen & Hansen 2012, Siepmann 2015), traditional lexicographic work is mainly oriented towards the written language. In this paper, we describe a method we used to identify relevant headword candidates for a lexicographic resource for spoken language that is currently being developed at the Institute for the German Language (IDS, Mannheim). We describe the challenges of the headword selection for a dictionary of spoken language, and having made considerations regarding our headword concept, we present the corpus-based procedures that we used in order to facilitate the headword selection. After presenting the results regarding the selection of one-word lemmas, we discuss the opportunities and limitations of our approach.
Der Beitrag untersucht auf der Grundlage der multimodal-raumanalytischen Interaktionsanalyse die Abendmahlfeier in drei lutherisch-protestantischen Gottesdiensten. Die Videoaufnahmen hierzu stammen aus Sarepta (Russland) und Rimbach und Zotzenbach (Deutschland). Nach einer kurzen Einordnung des Beitrags in den relevanten Forschungszusammenhang wird das spezifische raumanalytische Erkenntnisinteresse am Abendmahl als kollektive Positionierungsanforderung erläutert. Drei Fallanalysen rekonstruieren zunächst die interaktionsarchitektonischen Voraussetzungen für die kollektive Bewegung der Gemeinde ins kirchenräumliche Vorne. Diese Bewegung, die Positionierung der Gemeinde zur Einnahme des Abendmahls (der Konsum von Wein und Brot) und der Rückweg zu den Kirchenbänken sind raumbezogene Teilaufgaben, die in der konkreten Situation bearbeitet werden müssen. Die Bewegung der Gemeinde wird in den drei analysierten Gottesdiensten auf sehr unterschiedliche Weise organisiert. Die Rekonstruktion dieser Unterschiede ermöglicht die Formulierung von drei unterschiedlichen Vollzugsmodellen primär auf der Basis der zwei folgenden Aspekte: Relevant ist zum einen das Ausmaß und die Form der Vergemeinschaftung
(als symbolischer Nachvollzugs des überlieferten Abendmahls von Jesus Christus mit seinen Jüngern am Gründonnerstag) und zum anderen die Spezifik, in der die Teilnehmer konkret den Wein und das Brot konsumieren. Auf diesem Wege konnten ein Modell der Vergemeinschaftung mit Kollektivversorgung (Sarepta), ein Modell der Teil-Vergemeinschaftung mit Teil-Gruppenversorgung (Zotzenbach) sowie ein Individualisierungsmodell mit Einzelversorgung (Rimbach) identifiziert werden. Als strukturprägende Einflussgrößen werden einerseits die Möglichkeiten, die die Architektur für den Vollzug des Abendmahls zur Verfügung stellt, und andererseits die Anzahl der Teilnehmer deutlich. Ab einer gewissen Anzahl entsteht eine Art Ökonomisierungszwang, der sich negativ auf die Qualität der Vergemeinschaftung auswirkt. Von Reinhold Schmitt stammt die Idee, das Abendmahl als Koordinations- und Positionierungsaufgabe zu konzeptualisieren. Er hat auch die multimodal-interaktionsanalytische Methodologie entwickelt, die dem Beitrag zugrunde liegt. Darüber hinaus hat er die Videoaufnahmen in Rimbach und Zotzenbach erstellt und transkribiert. Anna Petrova hat die Gottesdienste in Sarepta dokumentiert und transkribiert. Die methodische und theoretische Konzeption des Beitrags stammt von beiden Autoren. Auch die Analysen der ausgewählten Fälle haben sie gemeinsam durchgeführt.
Der Artikel widmet sich den politischen Fernsehinterviews im Ukrainischen und Deutschen aus der Perspektive der Persönlichkeit des Interviewers und der Schwierigkeiten, die vor und während des Fernsehinterviews auftreten. Kommunikative Abweichungen (Deviationen) werden als Unterschiede in den Erwartungen des Interviewers im Vergleich zu den Erwartungen des Befragten und des Adressaten aufgezeigt und analysiert. Besonderes Augenmerk wird auf das Beziehungsdreieck, bestehend aus Interviewer, Befragter und Adressat, gelegt. Bei der Beziehung zwischen diesen drei Größen spielen die Elemente Alter, Geschlecht, Status, Wissen, Interessen und Erwartungen eine wichtige Rolle und tragen zum Erfolg des Interviews bei. Dementsprechend übernimmt der Journalist drei Rollen: als Vertreter des Publikums, als Promotor des Eingeladenen (des Befragten) oder als Vertreter von sich selbst. Durch kommunikative Deviationen werden die Unterschiede in den Erwartungen der Kommunikatoren in einem Interview verstanden. In diesem Artikel wird nur auf die Abweichungen in den Fernsehinterviews in beiden Sprachen eingegangen, wenn der Interviewer andere Erwartungen an das Interview hat als der Befragte oder der Adressat (der Zuschauer), was für das erste ein Misserfolg ist, d.h. für den Interviewer. Es werden kommunikative Abweichungen des Interviewers gegenüber dem Befragten und dem Adressaten skizziert und die Strategien zur Überwindung von Misserfolgen eines Fernsehinterviews vorgeschlagen. Kommunikative Abweichungen als Verstöße gegen die Erwartungen des Interviewers in all seinen Erscheinungsformen können vermieden oder zumindest reduziert werden, wenn alle Elemente der Kommunikation auf informativer und emotionaler und sehr oft auf kommunikativ-situativer Ebene samt technischen Besonderheiten berücksichtigt werden.
We present a language learning application that relies on grammars to model the learning outcome. Based on this concept we can provide a powerful framework for language learning exercises with an intuitive user interface and a high reliability. Currently the application aims to augment existing language classes and support students by improving the learner attitude and the general learning outcome. Extensions beyond that scope are promising and likely to be added in the future.
Our corpus study is concerned with subject-verb agreement in contemporary German, more precisely the variation in verb number. We focus on subjects consisting of noun phrases coordinated by the conjunction und (‘and’). In our samples, both nouns are in singular. Number resolution – i.e., plural verb despite of the singular nouns – can be regarded as the default choice in contemporary German. However, our data show that eliding the second determiner in the subject enhances the probability of using the singular verb. This ellipsis effect is highly significant in German and Austrian texts. It seems to be weaker in Swiss texts. Regression analyses reveal that the ellipsis effect is stronger than both the highly significant influence of subject individuation and the significant effect of subject agentivity.
We present evidence for the analysis of the vowels in English <say> and <so> as biphonemic diphthongs /ɛi/ and /əu/, based on neutralization patterns, regular alternations, and foot structure. /ɛi/ and /əu/ are hence structurally on a par with the so called “true diphthongs” /ɑi/, /ɐu/, /ɔi/, but also share prosodic organization with the monophthongs /i/ and /u/. The phonological evidence is supported by dynamic measurements based on the American English TIMIT database.
Calculations of F2-slopes proved to be especially suited to distinguish the relevant groups in accordance with their phonologically motivated prosodic organizations.
Negation is an important contextual phenomenon that needs to be addressed in sentiment analysis. Next to common negation function words, such as not or none, there is also a considerably large class of negation content words, also referred to as shifters, such as the verbs diminish, reduce or reverse. However, many of these shifters are ambiguous. For instance, spoil as in spoil your chance reverses the polarity of the positive polar expression chance while in spoil your loved ones, no negation takes place. We present a supervised learning approach to disambiguating verbal shifters. Our approach takes into consideration various features, particularly generalization features.
We study German affixoids, a type of morpheme in between affixes and free stems. Several properties have been associated with them – increased productivity; a bleached semantics, which is often evaluative and/or intensifying and thus of relevance to sentiment analysis; and the existence of a free morpheme counterpart – but not been validated empirically. In experiments on a new data set that we make available, we put these key assumptions from the morphological literature to the test and show that despite the fact that affixoids generate many low-frequency formations, we can classify these as affixoid or non-affixoid instances with a best F1-score of 74%.
Cette contribution propose une analyse qualitative et quantitative des reformulations sur des données interactionnelles. Pour la constitution du corpus d’étude, nous nous appuyons sur un outil de détection automatique des hétéro-répétitions, considérées comme indices de reformulation. Après avoir illustré les éléments qui ont présidé à la conception de l’outil, nous présentons le paramétrage de cette ressource, que nous avons testée sur quatre enregistrements de la base de données CLAPI. Cette étude souligne la pertinence de l’approche interactionnelle dans l’analyse des hétéro-répétitions, en en montrant les fonctionnalités multiples, notamment dans les pratiques de reformulation dans la conversation.
Erzählen multimodal
(2018)
The paper describes preliminary studies regarding the usage of Example-Based Querying for specialist corpora. We outline an infrastructure for its application within the linguistic domain. Example-Based Querying deals with retrieval situations where users would like to explore large collections of specialist texts semantically, but are unable to explicitly name the linguistic phenomenon they look for. As a way out, the proposed framework allows them to input prototypical everyday language examples or cases of doubt, which are automatically processed by CRF and linked to appropriate linguistic texts in the corpus.
Auf der Grundlage videodokumentierter Kirchenbesichtigungen, bei denen exothetisches Sprechen als Erhebungsmethode eingesetzt wurde, analysiert der Aufsatz Gemeinsamkeiten und Unterschiede in den Kirchenbesichtigungen von Aurelia, Saskia und Anton. Alle haben dieselbe Kirche besichtigt und ihre visuelle Wahrnehmung des Kirchenraums – das war die explizit formulierte Aufgabe – durch verbale Kommentare und Beschreibungen begleitet. Übergeordnetes Ziel der Analyse des exothetischen Sprechens war die Rekonstruktion der den Besichtigungen zugrundeliegenden Konzepte, die zum Großteil in mitgebrachten Relevanzen begründet sind. Nach der Skizzierung unseres zentralen Erkenntnisinteresses und der Verortung unseres Ansatzes im relevanten Forschungskontext arbeiten wir zunächst die Gemeinsamkeiten der exothetischen Formen und ihre Funktionen in den drei Kirchenbesichtigungen heraus. Dann konzentrieren wir uns auf die Unterschiede und jeweiligen Besonderheiten der drei Besichtigungen und arbeiten dabei drei eigenständige, in sich schlüssige Besichtigungskonzepte heraus. Diese drei Konzepte zeichnen sich durch die jeweils eigenständige Konstitution des Kirchenraums bei dessen Besichtigung aus. Wir konnten zeigen, dass der Kirchenraum als religiöser Funktionsraum konstituiert wird (Aurelia), als Ort von Christusdarstellungen (Saskia) und als architekturgeschichtlicher Zusammenhang (Anton). Die modellhafte Eigenständigkeit der Konzepte wurde ausschließlich durch das exothetische Sprechen deutlich. Dies weist die wahrnehmungsbegleitende Thematisierung als wichtiges Erhebungs- und Analyseverfahren für den Zugang zur situierten Kognition im Zusammenhang mit dem Vollzug komplexer kultureller Praktiken aus.
In this paper, we present our approach to automatically extracting German terminology in the domain of grammar using texts from the online information system grammis as our corpus. We analyze existing repositories of German grammatical terminology and develop Part-of-speech patterns for our extraction thereby showing the importance of unigrams in this domain. We contrast the results of the automatic extraction with a manually extracted standard. By comparing the performance of well-known statistical measures, we show how measures based on corpus comparison outperform alternative methods.
We present the conceptual foundations and basic features of fLexiCoGraph, a generic software package for creating and presenting curated human-oriented lexicographical resources that are roughly modeled according to Měchura’s (2016) idea of graph-augmented trees. The system is currently under development and will be made accessible as open source software. As a sample use case we discuss an existing online database of loanwords borrowed from German into other languages which is based on a growing number of language-specific loanword dictionaries (Lehnwortportal Deutsch). The paper outlines the conceptual foundations of fLexiCoGraph’s hybrid graph/XML data model. To establish a database, XML-based resources may be imported or even input manually. An additional graph database layer is then constructed from these XML source documents in a freely configurable, but automated way; subsequently, the resulting graph can be manipulated and enlarged through a visual user interface in such a way that keeps the relationship to the source document information explicit at all times. We sketch the tooling support for different kinds of graph-level editing processes, including mechanisms for dealing with updated XML source documents and coping with duplicate or inconsistent information, and briefly discuss the browser interface for end users.
Die Bedeutung von Forschungsdatenmanagement im wissenschaftspolitischen Diskurs und im wissenschaftlichen Arbeitsalltag nimmt stetig zu. Nationale und internationale Forschungsinfrastrukturen, Verbünde, disziplinäre Datenzentren und institutionelle Kompetenzzentren nähern sich den Herausforderungen aus unterschiedlichen Perspektiven. Dieser Beitrag stellt das Data Center for the Humanities an der Universität zu Köln als Beispiel für ein universitäres Datenzentrum mit fachlicher Spezialisierung auf die Geisteswissenschaften vor.
Complement phrases are essential for constructing well-formed sentences in German. Identifying verb complements and categorizing complement classes is challenging even for linguists who are specialized in the field of verb valency. Against this background, we introduce an ML-based algorithm which is able to identify and classify complement phrases of any German verb in any written sentence context. We use a large training set consisting of example sentences from a valency dictionary, enriched with POS tagging, and the ML-based technique of Conditional Random Fields (CRF) to generate the classification models.
Grammar and corpora 2016
(2018)
In recent years, the availability of large annotated and searchable corpora, together with a new interest in the empirical foundation and validation of linguistic theory and description, has sparked a surge of novel and interesting work using corpus-based methods to study the grammar of natural languages. However, a look at relevant current research on the grammar of the Germanic, Romance, and Slavic languages reveals a variety of different theoretical approaches and empirical foci, which can be traced back to different philological and linguistic traditions. Still, this current state of affairs should not be seen as an obstacle but as an ideal basis for a fruitful exchange of ideas between different research paradigms.
In recent years, the availability of large annotated and searchable corpora, together with a new interest in the empirical foundation and validation of linguistic theory and description, has sparked a surge of novel and interesting work using corpus-based methods to study the grammar of natural languages. However, a look at relevant current research on the grammar of the Germanic, Romance, and Slavic languages reveals a variety of different theoretical approaches and empirical foci, which can be traced back to different philological and linguistic traditions. Still, this current state of affairs should not be seen as an obstacle but as an ideal basis for a fruitful exchange of ideas between different research paradigms.
In diesem Panel geht es um die Förderung der geisteswissenschaftlichen Forschung durch eine planvolle Erhebung, Archivierung, Veröffentlichung und die dadurch ermöglichte Nachnutzung von Forschungsdaten, die sowohl zur Qualitätssicherung in der Forschung beitragen als auch nicht zuletzt neue Fragestellungen erlauben. Aus unterschiedlichen Perspektiven soll in dem Panel beleuchtet werden, welchen Mehrwert das Datenmanagement für die Forschung in den digitalen Geisteswissenschaften hat, wie man diesen Mehrwert erreicht und auch die Veröffentlichung der Forschungsdaten als ein selbstverständliches Element der Dissemination der Forschungsergebnisse etabliert und wie man gleichzeitig den Aufwand für die Forschung abschätzen kann.
The actual or anticipated impact of research projects can be documented in scientific publications and project reports. While project reports are available at varying level of accessibility, they might be rarely used or shared outside of academia. Moreover, a connection between outcomes of actual research project and potential secondary use might not be explicated in a project report. This paper outlines two methods for classifying and extracting the impact of publicly funded research projects. The first method is concerned with identifying impact categories and assigning these categories to research projects and their reports by extension by using subject matter experts; not considering the content of research reports. This process resulted in a classification schema that we describe in this paper. With the second method which is still work in progress, impact categories are extracted from the actual text data.
We address the detection of abusive words. The task is to identify such words among a set of negative polar expressions. We propose novel features employing information from both corpora and lexical resources. These features are calibrated on a small manually annotated base lexicon which we use to produce a large lexicon. We show that the word-level information we learn cannot be equally derived from a large dataset of annotated microposts. We demonstrate the effectiveness of our (domain-independent) lexicon in the crossdomain detection of abusive microposts.
This paper argues that conversation analysis has largely neglected the fact that meaning in interaction relies on inferences to a high degree. Participants treat each other as cognitive agents, who imply and infer meanings, which are often consequential for interactional progression. Based on the study of audio- and video-recordings from German talk-in-interaction, the paper argues that inferences matter to social interaction in at least three ways. They can be explicitly formulated; they can be (conventionally) indexed, but not formulated; or they may be neither indexed nor formulated yet would be needed for the correct understanding of a turn. The last variety of inferences usually remain tacit, but are needed for smooth interactional progression. Inferences in this case become an observable discursive phenomenon if misunderstandings are treated by the explication of correct (accepted) and wrong (unaccepted) inferences. The understanding of referential terms, analepsis, and ellipsis regularly rely on inferences. Formulations, third-position repairs, and fourth-position explications of erroneous inferences are practices of explicating inferences. There are conventional linguistic means like discourse markers, connectives, and response particles that index specific kinds of inferences. These practices belong to a larger class of inferential practices, which play an important role for indexing and accomplishing intersubjectivity in talk in interaction.
The sentiment polarity of a phrase does not only depend on the polarities of its words, but also on how these are affected by their context. Negation words (e.g. not, no, never) can change the polarity of a phrase. Similarly, verbs and other content words can also act as polarity shifters (e.g. fail, deny, alleviate). While individually more sparse, they are far more numerous. Among verbs alone, there are more than 1200 shifters. However, sentiment analysis systems barely consider polarity shifters other than negation words. A major reason for this is the scarcity of lexicons and corpora that provide information on them. We introduce a lexicon of verbal polarity shifters that covers the entirety of verbs found in WordNet. We provide a fine-grained annotation of individual word senses, as well as information for each verbal shifter on the syntactic scopes that it can affect.
The European digital research infrastructure CLARIN (Common Language Resources and Technology Infrastructure) is building a Knowledge Sharing Infrastructure (KSI) to ensure that existing knowledge and expertise is easily available both for the CLARIN community and for the humanities research communities for which CLARIN is being developed. Within the Knowledge Sharing Infrastructure, so called Knowledge Centres comprise one or more physical institutions with particular expertise in certain areas and are committed to providing their expertise in the form of reliable knowledge-sharing services. In this paper, we present the ninth K Centre – the CLARIN Knowledge Centre for Linguistic Diversity and Language Documentation (CKLD) – and the expertise and services provided by the member institutions at the Universities of London (ELAR/SWLI), Cologne (DCH/IfDH/IfL) and Hamburg (HZSK/INEL). The centre offers information on current best practices, available resources and tools, and gives advice on technological and methodological matters for researchers working within relevant fields.
This paper offers an exploratory Interactional Linguistic account of the role that inferences play in episodes of ordinary conversational interaction. To this end, it systematically reconsiders the conversational practice of using the lexico-syntactic format oh that’s right to implicitly claim “just-now” recollection of something previously known, but momentarily confused or forgotten. The analyses reveal that this practice typically occurs as part of a larger sequential pattern that the participants orient to and which serves as a procedure for dealing with, and generating an account for, one participant’s production of an inapposite action. As will be shown, the instantiation and progressive realization of this sequential procedure requires local inferential work from the participants. While some facets of this inferential work appear to be shaped by the particular context of the ongoing interaction, others are integral to the workings of the sequence as such. Moreover, the analyses suggest that participants’ understanding of oh that’s right as embodying an implicit memory claim rests on an inference which is based on a kind of semanticpragmatic compositionality. The paper thus illustrates how inferences in conversational interaction can be systematically studied and points to the merits of combining an interactional and a linguistic perspective.
Das Journal für Medienlinguistik (jfml) ist eine medienlinguistische Open-Access-Zeitschrift. Im Sinne einer offenen, interaktiven und unabhängigen Wissenschaftskultur erfolgt die Qualitätssicherung des jfml durch ein Open Peer Review und die medienlinguistische Expertise des Editorial Boards. Das jfml veröffentlicht deutsch- und englischsprachige Artikel, Rezensionen und Tagungsberichte, die fortlaufend erscheinen.
Pädiatrische Gespräche unterscheiden sich gegenüber anderen ärztlichen Gesprächen mit Patienten hinsichtlich der Gesprächsaufgaben und der Beteiligungskonstellationen. In einer triadischen Konstellation mit Arzt, Patient und Eltern(teil) müssen unterschiedliche Kenntnisse und Zuständigkeiten aller Beteiligten ausreichend abgeglichen und Verständigung und Gesprächsergebnisse gesichert werden. In diesem Beitrag wird zunächst die Forschungslage umrissen und das Handlungsschema pädiatrischer Erstkonsultationen kurz dargelegt. Daran anschließend werden anhand einer Fallanalyse die vielschichtigen und komplexen Aufgabenstellungen der Beteiligten bei der Herstellung und Durchführung der körperlichen Untersuchung beleuchtet.
Language shift after migration has been reported to occur within three generations. While this pattern holds in many cases there is also some counter evidence. In this paper, family documents from a German immigration community in Canada are investigated to trace individual decisions of language choice that contributed to an extended process of shift taking four generations and more than a century.
The transfer of research data management from one institution to another infrastructural partner is all but trivial, but can be required,for instance, when an institution faces reorganisation or closure. In a case study, we describe the migration of all research data, identify the challenges we encountered, and discuss how we addressed them. It shows that the moving of research data management to another institution is a feasible, but potentially costly enterprise. Being able to demonstrate the feasibility of research data migration supports the stance of data archives that users can expect high levels of trust and reliability when it comes to data safety and sustainability.
The relation between speed and curvature provides a characterization of the spatio-temporal orchestration of kinematic movements. For hand movements, this relation has been reported to follow a power law with exponent -1/3. The same power law has been claimed to govern articulatory movements. We studied the functional form of speed as predicted by curvature using electromagnetic articulography, focusing on three sensors: the tongue tip, the tongue body, and the lower lip. Of specific interest to us was the question of whether the speed-curvature relation is modified by articulatory practice, gauged with words’ frequencies of occurrence. Although analyses imposing linearity a priori indeed supported a power law, relaxation of this linearity assumption revealed that the effect of curvature on speed levels off substantially for lower values of curvature. A modification of the power law is proposed that takes this curvature into account. Furthermore, controlling statistically for number of phones and word duration, we observed that the speed-curvature function was further modulated by an interaction of lexical frequency by curvature, such that for increasing frequency, speed decreased slightly for low curvatures while it increased slightly for high curvatures. The modulation of the balance between speed and curvature by lexical frequency provides further evidence that the skill of articulation improves with practice on a word-to-word basis, and challenges theories of speech production.
In mid-2017, as part of our activities within the TEI Special Interest Group for Linguists (LingSIG), we submitted to the TEI Technical Council a proposal for a new attribute class that would gather attributes facilitating simple token-level linguistic annotation. With this proposal, we addressed community feedback complaining about the lack of a specific tagset for lightweight linguistic annotation within the TEI. Apart from @lemma and @lemmaRef, up till now TEI encoders could only resort to using the generic attribute @ana for inline linguistic annotation, or to the quite complex system of feature structures for robust linguistic annotation, the latter requiring relatively complex processing even for the most basic types of linguistic features. As a result, there now exists a small set of basic descriptive devices which have been made available at the cost of only very small changes to the TEI tagset. The merit of a predefined TEI tagset for lightweight linguistic annotation is the homogeneity of tagging and thus better interoperability of simple linguistic resources encoded in the TEI. The present paper introduces the new attributes, makes a case for one more addition, and presents the advantages of the new system over the legacy TEI solutions.
Mit dem hier besprochenen Band liegt eine Monographie zu Pennsylvania Dutch(Pennsylvania German, Pennsylvania-Deutsch; im Weiteren auch PD) vor, die sowohl die Entstehungsbedingungen und -verläufe und den soziohistorischen, soziopolitischen und religionsbezogenen Kontext seiner Entwicklung als auch seine sprachlichen und literarischen Formen, seine historische und heutige gesellschaftliche Stellung und Verwendung umfassend und gründlich darstellt. Louden wendet sich dabei nicht nur an ein linguistisches Fachpublikum, sondern auch an LeserInnen ohne eine speziell linguistische Vorbildung. Dementsprechend werden für die Darstellung relevante linguistische Konzepte eingeführt und erklärt. Ein umfassendes Stichwortverzeichnis macht die Monographie gut erschließbar, und die umfangreiche Bibliographie ermöglicht es, sich weitergehend zu allen angesprochenen Themen zu informieren. Die Endnoten werden strategisch gut eingesetzt, da sie nicht nur fachwissenschaftliche ‚Unterfütterung‘ bieten, sondern auch dazu genutzt werden, alle zitierten Quellentexte sowohl auf Englisch als auch in der (pennsylvania-)deutschen Originalfassung zur Verfügung zu stellen.
All linguistics should be media linguistics, but it is not. This thesis is presented by using linguistic landscapes as an example. LL research does not belong to the traditional core of either mainstream linguis-tics or media linguistics. This is why not everything within power has been done yet to make full use of their thematic, conceptual and methodological possibilities. Visible signs in public space, however, are an everyday phenomenon. You have to pull out all the stops to research them extensively. The distinction between linguistics and media linguistics turns out to be counterproductive. But this does not only apply to the case of linguistic landscapes. It also stands for any comprehensive investigation of language and language use. (Ex-ceptions may be very narrow questions for specific purposes.) The above thoughts are supported by a database of the project „Metro-polenzeichen“ with more than 25.000 systematically collected, ge-ocoded and tagged photographs.
MULLE is a tool for language learning that focuses on teaching Latin as a foreign language. It is aimed for easy integration into the traditional classroom setting and syllabus, which makes it distinct from other language learning tools that provide standalone learning experience. It uses grammar-based lessons and embraces methods of gamification to improve the learner motivation. The main type of exercise provided by our application is to practice translation, but it is also possible to shift the focus to vocabulary or morphology training.
We present an approach for modeling German negation in open-domain fine grained sentiment analysis. Unlike most previous work in sentiment analysis, we assume that negation can be conveyed by many lexical units (and not only common negation words) and that different negation words have different scopes. Our approach is examined on a new dataset comprising sentences with mentions of polar expressions and various negation words. We identify different types of negation words that have the same scopes. We show that already negation modeling based on these types largely outperforms traditional negation models which assume the same scope for all negation words and which employ a window-based scope detection rather than a scope detection based on syntactic information.
In this paper, we discuss an efficient method of (semi-automatic) neologism detection for German and its application for the production of a dictionary of neologisms, focusing on the lexicographic process. By monitoring the language via editorial (print and online) media evaluation and interpreting the findings on the basis of lexicographic competence, many, but not all neologisms can be identified which qualify for inclusion in the Neologismenworterbuch (2006-today) at the Institute for the German Language in Mannheim (IDS). In addition, an automated corpus linguistic method offers neologism candidates based on a systematic analysis of large amounts of text to lexicographers. We explain the principles of the corpus linguistic compilation of a list of candidates and show how lexicographers work with the results, combining them with their own findings in order to continuously enlarge this specialized online dictionary of new words in German.
The aim of this paper is to present the results of an empirical analysis of the use of non-alphabetic graphic signs (e.g. asterisks, slashes, plus signs etc.) in the context of repairs in Russian and German informal electronic communication. The data for the analysis were taken from the “Mobile Communication Database MoCoDa” (http://mocoda.spracheinteraktion.de/), which contains Russian and German private electronic communication via SMS, WhatsApp and other short message services, and the “Dortmunder Chat-Korpus” (http://www.chatkorpus.tu-dortmund.de/korpora.html). This paper describes the functions of various graphic resources in the context of repairs in both data collections and compares the occurrences of these functions in current Russian and German computer-mediated communication. It concludes that particular signs in both data sets share the same subset of functions, but they differ in terms of how frequently these resources occur in each form of communication.
Overtaking as an interactional achievement : video analyses of participants' practices in traffic
(2018)
In this article we pursue a systematic and extensive study of overtaking in traffic as an interactional event. Our focus is on the accountable organisation and accomplishment of overtaking by road users in real-world traffic situations. Data and analysis are drawn from multiple research groups studying driving from an ethnomethodological and conversation analytic perspective. Building on multimodal and sequential analyses of video recordings of overtaking events, the article describes the shared practices which overtakers and overtaken parties use in displaying, recognizing and coordinating their manoeuvres. It examines the three sequential phases of an overtaking event: preparation and projection; the overtaking proper; the re-alignment post-phase including retrospective accounts and assessments. We identify how during each of these phases drivers and passengers organize intra-vehicle and inter-vehicle practices: driving and non-driving related talk between vehicle- occupants, the emerging spatiotemporal ecology of the road, and the driving actions of other road users. The data is derived from a two camera set-up recording the road ahead and car interior. The recordings are from three settings: daily commuting, driving lessons, race-car coaching. The events occur on a variety of road types (motorways, country roads, city streets, a race track, etc.), in six languages (English, Finnish, French, German, Italian, and Swedish) and in seven countries (Australia, Finland, France, Germany, Sweden, Switzerland, and the UK). From an exceptionally diverse collection of video data, the study of which is made possible thanks to the innovative collaboration of multiple researchers, the article exhibits the range of practical challenges and communicative skills involved in overtaking.
We present the pilot edition of the GermEval Shared Task on the Identification of Offensive Language. This shared task deals with the classification of German tweets from Twitter. It comprises two tasks, a coarse-grained binary classification task and a fine-grained multi-class classification task. The shared task had 20 participants submitting 51 runs for the coarse-grained task and 25 runs for the fine-grained task. Since this is a pilot task, we describe the process of extracting the raw-data for the data collection and the annotation schema. We evaluate the results of the systems submitted to the shared task. The shared task homepage can be found at https://projects.cai. fbi.h-da.de/iggsa/
In this article, the execution of a ritual as a component of religious communication is analysed. The ritual, in which the church community remembers the deceased, is celebrated in the evangelic church of Sarepta (Volgograd) on the last Sunday of the church year, the so-called ‘eternity Sunday’. The study of the ritual is based on two scientific approaches: ethnomethodology and multimodal interaction analysis. These approaches make it possible to analyse the social and cultural practices of church visitors in conjunction with the organisation of church service. Specifically, it becomes possible to:
a) develop new scientific paradigms when analysing the actual use of the church interior,
b) identify basic religious activities of communication in church,
c) introduce new concepts into scientific use,
d) present the ritual of remembrance in Sarepta as a complex, multimodally constituted religious event,
e) focus the coordination of linguistic, physical and spatial activities of church visitors and clerics at different stages of church service and to understand their respective social content and communicative status.
For analysing the video recordings of the church service, the concepts of ‘architecture-for-interaction’ and ‘social topography’ are used, making it possible to discover new aspects of spatial influence on communication. The concept of ‘architecture-for-interaction’ provides the framework for answering the question of how the church interior in Sarepta contributes to the organisation of the ritual. Forms of situational use of space and the cultural knowledge underlying this use are captured with the concept of ‘social topography’. From a structural viewpoint, the analyzed ritual in Sarepta is based on organization and division of responsibilities, consists of phases of structural non-simultaneity, has a three-positional spatial basis, and is structurally open. Because of these characteristics, the execution of the ritual can be described as ‘participatory rituality’. Participatory rituality allows for a religious socialization which lets the community members participate as active and legitimate participants in religious communication and autonomously contribute to the execution of the ritual.
Notions such as “corpus-driven” versus “theory-driven” bring into focus the specific role of corpora in linguistic research. As for phonology with its intrinsic focus on abstract categorical representation, there is a question of how a strictly corpus-driven approach can yield insight into relevant structures. Here we argue for a more theory-driven approach to phonology based on the concept of a phonological grammar in terms of interacting constraints. Empirical validation of such grammars comes from the potential convergence of the evidence from various sources including typological data, neutralization patterns, and in particular patterns observed in the creative use of language such as acronym formation, loanword adaptation, poetry, and speech errors. Further empirical validation concerns specific predictions regarding phonetic differences among opposition members, paradigm uniformity effects, and phonetic implementation in given segmental and prosodic contexts. Corpora in the narrowest sense (i.e. “raw” data consisting of spontaneous speech produced in natural settings) are useful for testing these predictions, but even here, special purpose-built corpora are often necessary.
In this paper we discuss a type of copular clause – specificational copular clauses – in which subject properties may be split between two nominative noun phrases. In particular, while the first noun phrase occupies the canonical preverbal subject position, in some languages the finite verb can agree with the postverbal nominative. Such agreement might be expected, on some theoretical assumptions, to show person restrictions. We discuss this phenomenon in two SVO Germanic languages – Icelandic and Faroese – and present new data from Faroese showing that the person effect here follows from the existence of distinct probes for Number and Person agreement.
Offensive language in social media is a problem currently widely discussed. Researchers in language technology have started to work on solutions to support the classification of offensive posts. We present the pilot edition of the GermEval Shared Task on the Identification of Offensive Language. This shared task deals with the classification of German tweets from Twitter. GermEval 2018 is the fourth workshop in a series of shared tasks on German processing.
Contents:
1. Christoph Kuras, Thomas Eckart, Uwe Quasthoff and Dirk Goldhahn: Automation, management and improvement of text corpus production, S. 1
2. Thomas Krause, Ulf Leser, Anke Lüdeling and Stephan Druskat: Designing a re-usable and embeddable corpus search library, S. 6
3. Radoslav Rábara, Pavel Rychlý and Ondřej Herman: Distributed corpus search, S. 10
4. Adrien Barbaresi and Antonio Ruiz Tinoco: Using elasticsearch for linguistic analysis of tweets in time and space, S. 14
5. Marc Kupietz, Nils Diewald and Peter Fankhauser: How to Get the Computation Near the Data: Improving data accessibility to, and reusability of analysis functions in corpus query platforms, S. 20
6. Roman Schneider: Example-based querying for specialist corpora, S. 26
7. Paul Rayson: Increasing interoperability for embedding corpus annotation pipelines in Wmatrix and other corpus retrieval tools, S. 33
How can we measure the impact – such as awareness for economic, ecological, and political matters – of information, such as scientific publications, user-generated content, and reports from the public administration, based on text data? This workshop brings together research from different theoretical paradigms and methodologies for the extraction of impact-relevant indicators from natural language text data and related meta-data. The papers in this workshop represent different types of expertise in different methods for analyzing text data; spanning the whole spectrum of qualitative, quantitative, and mixed methods techniques, as well as domain expertise in the field of impact measurement. The program was built to create an interdisciplinary half-day workshop where we discuss possibilities, limitations, and synergistic effects of different approaches.
Projektvorstellung – Redewiedergabe. Eine literatur- und sprachwissenschaftliche Korpusanalyse
(2018)
Das laufende DFG-Projekt „Redewiedergabe“ stellt einen Anwendungsfall quantitativer Sprach-und Literaturwissenschaft dar und beschäftigt sich mit dem Phänomen „Redewiedergabe“ auf der Grundlage großer Datenmengen. Zu diesem Zweck wird zum einen ein Korpus manuell mit Redewiedergabeformen annotiert, zum anderen werden Verfahren zur automatischen Erkennung des Phänomens entwickelt. Ziel ist es, Forschungsfragen nach der Entwicklung von Redewiedergabe vor allem im 19. Jahrhundert zu beantworten.
Controlled Natural Languages (CNLs) have many applications including document authoring, automatic reasoning on texts and reliable machine translation, but their application is not limited to these areas. We explore a new application area of CNLs, the use of CNLs in computer-assisted language learning. In this paper we present a a web application for language learning using CNLs as well as a detailed description of the properties of the family of CNLs it uses.
Статтю присвячено комунікативним девіаціям (невдачам) на матеріалі українських і німецьких телеінтерв’ю з П. Порошенком та А. Меркель. Встановлено, що спілкування осіб з різними комунікативними цілями і стратегіями – головні причини девіацій. Проаналізовано комунікативні невдачі, враховуючи позиції адресанта й адресата, а також глядача даних інтерв’ю, визначено спільні та відмінні стратегії у випадку комунікативних девіацій в українській і німецькій лінгвокультурах.
We present a method for detecting and reconstructing separated particle verbs in a corpus of spoken German by following an approach suggested for written language. Our study shows that the method can be applied successfully to spoken language, compares different ways of dealing with structures that are specific to spoken language corpora, analyses some remaining problems, and discusses ways of optimising precision or recall for the method. The outlook sketches some possibilities for further work in related areas.
This paper analyses reply relations in computer-mediated communication (CMC), which occur between post units in CMC interactions and which describe references between posts. We take a look at existing practices in the description and annotation of such relations in chat, wiki talk, and blog corpora. We distinguish technical reply structures, indentation structures, and interpretative reply relations, which include reply relations induced by linguistic markers. We sort out the different levels of description and annotation that are involved and propose a solution for their combined representation within the TEI annotation framework.
This paper analyses reply relations in computer-mediated communication (CMC), which occur between post units in CMC interactions and which describe references between posts. We take a look at existing practices in the description and annotation of such relations in chat, wiki talk, and blog corpora. We distinguish technical reply structures, indentation structures, and interpretative reply relations, which include reply relations induced by linguistic markers. We sort out the different levels of description and annotation that are involved and propose a solution for their combined representation within the TEI annotation framework.
We present a method for detecting annotation errors in manually and automatically annotated dependency parse trees, based on ensemble parsing in combination with Bayesian inference, guided by active learning. We evaluate our method in different scenarios: (i) for error detection in dependency treebanks and (ii) for improving parsing accuracy on in- and out-of-domain data.
This paper discusses current trends in DeReKo, the German Reference Corpus, concerning legal issues around the recent German copyright reform with positive implications for corpus building and corpus linguistics in general, recent corpus extensions in the genres of popular magazines, journals, historical texts, and web-based football reports. Besides, DeReKo is finally accessible via the new
corpus research platform KorAP, offering registered users several news features in comparison with its predecessor COSMAS II.
This study investigates the language used by six German Gangsta rappers to establish and maintain their identity and authenticity as rappers, in songs released between 2015 and 2016. Gangsta rap is a subgenre of Hip-Hop that emphasises ‘the rappers’ street credibility in texts describing tough [urban] neighbourhoods, violence, misogyny, and the achievement of material wealth’ (Bower 379). The culture of Gangsta rap attracts overwhelmingly negative mainstream media coverage (Muggs; Roper) and is often accused of corrupting ‘standard’ language (Krummheuer). The lyrical content of the songs is indeed controversial and has been previously covered by many academics (Byrd; Littlejohn and Putnam; Bower; Rollefson), as has the emergence of Hip-Hop in Germany (Elflein; Pennay; Nitzsche and Grünzweig).
This paper studies the morphological productivity of German N+N compounding patterns from a diachronic perspective. It argues that the productivity of compounds increases due to syntactic influence from genitive constructions (“improper compounds”) in Early New High German. Both quantitative and qualitative productivity measures are adapted from derivational morphology and tested on compound data from the Mainz Corpus of (Early) New High German (1500–1710).
Both for psychology and linguistics, emotion concepts are a continuing challenge for analysis in several respects. In this contribution, we take up the language of emotion as an object of study from several angles. First, we consider how frame semantic analyses of this domain by the FrameNet project have been developing over time, due to theory-internal as well as application-oriented goals, towards ever more fine-grained distinctions and greater within-frame consistency. Second, we compare how FrameNet’s linguistically oriented analysis of lexical items in the emotion domain compares to the analysis by domain experts of the experiences that give rise (directly or indirectly) to the lexical items. And finally, we consider to what extent frame semantic analysis can capture phenomena such as connotation and inference about attitudes, which are important in the field of sentiment analysis and opinion mining, even if they do not involve the direct evocation of emotion.
We present WOMBAT, a Python tool which supports NLP practitioners in accessing word embeddings from code. WOMBAT addresses common research problems, including unified access, scaling, and robust and reproducible preprocessing. Code that uses WOMBAT for accessing word embeddings is not only cleaner, more readable, and easier to reuse, but also much more efficient than code using standard in-memory methods: a Python script using WOMBAT for evaluating seven large word embedding collections (8.7M embedding vectors in total) on a simple SemEval sentence similarity task involving 250 raw sentence pairs completes in under ten seconds end-to-end on a standard notebook computer.
Gratitude is argued to have evolved to motivate and maintain social reciprocity among people, and to be linked to a wide range of positive effects—social, psychological and even physical. But is socially reciprocal behaviour dependent on the expression of gratitude, for example by saying ‘thank you’ as in English? Current research has not included cross-cultural elements, and has tended to conflate gratitude as an emotion with gratitude as a linguistic practice, as might appear to be the case in English. Here, we ask to what extent people express gratitude in different societies by focusing on episodes of everyday life where someone seeks and obtains a good, service or support from another, comparing these episodes across eight languages from five continents. We find that expressions of gratitude in these episodes are remarkably rare, suggesting that social reciprocity in everyday life relies on tacit understandings of rights and duties surrounding mutual assistance and collaboration. At the same time, we also find minor cross-cultural variation, with slightly higher rates in Western European languages English and Italian, showing that universal tendencies of social reciprocity should not be equated with more culturally variable practices of expressing gratitude. Our study complements previous experimental and culture-specific research on gratitude with a systematic comparison of audiovisual corpora of naturally occurring social interaction from different cultures from around the world.
In Beispielen wie
(1) Du hast scheints / Weiß Gott nichts begriffen.
(2) It cost £200, give or take.
(3) Qu’est ce qu’il a dit?
werden verbale Konstruktionen (kurz: VK, hier jeweils die fett gesetzten Teile) in einer Weise gebraucht, die der Grammatik verbaler Konstruktionen zuwiderläuft. In (1) und (2) wird die verbale Konstruktion wie ein Adverb/eine Partikel gebraucht bzw. wie ein Ausdruck in der Funktion eines (adverbialen) Adjunkts/ Supplements. In (3) ist die verbale Konstruktion zum Bestandteil einer periphrastischen interrogativen Konstruktion geworden. Wie sind solche ‘Umfunktionalisierungen’ – wie ich das Phänomen zunächst vortheoretisch bezeichnen möchte – einzuordnen? Handelt es sich um Lexikalisierung oder um Grammatikalisierung? Oder um ein Phänomen der dritten Art? Die Umfunktionalisierung verbaler Syntagmen bzw. Konstruktionen – ich gebrauche die Abkürzung UVK für ‘umfunktionalisierte verbale Konstruktion(en)’ – ist ein bisher weniger gut untersuchtes Phänomen, etwa gegenüber der Umfunktionalisierung von Präpositionalphrasen, die sprachübergreifend zu komplexen, „sekundären“ Präpositionen werden können (man vergleiche DEU auf Grund + Genitiv / von, ENG on top of, FRA à cause de).
In diesem Beitrag werden drei quantitative Studien vorgestellt, mit deren Hilfe untersucht wird, ob neben dem robusten Längenunterschied auch Qualitätsunterschiede für die deutschen <a>-Laute vorhanden sind (z.B. <Saat> versus <satt>). Auf Basis von ausgewählten Korpora und instrumentalphonetischen Messungen kann dieser Zusammenhang bestätigt werden. Zudem zeigen sich signifikante Unterschiede in den dynamischen
Verläufen der beiden Vokale.
We present ESDexplorer (https://owid.shinyapps.io/ESDexplorer), a browser application which allows the user to explore the data from a large European survey on dictionary use and culture. We built ESDexplorer with several target groups in mind: our cooperation partners, other researchers, and a more general public interested in the results. Also, we present in detail the architecture and technological realisation of the application and discuss some legal aspects of data protection that motivated some architectural choices.
Der folgende Beitrag fokussiert die kommunikative Praktik „Fragen“ im Beratungsformat Führungskräfte-Coaching. Fragen stellen laut Praxis-Literatur und Ausbildungsmanualen zu Coaching ein, wenn nicht das, zentrale Interventionsinstrumentarium dar. Trotz dieser formulierten Omnipräsenz und Omnirelevanz gibt es bis dato kaum empirische Erkenntnisse über die tatsächliche Verwendung von Fragen im Coaching. Fragen sind weder in der quantitativ operierenden, psychologischen Wirksamkeits- bzw. Outcome-Forschung noch in der qualitativ operierenden, linguistischen Prozessforschung (zentraler) Forschungsgegenstand. Diese Forschungslücke gilt es im Austausch mit der Praxis und unter Einbezug aller relevanten Disziplinen und Methoden zu schließen. In einem ersten vorbereitenden Schritt macht es sich der vorliegende programmatische Beitrag zur Aufgabe, das Phänomen „Fragen im Coaching“ als Forschungsgegenstand der linguistischen
Gesprächsanalyse zu etablieren. Fragen im Coaching werden dabei sowohl bezüglich ihrer Form, ihrer Funktion als auch als institutionsspezifische soziale Praktik diskutiert, wobei Erkenntnisse zur Verwendung von Fragen in benachbarten professionellen Gesprächen wie Psychotherapie oder Arzt-Patient-Kommunikation als erste Orientierung herangezogen werden. Im Zentrum der gesprächsanalytischen Diskussion steht der Beitrag, den Frage-Sequenzen zur Veränderung und damit zur lokalen Wirksamkeit von Coaching leisten. Der Artikel endet mit einer kritischen Evaluation der Möglichkeiten einer gesprächsanalytischen Erforschung von Frage-Sequenzen und skizziert den Mehrwert von interprofessioneller und interdisziplinärer, insbesondere linguistischer und psychologischer, Forschung für die Coaching-Praxis.
What is a sentient agent?
(2018)
Digitale Medien haben zu einer folgenreichen Veränderung politischer Diskurse beigetragen: Bürgerinnen und Bürger haben nunmehr die Möglichkeit eines direkten und permanenten Dialogs mit politisch Agierenden. Diese wiederum haben soziale Netzwerke als „wirkungsvolle Kommunikationsform für sich entdeckt“ (Kneuer 2017, S.46). Damit haben sich auch die politischen Partizipationsmöglichkeiten verändert. Neben den konventionellen Partizipationsformen erfahren die Bürgerinnen und Bürger nach der Erweiterung in den 1960er Jahren durch nicht institutionalisierte Formen (Woyke 2013) heute eine weitere Form der politischen Teilhabe durch digitale Medien.
Zur Vorbereitung eines zweisprachigen Fachworterbuchs zur Tourismusfachsprache werden korpuslinguistische Verfahren eingesetzt, um Auffalligkeiten in der jeweiligen Fachsprache im Vergleich zum allgemeinsprachlichen Gebrauch aufzuspüren. Neben den hervorstechenden Elementen des Vokabulars, den Schlüsselwortern als potentiellen Stichwortern, geht es vor allem um sprach- und fachsprachspezifische typische Formulierungen und deren Ubersetzungsaquivalente. Fur die gemeinsame, interlinguale Betrachtung des Sprachenpaars Deutsch-Italienisch wurde ein kleines Fachsprachenkorpus aufgebaut und innerhalb der Sketch Engine-Umgebung unter Zuhilfenahme der darin integrierten Referenzkorpora ausgewertet. Fur eine weitere intralinguale Untersuchung der deutschsprachigen Komponente wurde auf das Deutsche Referenzkorpus DeReKo und weitere, intern zu Verfügung stehende Instrumente des Instituts für Deutsche Sprache zuruckgegriffen. Neben üblichen Verfahren der quantitativen Ein- oder Mehrwortbewertung wird ein Ansatz ergänzend getestet, der der dunnen Datengrundlage im fachsprachlichen Bereich Rechnung trägt: Diese ergibt sich nicht nur aus der Korpusgrobe, sondern auch daraus, dass bestimmte feste Floskeln (wie ,eine Reiserücktrittsversicherung abschlieben‘) selten rekurrent, vielmehr eher nur einmal pro Text verwendet werden. Auch wenn dieser Ansatz aufgrund infrastruktureller Artefakte in Einzelfallen an seine Grenzen stößt, die hier selbstkritisch nicht verschwiegen werden sollen, so zeigt sich doch an vielen Stellen auch das grobe Potential. Abschließend wird beispielhaft illustriert, wie Evidenzen dieser und der anderen korpuslinguistischen Auswertungen lexikographisch umgesetzt wurden.
Der vorliegende Beitrag befasst sich mit Erzählen in seiner massenmedialen Vermittlung in einer Unterhaltungsendung im Fernsehen. Ziel ist es, anhand einer multimodalen und medienlinguistischen Analyse eines exemplarischen Ausschnitts aus der TV-Unterhaltungssendung "Zimmer frei" die Spezifik solcher massenmedialen Erzählungen herauszuarbeiten. Zum einen wird aufgezeigt, dass sich massenmediales Erzählen in seinem sequenziellen Auf- und Ausbau aufgrund seiner Einbindung in ein mediales Unterhaltungsformat in systematischer Weise von Alltagserzählungen unterscheidet. Zum anderen wird veranschaulicht, inwieweit theatrale Inszenierungs- und Aufführungsmittel der Fernsehproduktion die Aktivität des Erzählens mitkonstituieren. Erzählungen im Fernsehen, so die analyseleitende Prämisse, können nicht schlicht als durch das Fernsehen übertragene narrative Aktivitäten konzeptualisiert werden. Vielmehr sind sie durch eine mediale Theatralität mitgeprägt. (Para)verbale, körperliche und mediale Inszenierungs- und Aufführungsverfahren greifen konzertiert ineinander, um Erzählungen als "dramas to an audience" (Goffman 1974:508) hervorzubringen.