Refine
Year of publication
Document Type
- Part of a Book (181)
- Article (166)
- Conference Proceeding (38)
- Review (5)
- Book (2)
- Other (2)
- Preprint (1)
Language
Keywords
- Deutsch (140)
- Korpus <Linguistik> (51)
- Konversationsanalyse (48)
- Interaktion (34)
- Semantik (23)
- Computerlinguistik (22)
- Kommunikation (21)
- Mehrsprachigkeit (21)
- Sprachpolitik (19)
- Englisch (18)
Publicationstate
- Postprint (395) (remove)
Reviewstate
- (Verlags)-Lektorat (176)
- Peer-Review (166)
- Peer-review (10)
- Verlags-Lektorat (5)
- Peer-Revied (2)
- (Verlags-)Lektorat (1)
- Review-Status-unbekannt (1)
- Zweitveröffentlichung (1)
Publisher
- Benjamins (52)
- Springer (36)
- Oxford University Press (19)
- Elsevier (14)
- Wilhelm Fink (14)
- Erich Schmidt (11)
- Buske (10)
- Winter (8)
- Equinox (6)
- Lang (6)
"Standard language" is a contested concept, ideologically, empirically and theoretically. This is particularly true for a language such as German, where the standardization of the spoken language was based on the written standard and was established with respect to a communicative situation, i.e. public speech on stage (Bühnenaussprache), which most speakers never come across. As a consequence, the norms of the oral standard exhibit many features which are infrequent in the everyday speech even of educated speakers. This paper discusses ways to arrive at a more realistic conception of (spoken) standard German, which will be termed "standard usage". It must be founded on empirical observations of speakers linguistic choices in everyday situations. Arguments in favor of a corpus-based notion of standard have to consider sociolinguistic, political, and didactic concerns. We report on the design of a large study of linguistic variation conducted at the Institute for the German Language (project "Variation in Spoken German", Variation des gesprochenen Deutsch) with the aim of arriving at a representative picture of "standard usage" in contemporary German. It systematically takes into account both diatopic variation covering the multi-national space in which German an official language, and diastratic variation in terms of varying degrees of formality. Results of the study of phonetic and morphosyntactic variation are discussed. At least for German, a corpus-based notion of "standard usage" inevitably includes some degree of pluralism concerning areal variation, and it needs to do justice to register-based variation as well.
In the management of cooperation, the fit of a requested action with what the addressee is presently doing is a pervasively relevant consideration. We present evidence that imperative turns are adapted to, and reflexively create, contexts in which the other person is committed to the course of action advanced by the imperative. This evidence comes from systematic variation in the design of imperative turns, relative to the fittedness of the imperatively mandated action to the addressee’s ongoing trajectory of actions, what we call the “dine of commitment”. We present four points on this dine: Responsive imperatives perform an operation on the deontic dimension of what the addressee has announced or already begun to do (in particular its permissibility); local-project imperatives formulate a new action advancing a course of action in which the addressee is already actively engaged; global-project-imperatives target a next task for which the addressee is available on the grounds of their participation in the overall event, and in the absence of any competing work; and competitive imperatives draw on a presently otherwise engaged addressee on the grounds of their social commitment to the relevant course of actions. These four turn shapes are increasingly complex, reflecting the interactional work required to bridge the increasing distance between what the addressee is currently doing, and what the imperative mandates. We present data from German and Polish informal and institutional settings.
Authors like Fillmore 1986 and Goldberg 2006 have made a strong case for regarding argument omission in English as a lexical and construction-based affordance rather than one based on general semantico-pragmatic constraints. They do not, however, address the question of how grammatical restrictions on null complementation might interact with broader narrative conventions, in particular those of genre. In this paper, we attempt to remedy this oversight by presenting a comprehensive overview of genre-based argument omissions and offering a construction-based analysis of genre-based omission conventions. We consider five genre-based omission types: instructional imperatives (Culy 1996, Bender 1999), labelese, diary style (Haegeman 1990), match reports (Ruppenhofer 2004) and quotative clauses. We show that these omission types share important traits; all, for example, have anaphoric rather than indefinite construals. We also show, however, that the omission types differ from each other in idiosyncratic ways. We then address several interrelated representational problems posed by the grammatical treatment of genre-based omissions. For example, the constructions that represent genre-based omission conventions must interact with the lexical entries of verbs, many of which do not generally permit omitted arguments. Accordingly, we offer constructional analyses of genre-based omissions that allow constructions to override lexical valence constraints.
Our paper deals with the use of ICH WEIß NICHT (‘I don’t know’) in German talk-in-interaction. Pursuing an Interactional Linguistics approach, we identify different interactional uses of ICH WEIß NICHT and discuss their relationship to variation in argument structure (SV (O), (O)VS, V-only). After ICH WEIß NICHT with full complementation, speakers emphasize their lack of knowledge or display reluctance to answer. In contrast, after variants without an object complement, in contrast, speakers display uncertainty about the truth of the following proposition or about its sufficiency as an answer. Thus, while uses with both subject and object tend to close a sequence or display lack of knowledge, responses without an object, in contrast, function as a prepositioned epistemic hedge or a pragmatic marker framing the following TCU. When ICH WEIß NICHT is used in response to a statement, it indexes disagreement (independently from all complementation patterns).
We question the growing consensus in the literature that European Americans behave as a homogenous pan-ethnic coalition of voters. Seemingly below the radar of scholarship on voting groups in American politics, we identify a group of white voters that behaves differently from others: German Americans, the largest ethnic group, regionally concentrated in the ‘Swinging Midwest’. Using county level voting returns, ancestry group information from the American Community Survey (ACS), current survey data and historical census data going back as early as 1910, we provide evidence for a partisan and a non-partisan pathway that motivated German Americans to vote for Trump in 2016: a historically grown association with the Republican Party and an acquired taste for isolationist attitudes that mobilizes non-partisan German Americans to support isolationist candidates. Our findings indicate that European American experiences of migration and integration still echo into the political arena of today.
In this paper we present an evaluation of rule-based morphological components for German for use in an interactive editing environment. The criteria for the evaluation are deduced from the intended use of these components, namely availability, performance, programming interfaces, and analysis quality. We evaluated systems developed and maintained since decades as well as new systems. However, we note serious general shortcomings when looking closer at recent implementations and come to the conclusion that the oldest system is the only one that satisfies our requirements.
In diesem Kapitel stellen wir zunächst grundlegende Konzepte von Abfragesystemen und Abfragesprachen für die Suche in Korpora vor. Diese Konzepte sollen Ihnen helfen, die einzelnen Abfragesprachen besser zu verstehen und vergleichen zu können. Die gängigen Abfragesprachen unterscheiden sich in vielen Details. Diese Details und die Möglichkeiten und Grenzen der einzelnen Abfragesprachen stellen wir im zweiten Teil mit vielen Beispielaufgaben und dazu passenden Lösungen in jeweils drei Abfragesprachen vor.
This paper is concerned with a novel methodology for generating phonetic questions used in tree-based state tying for speech recognition. In order to implement a speech recognition system, language-dependent knowledge which goes beyond annotated material is usually required. The approach presented here generates phonetic questions for decision trees are based on a feature table that summarizes the articulatory characteristics of each sound. On the one hand, this method allows better language-specific triphone models to be defined given only a feature-table as linguistic input. On the other hand, the feature-table approach facilitates efficient definition of triphone models for other languages since again only a feature table for this language is required. The approach is exemplified with speech recognition systems for English and Thai.
In this paper, we present an overview of freely available web applications providing online access to spoken language corpora. We explore and discuss various solutions with which the corpus providers and corpus platform developers address the needs of researchers who are working with spoken language. The paper aims to contribute to the long-overdue exchange and discussion of methods and best practices in the design of online access to spoken language corpora.
Eine korpuslinguistische Untersuchung mit umfassender Analyse der häufiger vorkommenenden Adverbbildungsmuster des Deutschen legt nahe, dass die Sättigung des internen Argumentplatzes eines ursprünglich relationalen Ausdrucks eine wichtige Rolle bei der Adverbproduktion spielt (Brandt 2020). Eine genauere Betrachtung der Unterschiede zwischen -ermaßen- vs. -erweise-Adverbien deutet auf eine grammatische Unterscheidung zwischen Satzadverbien und Adverbien der Art und Weise: Im Fall von -ermaßen erfolgt die Sättigung über Token-Reflexivität, während der interne Slot von -erweise- Bildungen über häufigere und möglicherweise expansive Mechanismen geschlossen wird. Darüber hinaus fördert die pleonastische Qualität von Bildungen auf der Basis gerundivaler Partizipien die Produktivität von -erweise Adverbien.
This paper investigates the use of linking adverbs in adversative constructions in German and Italian. In Italian those constructions are very frequently formulated with adverbs such as invece, while wordings without a lexical connective are more typical of German. Corpus data show that the syntactic und semantic conditions favouring the use of adversative adverbs are by and large the same in both languages. Lexical connectives can increase explicitness when the intended adversative interpretation is not obvious on other grounds. The higher frequency of adversative adverbs in Italian is shown to be a consequence of the more restrictive rules of the placement of prosodic accent.
The English language has taken advantage of the Digital Revolution to establish itself as the global language; however, only 28.6 %of Internet users speak English as their native language. Machine Trans-lation (MT) is a powerful technology that can bridge this gap. In devel-opment since the mid-20th century, MT has become available to every Internet user in the last decade, due to free online MT services. This paper aims to discuss the implications that these tools may have for the privacy of their users and how they are addressed by EU data protec-tion law. It examines the data-flows in respect of the initial processing (both from the perspective of the user and the MT service provider) and potential further processing that may be undertaken by the MT service provider.
There are strict formal requirements for the use of a comma. However, there are none regarding the comma’s actual shape. In printed fonts, it is determined by the font’s specification. In hand-written texts though, the shape of the comma is variable; most writers choose from a set of straight, convex and concave shapes. By using a corpus of 1464 commas written by 99 individuals, we will present three case studies of persons whose comma shapes do somehow correlate with linguistic structures. With that, we might identify a few (possibly subconscious) shaping strategies. Some writers might mark a norm insecurity by a different comma form, others might mark the function of the entity which is segmented by the comma, or the comma type itself (sentence boundary, exposition or coordination).
Terminological resources play a central role in the organization and retrieval of scientific texts. Both simple keyword lists and advanced modelings of relationships between terminological concepts can make a most valuable contribution to the analysis, classification, and finding of appropriate digital documents, either on the web or within local repositories. This seems especially true for long-established scientific fields with elusive theoretical and historical branches, where the use of terminology within documents from different origins is often far from being consistent. In this paper, we report on the progress of a linguistically motivated project on the onomasiological re-modeling of the terminological resources for the grammatical information system grammis. We present the design principles and the results of their application. In particular, we focus on new features for the authoring backend and discuss how these innovations help to evaluate existing, loosely structured terminological content, as well as to efficiently deal with automatic term extraction. Furthermore, we introduce a transformation to a future SKOS representation. We conclude with a positioning of our resources with regard to the Knowledge Organization discourse and discuss how a highly complex information environment like grammis benefits from the re-designed terminological KOS.
The paper presents the results of a joint effort of a group of multimodality researchers and tool developers to improve the interoperability between several tools used for the annotation and analysis of multimodality. Each of the tools has specific strengths so that a variety of different tools, working on the same data, can be desirable for project work. However this usually requires tedious conversion between formats. We propose a common exchange format for multimodal annotation, based on the annotation graph (AG) formalism, which is supported by import and export routines in the respective tools. In the current version of this format the common denominator information can be reliably exchanged between the tools, and additional information can be stored in a standardized way.
This article describes an English Zulu learners’ dictionary that is part of a larger set of information tools, namely an online Zulu course, an e-dictionary of possessives (which was implemented earlier) accompanied by training software offering translation tasks on several levels, and an ontology of morphemic items categorizing and describing all parts of speech of Zulu. The underlying lexicographic database contains the usual type of lexicographic data, such as translation equivalents and their respective morphosyntactic data, but its entries have been extended with data related to the lessons of the online course in order to enable the learner to link both tools autonomously. The ‘outer matter’ is integrated into the website in the form of several texts on additional web pages (how-to-use, typical outputs, grammar tables, information on morphosyntactic rules, etc.). The dictionary comprises a modular system, where each module fulfils one of the necessary functions.
Just like most varieties of West Germanic, virtually all varieties of German use a construction in which a cognate of the English verb 'do' (standard German 'tun') functions as an auxiliary and selects another verb in the bare infinitive, a construction known as 'do'-periphrasis or 'do'-support. The present paper provides an Optimality Theoretic (OT) analysis of this phenomenon. It builds on a previous analysis by Bader and Schmid (An OT-analysis of 'do'-support in Modern German, 2006) but (i) extends it from root clauses to subordinate clauses and (ii) aims to capture all of the major distributional patterns found across (mostly non-standard) varieties of German. In so doing, the data are used as a testing ground for different models of German clause structure. At first sight, the occurrence of 'do' in subordinate clauses, as found in many varieties, appears to support the standard CP-IP-VP analysis of German. In actual fact, however, the full range of data turn out to challenge, rather than support, this model. Instead, I propose an analysis within the IP-less model by Haider (Deutsche Syntax - generativ. Vorstudien zur Theorie einer projektiven Grammatik, Narr, Tübingen, 1993 et seq.). In sum, the 'do'-support data will be shown to have implications not only for the analysis of clause structure but also for the OT constraints commonly assumed to govern the distribution of 'do', for the theory of non-projecting words (Toivonen in Non-projecting words, Kluwer, Dordrecht, 2003) as well as research on grammaticalization.
Although there is a growing interest of policy makers in higher education issues (especially on an international scale), there is still a lack of theoretically well-grounded comparative analyses of higher education policy. Even broadly discussed topics in higher education research like the potential convergence of European higher education systems in the course of the Bologna Process suffer from a thin empirical and comparative basis. This paper aims to deal with these problems by addressing theoretical questions concerning the domestic impact of the Bologna Process and the role national factors play in determining its effects on cross-national policy convergence. It develops a distinct theoretical approach for the systematic and comparative analysis of cross-national policy convergence. In doing so, it relies upon insights from related research areas — namely literature on Europeanization as well as studies dealing with cross-national policy convergence.
MRI data of German vowels and consonants was acquired for 9 speakers. In this paper tongue contours for the vowels were analyzed using the three-mode factor analysis technique PARAFAC. After some difficulties, probably related to what constitutes an adequate speaker sample for this three-mode technique to work, a stable two-factor solution was extracted that explained about 90% of the variance. Factor 1 roughly captured the dimension low back to high front; Factor 2 that from mid front to high back. These factors are compared with earlier models based on PARAFAC. These analyses were based on midsagittal contours; the paper concludes by illustrating from coronal and axial sections how non-midline information could be incorporated into this approach.
Distributional models of word use constitute an indispensable tool in corpus based lexicological research for discovering paradigmatic relations and syntagmatic patterns (Belica et al. 2010). Recently, word embeddings (Mikolov et al. 2013) have revived the field by allowing to construct and analyze distributional models on very large corpora. This is accomplished by reducing the very high dimensionality of word cooccurrence contexts, the size of the vocabulary, to few dimensions, such as 100-200. However, word use and meaning can vary widely along dimensions such as domain, register, and time, and word embeddings tend to represent only the most prevalent meaning. In this paper we thus construct domain specific word embeddings to allow for systematically analyzing variations in word use. Moreover, we also demonstrate how to reconstruct domain specific co-occurrence contexts from the dense word embeddings.
Korpora sind – als idealerweise digital verfüg- und auswertbare Sammlungen von Texten – eine wertvolle empirische Grundlage linguistischer Studien. Eigene Korpora aufzubauen ist, je nach Sprachausschnitt, mit unterschiedlichen Herausforderungen verbunden. Zu allen Texten sollten Metadaten zu den Textentstehungsbedingungen (Zeit, Quelle usw.) erhoben werden, um diese als Variablen in Auswertungen einbeziehen zu können. Andere Informationen wie etwa die Themenzugehörigkeit (oder Annotationen auch unterhalb der Textebene) sind auch hilfreich, in vielerlei Hinsicht aber schwieriger pauschal taxonomisch vorzugeben, geschweige denn, operationell zu ermitteln. Jenseits der »materiellen« Verfügbarkeit der Texte und der technischen Aufbereitung sind es das Urheberrecht, vor allem Lizenz- bzw. Nutzungsrechte, sowie ethische Verantwortung und Persönlichkeitsrechte, die beachtet werden müssen, auch um zu gewährleisten, dass die Daten für die Reproduktion der Studien Dritten rechtssicher zugänglich gemacht werden dürfen. Bevor für ein Vorhaben ein neues Korpus aufgebaut wird, sollte deshalb am besten geprüft werden, ob nicht ein geeignetes bereits zur Verfügung steht. Wenn ein Korpus aufgebaut wird, sollte für eine nachhaltige Aufbewahrung und Zugänglichmachung gesorgt und die Existenz an geeigneter Stelle dokumentiert werden.
In HDK-1 und in HDK-2 werden Perfektpartizipien wie angenommen und vorausgesetzt in der ‚absoluten‘ Verwendung ohne Auxiliar als vollständig grammatikalisierte Konnektoren mit konditionaler Semantik behandelt. Zwar werden sie von semantisch unterschiedlichen Verben gebildet, in der Verwendung als Konnektor lassen sich aber zumindest hinsichtlich der Wahrheitsbedingungen kaum semantische Unterschiede mehr erkennen. Deutliche Unterschiede zeigen sich aber im Sprachgebrauch: Basierend auf einer groß angelegten Korpusstudie wird gezeigt, dass sich angenommen und vorausgesetzt stark unterscheiden hinsichtlich a) ihrer Präferenz für die Einbettung von V2- vs. dass-Nebensätzen, b) des präferierten Verbmodus im Nebensatz, c) der topologischen Präferenz des untergeordneten Satzes sowie d) der Kookkurrenz mit anderen Ausdrücken. Es wird versucht, diese Unterschiede mit einem pragmatisch-funktionalen Ansatz zu erklären.
Dieser Artikel analysiert am Beispiel eines Racletteessens unter Freunden, wie innerhalb einer langen Sequenz das Warten auf den Beginn des Essens strukturiert wird. Während der fast 50 Minuten, die zwischen der Ankunft der ersten Gäste sowie dem Beginn des Essens vergehen, orientieren sich die Teilnehmer auf unterschiedliche Weise zum Warten als Aktivität. Das sukzessive Eintreffen der Gäste führt jeweils zu Eröffnungssequenzen innerhalb dieser Wartezeit. Anhand von Auszügen dieser Zeitspanne verfolgt die Analyse, wie sich die Teilnehmer zu dieser Zeitlichkeit des Wartens und (Noch-nicht-)Beginnens orientieren und wie sie den Anfang des Essens gemeinsam konstruieren.
We continue the study of the reproducibility of Propp’s annotations from Bod et al. (2012). We present four experiments in which test subjects were taught Propp’s annotation system; we conclude that Propp’s system needs a significant amount of training, but that with sufficient time investment, it can be reliably trained for simple tales.
Since the beginning of the Covid-19 pandemic, about 2000 new lexical units have entered the German lexicon. These concern a multitude of coinings and word formations (Kuschelkontakt, rumaerosolen, pandemüde) as well as lexical borrowings mainly from English (Lockdown, Hotspot, Superspreader). In a special way, these neologisms function as keywords and lexical indicators sketching the development of the multifaceted corona discourse in Germany. They can be detected systematically by corpus-linguistic investigations of reports and debates in contemporary public communication. Keyword analyses not only exhibit new vocabulary, they also reveal discursive foci, patterns of argumentation and topicalisations within the diverse narratives of the discourse. With the help of quickly established and dominant neologisms, this paper will outline typical contexts and thematic references, but it will also identify speakers' attitudes and evaluations.
Theateraufführungen sind ohne Zuschauer nicht denkbar. Zugleich erweisen sich Proben aber als öffentlichkeitsabgeschirmte und intime Vorgänge, da eine (zu frühe) Orientierung an möglichen Publikums-Effekten den kreativen Prozess stört. Auf der Grundlage von über 30 Stunden Videoaufnahmen von Theaterproben zeige ich an ausgewählten Ausschnitten, wie Theatermachende sich sprachlich und körperlich im Probenprozess auf das Publikum beziehen, wie dies interaktiv realisiert wird und welche Rückschlüsse das auf die Weisen der Publikumskonstruktion im Kontext von Proben zulässt.
Antonymy is a relation of lexical opposition which is generally considered to involve (i) the presence of a scale along which a particular property may be graded, and hence both (ii) gradability of the corresponding lexical items and (iii) typical entailment relations. Like other types of lexical opposites, antonyms typically differ only minimally: while denoting opposing poles on the relevant dimension of difference, they are similar with respect to other components of meaning. This paper presents examples of antonymy from the domain of speech act verbs which either lack some of these typical attributes or show problems in the application of these. It discusses several different proposals for the classification of these atypical examples.
Repeating the movements associated with activities such as drawing or sports typically leads to improvements in kinematic behavior: these movements become faster, smoother, and exhibit less variation. Likewise, practice has also been shown to lead to faster and smoother movement trajectories in speech articulation. However, little is known about its effect on articulatory variability. To address this, we investigate the extent to which repetition and predictability influence the articulation of the frequent German word “sie” [zi] (they). We find that articulatory variability is proportional to speaking rate and the duration of [zi], and that overall variability decreases as [zi] is repeated during the experiment. Lower variability is also observed as the conditional probability of [zi] increases, and the greatest reduction in variability occurs during the execution of the vocalic target of [i]. These results indicate that practice can produce observable differences in the articulation of even the most common gestures used in speech.
HMMs are the dominating technique used in speech recognition today since they perform well in overall phone recognition. In this paper, we show the comparison of HMM methods and machine learning techniques, such as neural networks, decision trees and ensemble classifiers with boosting and bagging in the task of articulatory-acoustic feature classification. The experimental results show that HMM methods work well for the classification of such features as vocalic. However, decision tree and bagging outperform HMMs for the fricative classification task since the data skewness is much higher than for the feature vocalic classification task. This demonstrates that HMMs do not perform as well as decision trees and bagging in highly skewed data settings.
Precise multimodal studies require precise synchronisation between audio and video signals. However, raw audio and audio from video recordings can be out of sync for several reasons. In order to re-synchronise them, a dynamic programming (DP) approach is presented here. Traditionally, DP is performed on the rectangular distance matrix comparing each value in signal A with each value in signal B. Previous work limited the search space using for example the Sakoe Chiba Band (Sakoe and Chiba, 1978). However, the overall space of the distance matrix remains identical. Here, a tunnel matrix and its according DP-algorithm are presented. The matrix contains merely the computed distance of two signals to a pre-specified bandwidth and the computational cost is equally reduced. An example implementation demonstrates the functionality on artificial data and on data from real audio and video recordings.
„Actual words are of theoretical interest” (Audring 2021: 3). Unter Zugrundelegung dieser gebrauchsbasierten Prämisse geht der vorliegende Beitrag der Frage nach, wie sich die Nominalkomposition im Deutschen auf der Basis sprachlicher Massendaten als Konstruktionsfamilie, d.h. als ein hierarchisches Netzwerk von Konstruktionen unterschiedlichen Abstraktionsgrads, beschreiben lässt. Der Beitrag knüpft in theoretischer Hinsicht an Booijs (2010) „Construction Morphology” an, geht jedoch insofern über diese hinaus, als versucht wird, deren Grundannahmen auch auf automatisch erhobene sprachliche Massendaten anzuwenden. Konkret wird mit einem Inventar von rund 185.000 Zusammensetzungen aus zwei simplizischen Nomen gearbeitet, die systematisch aus dem Deutschen Referenzkorpus (DeReKo) (vgl. Leibniz-Institut für Deutsche Sprache 2007) extrahiert und im Anschluss (semi)automatisch weiterverarbeitet wurden.
Automatic recognition of speech, thought, and writing representation in German narrative texts
(2013)
This article presents the main results of a project, which explored ways to recognize and classify a narrative feature—speech, thought, and writing representation (ST&WR)—automatically, using surface information and methods of computational linguistics. The task was to detect and distinguish four types—direct, free indirect, indirect, and reported ST&WR—in a corpus of manually annotated German narrative texts. Rule-based as well as machine-learning methods were tested and compared. The results were best for recognizing direct ST&WR (best F1 score: 0.87), followed by indirect (0.71), reported (0.58), and finally free indirect ST&WR (0.40). The rule-based approach worked best for ST&WR types with clear patterns, like indirect and marked direct ST&WR, and often gave the most accurate results. Machine learning was most successful for types without clear indicators, like free indirect ST&WR, and proved more stable. When looking at the percentage of ST&WR in a text, the results of machine-learning methods always correlated best with the results of manual annotation. Creating a union or intersection of the results of the two approaches did not lead to striking improvements. A stricter definition of ST&WR, which excluded borderline cases, made the task harder and led to worse results for both approaches.
Feminine forms of job titles raise great interest in many countries. However, it is still unknown how they shape stereotypical impressions on warmth and competence dimensions among female and male listeners. In an experiment with fictitious job titles men perceived women described with feminine job titles as significantly less warm and marginally less competent than women with masculine job titles, which led to lower willingness to employ them. No such effects were observed among women.
Besser als gedacht
(2021)
Das grammatische Wissen von Lehramtsstudierenden ist besser als gedacht. Im Basisartikel (s. Döring/Elsner in diesem Band) wird darauf verwiesen, dass Studien zeigten, dass bei Studierenden zu Studienbeginn das grammatische Wissen nicht in dem gewünschten Maße vorhanden ist und dass auch die universitäre Lehre keinen Ausgleich dieser Defizite bewirken muss. Dennoch bleibt die Frage, ob das, was in den Studien gemessen wird, nicht eher dem terminologischen Wissen entspricht, was bei Studienbeginn nicht vorhanden sein muss, weil der Grammatikunterricht viel zu lang zurückliegt und im Studienverlauf genau diese Termini entweder keine Rolle spielen oder kritisch diskutiert werden, sodass die Fragen auch nicht mehr so einfach beantwortet werden können. Hinter diesen Studien steckt doch letztlich die Frage, welcher Wissensbestand und welcher Wissenszuwachs gemessen werden soll und ob die verwendeten Methoden das geeignete Mittel darstellen. Daher möchten wir in diesem Kommentar aufzeigen, in welcher Weise unserer Meinung nach Lehramtsstudierende solide grammatische Kenntnisse aufweisen (können), in welcher Hinsicht epistemische Überzeugungen von Lehrenden einen Einfluss haben können und welche Aspekte in der unversitären Lehre (im Bereich der Grammatik) zusätzlich berücksichtigt werden sollten, um einen nachhaltigeren Lernerfolg zu ermöglichen. Dies ist durchaus als optimistischer Beitrag zu verstehen, insofern als sich die universitäre Hochschullehre für Lehramtsstudierende im Bereich der Grammatik im positiven Sinne auf den Weg gemacht hat.
This paper provides a unified semantic and discourse pragmatic analysis of the German particle nämlich, traditionally described as having a specificational and an explanative reading. Our claim is that nämlich is a discourse marker which signals that the expression it is attached to is a short (elliptic) answer to a salient implicit question about the previous utterance. We show how both the explanative and the specificational reading can be derived from this more general semantic contribution. In addition we discuss some cross linguistic consequences of our analysis.
Mobiles Livevideostreaming ist eine Medienpraktik, bei der sich die Beteiligten in einer spezifischen Ausrichtung zueinander befinden und in der Streamer*innen und Zuschauer*innen unterschiedliche semiotische Ressourcen zur Verfügung stehen. Anhand der multimodalen Sequenzanalyse einer prägnanten Episode eines Ortswechsels im Rahmen der Berichterstattung eines Journalisten von einem politischen Ereignis auf der Livevideostreaming-Plattform Periscope wird die Frage bearbeitet, wie Beteiligung und involvement in Livevideostreams hergestellt sowie organisiert werden und dargelegt, inwiefern mobiles Livevideostreaming soziale Parainteraktion transzendiert. Es wird gezeigt, dass die Hosts der Medienpraktik ‚Livevideostreaming' interaktionsdominierend agieren und die Zuschauer*innen durch asymmetrische Partizipationskoordination per footing shifts situativ in das Geschehen involvieren.
Beyond the stars: exploiting free-text user reviews to improve the accuracy of movie recommendations
(2009)
In this paper we show that the extraction of opinions from free-text reviews can improve the accuracy of movie recommendations. We present three approaches to extract movie aspects as opinion targets and use them as features for the collaborative filtering. Each of these approaches requires different amounts of manual interaction. We collected a data set of reviews with corresponding ordinal (star) ratings of several thousand movies to evaluate the different features for the collaborative filtering. We employ a state-of-the-art collaborative filtering engine for the recommendations during our evaluation and compare the performance with and without using the features representing user preferences mined from the free-text reviews provided by the users. The opinion mining based features perform significantly better than the baseline, which is based on star ratings and genre information only.
Blogg Dir deinen Urlaub nach Tunesien! Zur Erläuterung des Musters [VImp PROPReflexivDat NPAkk]
(2020)
In diesem Beitrag soll das Muster [VImp PROPReflexivDat NPAkk] semantisch und syntaktisch erläutert werden. Dieses Muster, das semantisch mit Verben des Erwerbens wie anschaffen korreliert, wird auch im Zusammenhang mit Kommunikationsverben wie bloggen und facebooken sowie mit dem Kontaktverb rubbeln belegt. Mithilfe des Konzeptes der Koerzion bzw. der semantischen Anpassung soll das Kovorkommen des erwänhten Musters mit diesen Verben beschrieben und erklärt werden. Als empirische Quelle dient das Korpus für das Deutsche 2012 und 2014 aus den Corpora from the Web. Die vorliegende Untersuchung ist im Rahmen meiner Dissertationsarbeit zum Thema Argumentstruktur und Bedeutung medialer Kommunikationsverben des Deutschen und des Spanischen im Sprachvergleich durchgeführt worden.
In this article, we examine the effectiveness of bootstrapping supervised machine-learning polarity classifiers with the help of a domain-independent rule-based classifier that relies on a lexical resource, i.e., a polarity lexicon and a set of linguistic rules. The benefit of this method is that though no labeled training data are required, it allows a classifier to capture in-domain knowledge by training a supervised classifier with in-domain features, such as bag of words, on instances labeled by a rule-based classifier. Thus, this approach can be considered as a simple and effective method for domain adaptation. Among the list of components of this approach, we investigate how important the quality of the rule-based classifier is and what features are useful for the supervised classifier. In particular, the former addresses the issue in how far linguistic modeling is relevant for this task. We not only examine how this method performs under more difficult settings in which classes are not balanced and mixed reviews are included in the data set but also compare how this linguistically-driven method relates to state-of-the-art statistical domain adaptation.
Complex common names such as Indian elephant or green tea denote a certain type of entity, viz. kinds. Moreover, those kinds are always subkinds of the kind denoted by their head noun. Establishing such subkinds is essentially the task of classifying modifiers that are a defining trait of endocentrically structured complex common names. Examining complex common names of different lexico-syntactic types(NN compounds, N+N syntagmas, NP/PP syntagmas, A+N syntagmas) and from different languages (particularly English, German and French) it can be shown that complex common names are subject to language- independent formal and semantic constraints. In particular, complex common names qualify as name-like expressions in that they tend to be deficient in terms of formal complexity and semantic compositionality.
This paper deals with different types of verbal complementation of the German verb verdienen. It focuses on constructions that have been undergoing a grammaticalization process and thus express deontic modality, as in Sie verdient geliebt zu werden (ʽShe deserves to be lovedʼ) and Sie verdient zu leben (ʽShe deserves to liveʼ) (Diewald, Dekalo & Czicza 2021). These constructions are connected to parallel complementation types with passive and active infinitives containing a correlate es, as in Sie verdient es, geliebt zu werden and Sie verdient es, zu leben, as well as finite clauses with the subordinator dass with and without correlative es, as in Sie verdient, dass sie geliebt wird and Sie verdient es, dass sie geliebt wird. This paper attempts to show a close comparative investigation of these six types of constructions based on their relevant semantic and syntactic properties in terms of clause linkage (Lehmann 1988). We analyze the relevant data retrieved from the DWDS corpus of the 20th century and present an expanded grammaticalization path for verdienen-constructions. The finite complementation with dass is regarded as an example of a separate structural option called “elaboration”. Concerning the use of correlative es, it is shown that it does not have any substantial effect on the grammaticalization of modal verdienen-constructions.
Within cognitive linguistics, there is an increasing awareness that the study of linguistic phenomena needs to be grounded in usage. Ideally, research in cognitive linguistics should be based on authentic language use, its results should be replicable, and its claims falsifiable. Consequently, more and more studies now turn to corpora as a source of data. While corpus-based methodologies have increased in sophistication, the use of corpus data is also associated with a number of unresolved problems. The study of cognition through off-line linguistic data is, arguably, indirect, even if such data fulfils desirable qualities such as being natural, representative and plentiful. Several topics in this context stand out as particularly pressing issues. This discussion note addresses (1) converging evidence from corpora and experimentation, (2) whether corpora mirror psychological reality, (3) the theoretical value of corpus linguistic studies of ‘alternations’, (4) the relation of corpus linguistics and grammaticality judgments, and, lastly, (5) the nature of explanations in cognitive corpus linguistics. We do not claim to resolve these issues nor to cover all possible angles; instead, we strongly encourage reactions and further discussion.
The ubiquity of smartphones has been recognised within conversation analysis as having an impact on conversational structures and on the participants’ interactional involvement. However, most of the previous studies have relied exclusively on video recordings of overall encounters and have not systematically considered what is taking place on the device. Due to the personal nature of smartphones and their small displays, onscreen activities are of limited visibility and are thus potentially opaque for both the co-present participants (“participant opacity”) and the researchers (“analytical opacity”). While opacity can be an inherent feature of smartphones in general, analytical opacity might not be desirable for research purposes. This chapter discusses how a recording set-up consisting of static cameras, wearable cameras and dynamic screen captures allowed us to address the analytical opacity of mobile devices. Excerpts from multi-source video data of everyday encounters will illustrate how the combination of multiple perspectives can increase the visibility of interactional phenomena, reveal new analytical objects and improve analytical granularity. More specifically, these examples will emphasise the analytical advantages and challenges of a combined recording set-up with regard to smartphone use as multiactivity, the role of the affordances of the mobile device, and the prototypicality and “naturalness” of the recorded practices.
Communication of stereotypes in the classroom: biased language use of German and Turkish adolescents
(2014)
Little is known about the linguistic transmission and maintenance of mutual stereotypes in interethnic contexts. This field study, therefore, investigated the linguistic expectancy bias (LEB) and the linguistic intergroup bias (LIB) among German and Turkish adolescents (13 to 20 years) in the school context. The LEB refers to the general phenomenon of describing stereotypes more abstractly. The LIB is the tendency to use language abstraction for in-group protective reasons. Results revealed an unmoderated LEB, whereas the LIB only occurred when foreigners were in the numerical majority, the classroom composition was perceived as a learning disadvantage, or the interethnic conflict frequency was high. These findings provide first evidence for the use of both LEB and LIB in an interethnic classroom setting.
Most research on ethnicity has focused on visual cues. However, accents are strong social cues that can match or contradict visual cues. We examined understudied reactions to people whose one cue suggests one ethnicity, whereas the other cue contradicts it. In an experiment conducted in Germany, job candidates spoke with an accent either congruent or incongruent with their (German or Turkish) appearance. Based on ethnolinguistic identity theory, we predicted that accents would be strong cues for categorization and evaluation. Based on expectancy violations theory we expected that incongruent targets would be evaluated more extremely than congruent targets. Both predictions were confirmed: accents strongly influenced perceptions and Turkish-looking German-accented targets were perceived as most competent of all targets (and additionally most warm). The findings show that bringing together visual and auditory information yields a more complete picture of the processes underlying impression formation.
Several studies have examined effects of explicit task demands on eye movements in reading. However, there is relatively little prior research investigating the influence of implicit processing demands. In this study, processing demands were manipulated by means of a between-subject manipulation of comprehension question difficulty. Consistent with previous results from Wotschack and Kliegl, the question difficulty manipulation influenced the probability of regressing from late in sentences and re-reading earlier regions; readers who expected difficult comprehension questions were more likely to re-read. However, this manipulation had no reliable influence on eye movements during first-pass reading of earlier sentence regions. Moreover, for the subset of sentences that contained a plausibility manipulation, the disruption induced by implausibility was not modulated by the question manipulation. We interpret these results as suggesting that comprehension demands influence reading behavior primarily by modulating a criterion for comprehension that readers apply after completing first-pass processing.
In this paper the authors briefly outline editing functions which use methods from computational linguistics and take the structures of natural languages into consideration. Such functions could reduce errors and better support writers in realizing their communicative goals. However, linguistic methods have limits, and there are various aspects software developers have to take into account to avoid creating a solution looking for a problem: Language-aware functions could be powerful tools for writers, but writers must not be forced to adapt to their tools.
Content analysis provides a useful and multifaceted, methodological framework for Twitter analysis. CAQDAS tools support the structuring of textual data by enabling categorising and coding. Depending on the research objective, it may be appropriate to choose a mixed-methods approach that combines quantitative and qualitative elements of analysis and plays out their respective advantages to the greatest possible extent while minimising their shortcomings. In this chapter, we will discuss CAQDAS speech act analysis of tweets as an example of software-assisted content analysis. We start with some elementary thoughts on the challenges of the collection and evaluation of Twitter data before we give a brief description of the potentials and limitations of using the software QDA Miner (as one typical example for possible analysis programmes). Our focus will lie on analytical features that can be particularly helpful in speech act analysis of tweets.
This paper deals with different views of lexical semantics. The focus is on the relationship between lexical expressions and conceptual components. First the assumptions about lexicalization and decompositionality of concepts shared by the most semanticists are presented, followed by a discussion of the differences between two-level-semantics and one-level-semantics. The final part is concentrated on the interpretation of conceptual components in situations of communication.
Are borrowed neologisms accepted more slowly into the German language than German words resulting from the application of word formation rules? This study addresses this question by focusing on two possible indicators for the acceptance of neologisms: a) frequency development of 239 German neologisms from the 1990s (loanwords as well as new words resulting from the application of word formation rules) in the German reference corpus DeReKo and b) frequency development in the use of pragmatic markers (‘flags’, namely quotation marks and phrases such as sogenannt ‘so-called’) with these words. In the second part of the article, a psycholinguistic approach to evaluating the (psychological) status of different neologisms and non-words in an experimentally controlled study and plans to carry out interviews in a field test to collect speakers’ opinions on the acceptance of the analysed neologisms are outlined. Finally, implications for the lexicographic treatment of both types of neologisms are discussed.
Are borrowed neologisms accepted more slowly into the German language than German words resulting from the application of wrd formation rules? This study addresses this question by focusing on two possible indicators for the acceptance of neologisms: a) frequency development of 239 German neologisms from the 1990s (loanwords as well as new words resulting from the application of word formation rules) in the German reference corpus DEREKO and b) frequency development in the use of pragmatic markers (‘flags’, namely quotation marks and phrases such as sogenannt ‘so-called’) with these words. In the second part of the article, a psycholinguistic approach to evaluating the (psychological) status of different neologisms and non-words in an experimentally controlled study and plans to carry out interviews in a field test to collect speakers’ opinions on the acceptance of the analysed neologisms are outlined. Finally, implications for the lexicographic treatment of both types of neologisms are discussed.
Ancient Chinese poetry is constituted by structured language that deviates from ordinary language usage; its poetic genres impose unique combinatory constraints on linguistic elements. How does the constrained poetic structure facilitate speech segmentation when common linguistic and statistical cues are unreliable to listeners in poems? We generated artificial Jueju, which arguably has the most constrained structure in ancient Chinese poetry, and presented each poem twice as an isochronous sequence of syllables to native Mandarin speakers while conducting magnetoencephalography (MEG) recording. We found that listeners deployed their prior knowledge of Jueju to build the line structure and to establish the conceptual flow of Jueju. Unprecedentedly, we found a phase precession phenomenon indicating predictive processes of speech segmentation—the neural phase advanced faster after listeners acquired knowledge of incoming speech. The statistical co-occurrence of monosyllabic words in Jueju negatively correlated with speech segmentation, which provides an alternative perspective on how statistical cues facilitate speech segmentation. Our findings suggest that constrained poetic structures serve as a temporal map for listeners to group speech contents and to predict incoming speech signals. Listeners can parse speech streams by using not only grammatical and statistical cues but also their prior knowledge of the form of language.
This introduction summarizes general issues combining lexicography and neology in the context of the Globalex Workshop on Lexicography and Neology series. We present each of the six papers composing this Special Issue, featuring two Slavic languages (Czech and Slovak) and two Romance ones (Brazilian Portuguese and Spanish in its European and Latin American varieties) and their diverse lexicographic research and representation, in specialized dictionaries of neologisms or general language ones, in monolingual, bilingual and multilingual lexical resources, and in print and digital dictionaries.
One major issue in the accomplishment of contrasts in conversation is lexical choice of items which carry the semantic Ioad of the two states of affair which are represented as being opposed to one another. These items or expressions are co-selected to be understood as being contrastively related to each other. In this paper, it is argued that the activity of contrasting itself provides them with a specific local opposite meaning which they would not obtain in other contexts. Practices of contrastingare thus seen as an example of conversational activities which creatively and systematically affect situated meanings. Basedon data from various genres, such as meetings, mediation sessions and conversations, the paper discusses two practices of contrasting, their sequential construction and their interpretative effects. It is concluded that the interpretative effects of conversational contrasting rest on the sequential deployment oflinguistic resources and on the cognitive procedures of frame-based interpretation and constructing a maximally contrastive interpretation for the co-selected expressions.
In the context of the HyTex project, our goal is to convert a corpus into a hypertext, basing conversion strategies on annotations which explicitly mark up the text-grammatical structures and relations between text segments. Domain-specific knowledge is represented in the form of a knowledge net, using topic maps. We use XML as an interchange format. In this paper, we focus on a declarative rule language designed to express conversion strategies in terms of text-grammatical structures and hypertext results. The strategies can be formulated in a concise formal syntax which is independend of the markup, and which can be transformed automatically into executable program code.
Our paper examines how bodily behavior contributes to the local meaning of OKAY. We explore the interplay between OKAY as response to informings and narratives and accompanying multimodal resources in German multi-party interaction. Based on informal and institutional conversations, we describe three different uses of OKAY with falling intonation and the recurrent multimodal patterns that are associated with them and that can be characterized as ‘multimodal gestalts’. We show that: 1. OKAY as a claim to sufficient understanding is typically accompanied by upward nodding; 2. OKAY after change-of-state tokens exhibits a recurrent pattern of up- and downward nodding with distinctive timing; and 3. OKAY closing larger activities is associated with gaze-aversion from the prior speaker.
Selten zuvor hat ein Ereignis in der Welt so direkt und für viele Menschen unmittelbar spürbar Einfluss auf den Wortschatz des Deutschen gehabt wie die Coronapandemie. Fast täglich konnte man ab Frühjahr 2020 neuen Wortschatz im Radio oder Fernsehen hören und in Zeitungen, Zeitschriften oder Beiträgen in den Sozialen Medien lesen. Zugleich sind zahlreiche medizinische und epidemiologische Fachausdrücke in den Allgemeinwortschatz eingegangen. Welche Spuren dieses dynamischen Wandels in Lexikon und Kommunikation auf lange Sicht in unserer Sprache zu finden sein werden, ist eine offene Frage, auf die die Sprachwissenschaft erst in den nächsten Jahrzehnten eine Antwort wird geben können. Erste Tendenzen aber zeichnen sich schon heute ab.
Coronaparty, Jo-jo-Lockdown und Mask-have – Wortschatzerweiterung während des Corona-Stillstands
(2021)
Das Bild von der 'Sprache der DDR' in der alten Bundesrepublik oder: Haben sie so gesprochen?
(2004)
The recognizability of a stretch of conduct as social action depends on details of turn construction as well as the turn’s context. We examine details of turn construction as they enter into actions offering interpretations of prior talk. Such actions either initiate repair or formulate a conclusion from prior talk. We focus on how interpretation markers (das heißt [“that means”] vs. du meinst [“you mean”]) and interpretation formats (phrasal vs. clausal turn completions) each make their invariant contribution to specific interpreting practices. Interpretation marker and turn format go hand in hand, which leads to distinct patterns of interpreting practices: Das heißt+clause is especially apt for formulations, du meinst+phrase for repair. The results suggest that details of turn construction can systematically enter into the constitution of social action. Data are in German with English translation.
Die geltende amtliche Regelung der deutschen Rechtschreibung geht auf einen Kompromiss aus dem Jahre 2006 zurück, der im Bereich der Kommasetzung bei Infinitivgruppen einen neuerlichen Paradigmenwechsel bedeutete: Während für die Vorreformregelung das Konzept des sog. erweiterten Infinitivs konstituierend war und die Reformregelung sich wesentlich auf schreibstilistische Kriterien gründete, bilden die Basis der aktuellen Regelung grammatisch beschreibbare Fallgruppen. Dieser Umstand schon allein, mehr aber der zentrale Auftrag einer Beobachtung des Schreibgebrauchs durch den Rat für deutsche Retschreibung waren der Rahmen für die vorliegende Pilotstudie, in der das freie Schreiben Grundlage einer differenzierten Analyse des Kommagebrauchs bei Infinitivgruppen ist.
Der vorliegende Beitrag skizziert in einem ersten Abschnitt Gegenstandsbereich und kodifizierte Regelung, bevor er im Weiteren das Studiendesign und die Ergebnisse vorstellt. Die Ergebnisse werden nach Fallgruppen sowie im Hinblick auf übergreifende Tendenzen und Beobachtungen besprochen. Sie sind Ausgangspunkt der im Ausblick formulierten Thesen.
Seit 1977 wird in Deutschland jedes Jahr ein Wort bzw. eine Wortsequenz zum „Wort des Jahres“ gekürt. Vorgenommen wird die Wahl von einer Jury, die sich aus Mitgliedern der Gesellschaft für deutsche Sprache (GfdS) zusammensetzt. In der deutschsprachigen Schweiz gibt es eine solche Aktion ebenfalls (seit 2003); inzwischen wird das Wort des Jahres aber nicht mehr nur auf Deutsch, sondern auch auf Französisch, Italienisch und Rätoromanisch gewählt. Wenn im Folgenden vom „Schweizer Wort des Jahres“ die Rede ist, ist damit aber immer nur das Deutschschweizer Jahreswort gemeint. Durchgeführt wird die Aktion von einem Forschungsteam, das an der Zürcher Hochschule für Angewandte Linguistik (ZHAW) tätig ist.
In diesem Beitrag geht es vor allem um die Frage, wie das Smartphone in der Alltagskommunikation als gemeinsamer Bezugspunkt relevant gemacht wird und wie sich die Reaktionen der Interagierenden zum auf dem Display Gezeigten gestalten. Es zeigt sich, dass diese in mehrere responsive Schritte unterteilt werden, in denen die Aufmerksamkeit gebündelt und das Display fokussiert wird sowie eine Abstimmung darüber erfolgt, wie das Gezeigte zu kontextualisieren ist.
In this chapter, we overview the specificity of comparisons made within the perspective of Conversation Analysis (CA), and we position them in relation to other fields. We introduce the analytical mentality, methodology, and procedures of CA, and we show how we used it for the analysis of OKAY in this volume.
Speech islands are historically and developmentally unique and will inevitably disappear within the next decades. We urgently need to preserve their remains and exploit what is left in order to make research on language-in-contact and historical as well as current comparative language research possible.
The Archive for Spoken German (AGD) at the Institute for German Language collects, fosters and archives data from completed research projects and makes them available to the wider research community.
Besides large variation corpora and corpora of conversational speech, the archive already contains a range of collections of data on German speech minorities. The latter will be outlined in this chapter. Some speech island data is already made available through the personal service of the AGD, or the database of spoken German (DGD), e.g. data on Australian German, Unserdeutsch, or German in North America. Some corpora are still being prepared for publication, but still important to document for potentially interested research projects. We therefore also explain the current problems and efforts related to the curation of speech island data, from the digitization of recordings and the collection of metadata, to the integration of transcriptions, annotations and other ways of accessing and sharing data.
We present a method to identify and document a phenomenon on which there is very little empirical data: German phrasal compounds occurring in the form of as a single token (without punctuation between their components). Relying on linguistic criteria, our approach implies to have an operational notion of compounds which can be systematically applied as well as (web) corpora which are large and diverse enough to contain rarely seen phenomena. The method is based on word segmentation and morphological analysis, it takes advantage of a data-driven learning process. Our results show that coarse-grained identification of phrasal compounds is best performed with empirical data, whereas fine-grained detection could be improved with a combination of rule-based and frequency-based word lists. Along with the characteristics of web texts, the orthographic realizations seem to be linked to the degree of expressivity.
Daten und Metadaten
(2022)
In diesem Kapitel werden Metadaten als Daten definiert, die der Dokumentation und/oder Beschreibung empirischer Sprachdaten dienen. Einleitend werden die verschiedenen Funktionen von Metadaten im Forschungsprozess und ihre Bedeutung für die Konzepte der Ausgewogenheit und Repräsentativität diskutiert. Anhand des Forschungs- und Lehrkorpus Gesprochenes Deutsch (FOLK) werden dann Metadaten eines konkreten Korpus vorgestellt, und es wird gezeigt, wie diese bei Korpusanalysen zum Einsatz kommen.
Contemporary studies on the characteristics of natural language benefit enormously from the increasing amount of linguistic corpora. Aside from text and speech corpora, corpora of computer-mediated communication (CMC) Position themselves between orality and literacy, and beyond that provide in- sight into the impact of "new", mainly intemet-based media on language beha- viour. In this paper, we present an empirical attempt to work with annotated CMC corpora for the explanation of linguistic phenomena. In concrete terms, we implement machine leaming algorithms to produce decision trees that reveal rules and tendencies about the use of genitive markers in German.
This article examines a recurrent format that speakers use for defining ordinary expressions or technical terms. Drawing on data from four different languages - Flemish, French, German, and Italian - it focuses on definitions in which a definiendum is first followed by a negative definitional component (‘definiendum is not X’), and then by a positive definitional component (‘definiendum is Y’). The analysis shows that by employing this format, speakers display sensitivity towards a potential meaning of the definiendum that recipients could have taken to be valid. By negating this meaning, speakers discard this possible, yet unintended understanding. The format serves three distinct interactional purposes: (a) it is used for argumentation, e.g. in discussions and political debates, (b) it works as a resource for imparting knowledge, e.g. in expert talk and instructions, and (c) it is employed, in ordinary conversation, for securing the addressee's correct understanding of a possibly problematic expression. The findings contribute to our understanding of how epistemic claims and displays relate to the turn-constructional and sequential organization of talk. They also show that the much quoted ‘problem of meaning’ is, first and foremost, a participant's problem.
Quotation marks are substantially used for direct speech and citations. For the ‘modalizing’ use, the Official Rules state that a “different understanding than usual” is indicated; they give very little information on the use of quotation marks beyond literal reference. It therefore seems all the more interesting to investigate the usage of modalizing quotation marks. In the present analysis, we studied the school-leaving examinations of an entire year. School-leaving examinations are texts by persons whose institutional acquisition of written language can be regarded as complete; they are texts written by skilled writers. The investigation takes into account both formal and functional observations. We recognized differences between school subjects that can be interpreted with regard to the concept of educational language. The writers described here showed a high sensitivity (conscious or unconscious) to the use of quotation marks, which we call the “struggle for educational language”. This may be related to the corpus investigated here. However, our study constitutes a solid basis for further corpus studies on quotation marks.
This paper deals with multiword lexemes (MWLs), focussing on two types of verbal MWLs: verbal idioms and support verb constructions. We discuss the characteristic properties of MWLs, namely nonstandard compositionality, restricted substitutability of components, and restricted morpho-syntactic flexibility, and we show how these properties may cause serious problems during the analysis, generation, and transfer steps of machine translation systems. In order to cope with these problems, MT lexicons need to provide detailed descriptions of MWL properties. We list the types of information which we consider the necessary minimum for a successful processing of MWLs, and report on some feasibility studies aimed at the automatic extraction of German verbal multiword lexemes from text corpora and machine-readable dictionaries.
In this article, we explore the feasibility of extracting suitable and unsuitable food items for particular health conditions from natural language text. We refer to this task as conditional healthiness classification. For that purpose, we annotate a corpus extracted from forum entries of a food-related website. We identify different relation types that hold between food items and health conditions going beyond a binary distinction of suitability and unsuitability and devise various supervised classifiers using different types of features. We examine the impact of different task-specific resources, such as a healthiness lexicon that lists the healthiness status of a food item and a sentiment lexicon. Moreover, we also consider task-specific linguistic features that disambiguate a context in which mentions of a food item and a health condition co-occur and compare them with standard features using bag of words, part-of-speech information and syntactic parses. We also investigate in how far individual food items and health conditions correlate with specific relation types and try to harness this information for classification.
Dieser Beitrag stellt den Aufbau eines multimodalen Korpus zur Erforschung des Deutschen als Minderheitssprache in Argentinien vor (DiA). In dem sich im Aufbau befindlichen DiA-Korpus werden die heutige wie auch die historische Situation mit multimodalen (mündlichen, schriftlichen und visuellen) Datensätzen repräsentiert, die mit entsprechenden methodischen Zugängen erfasst wurden und werden. Dazu gehören fragebogengeleitete Interviews (mündliches Medium), Briefe und elizitierte Schriftzeugnisse (geschriebenes Medium) sowie Linguistic-Landscape-Bilddaten (visuelles Medium). In diesem Beitrag wird zunächst ein Überblick über die Forschungssituation zum Deutschen als Minderheitensprache in Argentinien gegeben. Kern des Beitrags ist dann die Vorstellung der Korpusstruktur und des Vorgehens beim Korpusaufbau sowie die Darstellung von Auswertungspotentialen des Datenfundus auf systemischer, soziolinguistischer, sprachideologischer und kontaktlinguistischer Ebene. Eine Methodenreflexion rundet den Beitrag ab.
Dieser Beitrag vergleicht die Ansätze ,Linguistic Landscapes' (LL) und ,Spot German' (SG) in Hinblick auf ihr Potenzial für die Untersuchung des Vorkommens und der Funktionen der deutschen Sprache in Regionen außerhalb des deutschsprachigen Kerngebietes. Als Beispiele wurden eine LL-Studie im Baltikum sowie eine SG-Untersuchung auf Zypern gewählt. Der Vergleich zeigt, dass beide Methoden - trotz ihrer unterschiedlichen Präzision - ähnliche Aussagen zur Rolle des Deutschen erlauben: In beiden Ländern erscheint Deutsch als „Ergänzungssprache“ zu den gesellschaftlichen Hauptsprachen in bestimmten Nischen, z.B. im Tourismus und in Verbindung mit bestimmten Firmen und Produkten.
Sprache ist ein zentraler Bestandteil menschlicher Kommunikation und dient, neben anderen Funktionen, der Etablierung und Gestaltung sozialer Beziehungen, dem Ausdruck von Macht, von Gruppenzugehörigkeit und Identität, aber auch von Ab- und Ausgrenzung, im Privaten wie im Öffentlichen und Politischen. In diesem Beitrag wird der Blick auf den Umgang mit Sprache im deutsch-kolonialen Kontext gerichtet: Es geht darum, wir durch Vorgaben zum Gebrauch von Sprache(n) und deren variable Umsatzung vor Ort das Deutsche Kaiserreich als Kolonialmacht in den Kolonialgebieten in Ozeanien präsent war und repräsentiert wurde.
Question Answering Systems for retrieving information from Knowledge Graphs (KG) have become a major area of interest in recent years. Current systems search for words and entities but cannot search for grammatical phenomena. The purpose of this paper is to present our research on developing a QA System that answers natural language questions about German grammar.
Our goal is to build a KG which contains facts and rules about German grammar, and is also able to answer specific questions about a concrete grammatical issue. An overview of the current research in the topic of QA systems and ontology design is given and we show how we plan to construct the KG by integrating the data in the grammatical information system Grammis, hosted by the Leibniz-Institut für Deutsche Sprache (IDS). In this paper, we describe the construction of the initial KG, sketch our resulting graph, and demonstrate the effectiveness of such an approach. A grammar correction component will be part of a later stage. The paper concludes with the potential areas for future research.
When collecting linguistic data using translation tasks, stimuli can be presented in written or in oral form. In doing so, there is a possibility that a systematic source of error can occur that can be traced back to the selected survey method and which can influence the results of the translation tasks. This contribution investigates whether and to what extent both of the aforementioned survey methods result in divergent results when using translation tasks. For this investigation, 128 informants provided linguistic data; each informant had to translate 25 Wenker sentences from Standard German into either East Swabian, Lechrain or West Central Bavarian dialect, as the case may be. The results show two tendencies. First, written stimuli lead to a slightly higher number of dialectal translation in segmental variables. Second, when oral stimuli are used, syntactic and lexical variables are translated significantly more often in such a manner that they diverge from the template. The results can be explained in terms of varying cognitive processing operations and the constraints of human working memory. When collecting data in the future, these tendencies should be taken into account.
Linguistic Landscapes (LL) sind in der internationalen Soziolinguistik und verwandten Disziplinen in aller Munde. Seit Mitte der 2000er Jahre sind Studien, die sich als Teil dieses Ansatzes verstehen, wie Pilze aus dem Boden geschossen. Seit 2008 hat es in fast jährlichem Rhythmus gut besuchte Tagungen gegeben, die sich ausschließlich mit Linguistic Landscapes beschäftigen - sowohl mit Fallstudien aus aller Welt als auch mit theoretischen und methodologischen Fragen. Folgerichtig sind nicht nur eine Vielzahl von Einzelaufsätzen erschienen, es hat auch mehrere Sammelveröffentlichungen gegeben, und seit 2015 erscheint ein eigenes Journal unter dem Titel „Linguistic Landscapes“ (vgl. Gorter 2013 für einen Überblick über die Entwicklung des Ansatzes).
Obwohl auch Wissenschaftler, die im deutschsprachigen Raum tätig sind, sich in den letzten Jahren den Linguistic Landscapes gewidmet haben, hat die Methode in deutschsprachigen Publikationen jedoch bisher nur einen vergleichsweise geringen Stellenwert eingenommen. Dieser Beitrag möchte somit zum einen Grundlagenarbeit leisten, indem er die Idee der Linguistic Landscapes noch einmal vorstellt und seine Entwicklung der vergangenen Jahre nachzeichnet. Zum anderen soll im Kontext dieses Bandes der Nutzen des Ansatzes für die Analyse von Sprachen von Migrantengruppen diskutiert werden. Schließlich wird der Beitrag durch einige Bemerkungen dazu abgerundet, in welchem Maße die Untersuchung von LL einen Nutzwert haben kann, der über wissenschaftliche Kreise hinausgeht. Grundlage für diesen Beitrag sind internationale Veröffentlichungen der letzten Jahre, vor allem aber gehen Erfahrungen aus eigenen Studien mit ein, die wir seit 2007 mit unterschiedlichen Zielsetzungen im Baltikum und in Deutschland durchgeführt haben.
Der Beitrag widmet sich den Geflüchteten als Teil der deutschlernenden Teilnehmer/innen in den staatlich verordneten Integrationskursen (IKs). Unsere Erhebung unter 305 Geflüchteten aus Syrien und anderen Ländern legt ihren Schwerpunkt auf die sprachlichen Hintergründe. Dabei werden soziodemografische Daten mit Angaben zum Spracherwerb in Beziehung gesetzt und als kollektive Sprachbiografien dargestellt. Des Weiteren beschreiben wir sieben Teilnehmergruppen von Geflüchteten in den IKs, die sich vor allem auf Grund der Faktoren Alter, Bildungsgrad und Arbeitserfahrung unterscheiden, für die aber auch Merkmale im Hinblick auf Herkunft und Mehrsprachigkeit eine Rolle spielen. Ferner werden Angaben zur Sozialsituation in Deutschland mit Einschätzungen zum Deutscherwerb in Beziehung gesetzt. Ein Vergleich mit anderen Studien verdeutlicht die Verschiebungen in der Zusammensetzung des IK. Unser Beitrag kann als Anregung verstanden werden, die Passgenauigkeit im Sinne der Deutschlernenden zu überdenken.
Dieser Beitrag widmet sich der Analyse des Zusammenspiels sprachlich-hörbarer und sichtbar-kinesischer Praktiken, die beim alltäglichen Erzählen eingesetzt werden. Im Rahmen einer konversationsanalytisch basierten Untersuchung von Videoaufnahmen deutscher Alltagsgespräche wird die Bandbreite alltäglicher narrativer Praktiken in der face-to-face-Kommunikation aufgezeigt. Dies erfolgt exemplarisch anhand zweier Beispiele, in denen Einstieg, Ausgestaltung sowie Beendigung der Erzählung unter unterschiedlichen sequentiellen und multimodalen Bedingungen vollzogen werden. Die Untersuchung unterstreicht einerseits die Indexikalität alltäglicher narrativer Praktiken, andererseits die Notwendigkeit einer interaktionalen Narratologie, die diese Praktiken als Produkt sprachlicher, verkörperter und räumlicher Ressourcen sowie der Zusammenarbeit mehrerer Teilnehmer analysiert und konzeptualisiert.
Different Views on Markup
(2010)
In this chapter, two different ways of grouping information represented in document markup are examined: annotation levels, referring to conceptual levels of description, and annotation layers, referring to the technical realisation of markup using e.g. document grammars. In many current XML annotation projects, multiple levels are integrated into one layer, often leading to the problem of having to deal with overlapping hierarchies. As a solution, we propose a framework for XML-based multiple, independent XML annotation layers for one text, based on an abstract representation of XML documents with logical predicates. Two realisations of the abstract representation are presented, a Prolog fact base format together with an application architecture, and a specification for XML native databases. We conclude with a discussion of projects that have currently adopted this framework.
Auch Linguist*innen, die gesprochene Sprache untersuchen, kommen schon seit längerem nicht mehr ohne digitale Infrastrukturen aus. Seit Beginn der Gesprochene-Sprache-Forschung werden Gespräche aufgezeichnet und anschließend transkribiert, da die flüchtigen, innerhalb von Bruchteilen von Sekunden stattfindenden Feinheiten des Gesprochenen paradoxerweise nur durch Verschriftung im Detail untersucht werden können. Diese Detailuntersuchungen beschränkten sich im vergangenen Jahrhundert meist auf wenige Einzelbelege für ein untersuchtes Phänomen. Das heißt, die Forschenden hatten den unmittelbaren Überblick über ihre Datenkollektionen und benötigten keine elaborierten digitalen Methoden zu deren Aufbereitung, Annotation und Analyse. Dies hat sich in den letzten beiden Jahrzehnten stark geändert: Es wurden vermehrt gezielt große Datenmengen gesammelt, in Datenbanken organisiert und der Forschungsgemeinschaft zur Nutzung zur Verfügung gestellt. An erster Stelle muss hier das Forschungs- und Lehrkorpus gesprochenes Deutsch (FOLK) genannt werden (vgl. Schmidt 2014). Dieses wird seit 2008 am Leibniz-Institut für Deutsche Sprache (IDS) aufgebaut und ist heute das größte Referenzkorpus für das gesprochene Deutsch.
Digressions
(2015)
Der Beitrag von Bruno Strecker Digressions ist auf Französisch geschrieben (der Muttersprache von Jacqueline Kubczak) und handelt von unterschiedlichen Exkursen. Er macht die Verbindung zwischen Kommunikationssituation und Arten der Exkurse sichtbar und bietet eine darauf basierende Typologie der Exkurse an. In einem zweiten Schritt werden die formalen Möglichkeiten, einen Exkurs einzuleiten und zu formulieren, dargestellt (z. B. durch Appositionen, Parenthesen, festgelegte Ausdrucksformen wie A propos xxx, Ça me rappelle oder nicht eingebettete Phrasen). Schließlich zeigt er, wie man aus dem Exkurs wieder „in die Spur“ kommt.
This chapter addresses the requirements and linguistic foundations of automatic relational discourse analysis of complex text types such as scientific journal articles. It is argued that besides lexical and grammatical discourse markers, which have traditionally been employed in discourse parsing, cues derived from the logical and generical document structure and the thematic structure of a text must be taken into account. An approach to modelling such types of linguistic information in terms of XML-based multi-layer annotations and to a text-technological representation of additional knowledge sources is presented. By means of quantitative and qualitative corpus analyses, cues and constraints for automatic discourse analysis can be derived. Furthermore, the proposed representations are used as the input sources for discourse parsing. A short overview of the projected parsing architecture is given.
Discourse segmentation is the division of a text into minimal discourse segments, which form the leaves in the trees that are used to represent discourse structures. A definition of elementary discourse segments in German is provided by adapting widely used segmentation principles for English minimal units, while considering punctuation, morphology, sytax, and aspects of the logical document structure of a complex text type, namely scientific articles. The algorithm and implementation of a discourse segmenter based on these principles is presented, as well an evaluation of test runs.
Der Beitrag spürt dem spannungsreichen Verhältnis von diskursanalytischen Ansätzen und (neo-)marxistischer Kapitalismuskritik nach und erkundet mögliche Beiträge diskursanalytischer Perspektiven zu Kapitalismusanalysen. In einem ersten Schritt wird anhand einiger ausgewählter Diskurstheoretikerinnen und -theoretiker der Eindruck einer zwischen affirmierter Nähe und skeptischer Abgrenzung schwankenden Positionierung zu marxistischen Ansätzen verdeutlicht. Gegen elementare Grundannahmen marxistischer Wissenschafts- und Gesellschaftskonzepte, so etwa den Begriff der ‚Ideologie‘ oder die Annahme einer klar nachvollziehbaren und damit voraussagbaren gesellschaftlich-politischen Determinierung durch ökonomische ‚Basisprozesse‘ setzten sie die Ansicht, dass Wissen, Wahrheit, soziale Identitäten wie auch gesellschaftliche Praktiken als kontingente und stets unabgeschlossene Ergebnisse sozialer Konstruktionsprozesse zu begreifen seien. Am Beispiel verschiedener marxistischer Grundannahmen, wie der Trennung von Lohnarbeit und Kapital, dem Verwertungszwang des Kapitals, dem Auseinanderfallen von Politik und Ökonomie, wird anschließend dafür plädiert, diese nicht als gegebene Tatsachen hinzunehmen, sondern in ihrer diskursiven Verfasstheit selbst zu untersuchen. Erst dann – so die Annahme – lässt sich zeigen, ob und wie diese Elemente gesellschaftlich wirkmächtig werden.
Two very reliable influences on eye fixation durations in reading are word frequency, as measured by corpus counts, and word predictability, as measured by cloze norming. Several studies have reported strictly additive effects of these 2 variables. Predictability also reliably influences the amplitude of the N400 component in event-related potential studies. However, previous research suggests that while frequency affects the N400 in single-word tasks, it may have little or no effect on the N400 when a word is presented with a preceding sentence context. The present study assessed this apparent dissociation between the results from the 2 methods using a coregistration paradigm in which the frequency and predictability of a target word were manipulated while readers’ eye movements and electroencephalograms were simultaneously recorded. We replicated the pattern of significant, and additive, effects of the 2 manipulations on eye fixation durations. We also replicated the predictability effect on the N400, time-locked to the onset of the reader’s first fixation on the target word. However, there was no indication of a frequency effect in the electroencephalogram record. We suggest that this pattern has implications both for the interpretation of the N400 and for the interpretation of frequency and predictability effects in language comprehension.
Linguistic relativists have traditionally asked 'how language influences thought', but conversation analysts and anthropological linguists have moved the focus from thought to social action. We argue that 'social action' should in this context not become simply a new dependent variable, because the formulation 'does language influence action' suggests that social action would already be meaningfully constituted prior to its local (verbal and multi-modal) accomplishment. We draw on work by the gestalt psychologist Karl Duncker to show that close attention to action-in-a-situation helps us ground empirical work on cross-cultural diversity in an appreciation of the invariances that make culture-specific elements of practice meaningful.