Refine
Year of publication
- 2016 (347)
Document Type
- Part of a Book (136)
- Article (104)
- Conference Proceeding (51)
- Book (33)
- Part of Periodical (12)
- Working Paper (5)
- Doctoral Thesis (3)
- Other (2)
- Preprint (1)
Keywords
- Deutsch (113)
- Korpus <Linguistik> (47)
- Gesprochene Sprache (31)
- Konversationsanalyse (24)
- Wörterbuch (22)
- Interaktion (20)
- Computerunterstützte Lexikographie (19)
- Linguistik (17)
- Diskursanalyse (16)
- Kommunikation (15)
Publicationstate
- Veröffentlichungsversion (169)
- Zweitveröffentlichung (35)
- Postprint (17)
- Erstveröffentlichung (1)
Reviewstate
Publisher
- Institut für Deutsche Sprache (45)
- de Gruyter (34)
- De Gruyter (23)
- Winter (19)
- European Language Resources Association (ELRA) (13)
- Narr Francke Attempto (12)
- Retorika (8)
- Peter Lang (7)
- Linssen Druckcenter (6)
- Association for Computational Linguistics (5)
"Kaum [...] da, wird' ich gedisst!" Funktionale Aspekte des Banter-Prinzips auf dem Online-Prüfstand
(2016)
This article attempts to enrich the theoretical approach of the Banter Principle (Leech 1983) with an online point of view. Examples from TeamSpeak conversations and comments on the social network site Facebook reveal different user practices regarding the identifiability of the Banter Principle: nonverbal elements or emoticons to make sure that banter is understood correctly in written language on the one hand; coping with assigned roles depending on dynamic group-internal hierarchies in oral communication on the other. Nevertheless, one question remains: why should one disguise a cordial message rudely? My analysis shows two functions of online banter: firstly, to maximize the entertainment value of a conversation, and secondly, to establish an accepted online identity.
'Faction' im Fernsehen - Produktionsbeobachtung des Scripted Reality-Formats mieten, kaufen, wohnen
(2016)
The present investigation targets the phenomenon commonly called control. Many languages, including German and Polish, employ non-finite clauses (besides finite clauses) as propositional complements. The subject of these complement clauses is left unexpressed and must generally be interpreted co-referentially with the subject or object of the matrix clause (subject or object control). However, there are also infinitive-selecting verbs that do not allow for a co-referential interpretation of the embedded subject; semantically, the embedded infinitives of these anti-control verbs are thus less dependent on, or less unifiable with, the matrix proposition. In Polish anti-control constructions, non-finite complements are overtly marked with the complementizer żeby, suggesting that they are structurally more complex (namely, containing a C-projection) than the non-finite complements in control constructions lacking żeby (modulo special contexts, viz. 'control switch'). In a comparative perspective, the paper brings corpus-linguistic and experimental evidence to bear on the question whether, surface appearances notwithstanding, the infinitival complements of anti-control verbs in German should similarly be analyzed as truly sentential, i.e., C-headed structures.
The paper reports the results of the curation project ChatCorpus2CLARIN. The goal of the project was to develop a workflow and resources for the integration of an existing chat corpus into the CLARIN-D research infrastructure for language resources and tools in the Humanities and the Social Sciences (http://clarin-d.de). The paper presents an overview of the resources and practices developed in the project, describes the added value of the resource after its integration and discusses, as an outlook, to what extent these practices can be considered best practices which may be useful for the annotation and representation of other CMC and social media corpora.
A comparison between morphological complexity measures: typological data vs. language corpora
(2016)
Language complexity is an intriguing phenomenon argued to play an important role in both language learning and processing. The need to compare languages with regard to their complexity resulted in a multitude of approaches and methods, ranging from accounts targeting specific structural features to global quantification of variation more generally. In this paper, we investigate the degree to which morphological complexity measures are mutually correlated in a sample of more than 500 languages of 101 language families. We use human expert judgements from the World Atlas of Language Structures (WALS), and compare them to four quantitative measures automatically calculated from language corpora. These consist of three previously defined corpus-derived measures, which are all monolingual, and one new measure based on automatic word-alignment across pairs of languages. We find strong correlations between all the measures, illustrating that both expert judgements and automated approaches converge to similar complexity ratings, and can be used interchangeably.
There have previously been several attempts to annotate the communicative functions of verbal feedback utterances in English. Here, we suggest an annotation scheme for verbal and non-verbal feedback utterances in French, including the categories base, attitude, previous and visual. The data comprise conversations, map tasks and negotiations, from which we extracted ca. 13,000 candidate feedback utterances and gestures. 12 students were recruited for the annotation campaign of ca. 9,500 instances. Each instance was annotated by between 2 and 7 raters. The evaluation of the annotation agreement resulted in an average best-pair kappa of 0.6. While the base category, with the values acknowledgement, evaluation, answer, elicit and other, achieves good agreement, this is not the case for the other main categories. The data sets, which also include automatic extractions of lexical, positional and acoustic features, are freely available and will further be used for machine learning classification experiments to analyse the form-function relationship of feedback.
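The agreement figure reported above (an average best-pair kappa of 0.6) is a chance-corrected measure. A minimal sketch of Cohen's kappa for two raters, using made-up labels loosely modelled on the base category, might look like this; the label values and rater data are illustrative only.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two annotators over the same items."""
    assert len(rater_a) == len(rater_b)
    n = len(rater_a)
    # Observed agreement: proportion of items both raters labelled identically.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected agreement by chance, from each rater's label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    labels = set(rater_a) | set(rater_b)
    expected = sum((freq_a[l] / n) * (freq_b[l] / n) for l in labels)
    return (observed - expected) / (1 - expected)

# Toy annotations with hypothetical base-category values.
a = ["ack", "ack", "eval", "answer", "ack", "eval"]
b = ["ack", "eval", "eval", "answer", "ack", "ack"]
print(round(cohens_kappa(a, b), 2))  # → 0.45
```

A best-pair kappa, as used in the paper, picks for each instance the pair of raters with the highest agreement before averaging.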
The present paper reports the first results of the compilation and annotation of a blog corpus for German. The main aim of the project is the representation of the blog discourse structure and relations between its elements (blog posts, comments) and participants (bloggers, commentators). The data included in the corpus were manually collected from the scientific blog portal SciLogs. The feature catalogue for the corpus annotation includes three types of information which is directly or indirectly provided in the blog or can be construed by means of statistical analysis or computational tools. At this point, only directly available information (e.g. title of the blog post, name of the blogger etc.) has been annotated. We believe, our blog corpus can be of interest for the general study of blog structure or related research questions as well as for the development of NLP methods and techniques (e.g. for authorship detection).
The paper deals with the use of ICH WEIß NICHT (‘I don’t know’) in German talk-in-interaction. Pursuing an Interactional Linguistics approach, we identify different interactional uses of ICH WEIß NICHT and discuss their relationship to variation in argument structure (SV(O), (O)VS, V-only). After ICH WEIß NICHT with full complementation, speakers emphasize their lack of knowledge or display reluctance to answer. After variants without an object complement, in contrast, speakers display uncertainty about the truth of the following proposition or about its sufficiency as an answer. Thus, while uses with both subject and object tend to close a sequence or display lack of knowledge, responses without an object function as a prepositioned epistemic hedge or a pragmatic marker framing the following TCU. When ICH WEIß NICHT is used in response to a statement, it indexes disagreement (independently of all complementation patterns).
Our paper deals with the use of ICH WEIß NICHT (‘I don’t know’) in German talk-in-interaction. Pursuing an Interactional Linguistics approach, we identify different interactional uses of ICH WEIß NICHT and discuss their relationship to variation in argument structure (SV(O), (O)VS, V-only). After ICH WEIß NICHT with full complementation, speakers emphasize their lack of knowledge or display reluctance to answer. After variants without an object complement, in contrast, speakers display uncertainty about the truth of the following proposition or about its sufficiency as an answer. Thus, while uses with both subject and object tend to close a sequence or display lack of knowledge, responses without an object function as a prepositioned epistemic hedge or a pragmatic marker framing the following TCU. When ICH WEIß NICHT is used in response to a statement, it indexes disagreement (independently of all complementation patterns).
This study investigates high vowel laxing in the Louisiana French of the Lafourche Basin. Unlike Canadian French, in which the high vowels /i, y, u/ are traditionally described as undergoing laxing (to [ɪ, ʏ, ʊ]) in word-final syllables closed by any consonant other than a voiced fricative (see Poliquin 2006), Oukada (1977) states that in the Louisiana French of Lafourche Parish, any coda consonant will trigger high vowel laxing of /i/; he excludes both /y/ and /u/ from his discussion of high vowel laxing. The current study analyzes tokens of /i, y, u/ from pre-recorded interviews with three older male speakers from Terrebonne Parish. We measured the first and second formants and duration for high vowel tokens produced in four phonetic environments, crossing syllable type (open vs. closed) by consonant type (voiced fricative vs. any consonant other than a voiced fricative). Results of the acoustic analysis show optional laxing for /i/ and /y/ and corroborate the finding that high vowels undergo laxing in word-final closed syllables, regardless of consonant type. Data for /u/ show that the results vary widely by speaker, with the dominant pattern (shown by two out of three speakers) that of lowering and backing in the vowel space of closed syllable tokens. Duration data prove inconclusive, likely due to the effects of stress. The formant data published here constitute the first acoustic description of high vowels for any variety of Louisiana French and lay the groundwork for future study on these endangered varieties.
American English and German AI, AU, observed in cognates such as Wein, wine, Haus, house, are usually treated on a par, represented with the same initial vowel (cf. [ai], [au] for Am. Engl. and German [1]). Yet, acoustic measurements indicate differences, as the relevant trajectories characteristically cross in Am. Engl. but not in German. These data may indicate consistency with the same initial target for these diphthongs in German, supporting the choice of the same symbol /a/ in phonemic representation, as opposed to distinct targets (and distinct initial phonemes) in American English.
Aktuelle Änderungen des Rats für deutsche Rechtschreibung 2016 - Hintergründe und Begründungen
(2016)
The English language has taken advantage of the Digital Revolution to establish itself as the global language; however, only 28.6 % of Internet users speak English as their native language. Machine Translation (MT) is a powerful technology that can bridge this gap. In development since the mid-20th century, MT has become available to every Internet user in the last decade, due to free online MT services. This paper aims to discuss the implications that these tools may have for the privacy of their users and how they are addressed by EU data protection law. It examines the data-flows in respect of the initial processing (both from the perspective of the user and the MT service provider) and potential further processing that may be undertaken by the MT service provider.
A model of grammar needs to reconcile the undesirability inherent to allomorphy, the apparent extra burden on learning and memory, with its occurrence and possible stability. OT approaches this task by positing an anti-allomorphy constraint, henceforth referred to as "OO-correspondence", which requires leveling (i.e. sameness of sound structure) in related word forms (Benua 1997). The occurrence of allomorphy then indicates crucial domination of OO-correspondence by other constraints. To assess the adequacy of this proposal it is necessary to establish the level of abstractness at which OO-correspondence applies and to examine the consequences of this decision for ranking order. While proponents of OT tacitly assume the level in question to be rather concrete, the notion of allomorphy as originally envisioned in Structuralism was defined by distinctness at a more abstract level referred to as "phonemic" (Harris 1942; Nida 1944). The basic intuition here is that the defining property of subphonemic sound properties, their conditionedness by context, entails that whatever burden they put on learning and memory is of a fundamentally different nature than that entailed by phonemic distinctness. The evidence from German supports that intuition in that leveling can be shown to target phonemic sound structure to the exclusion of subphonemic properties. Allomorphy, defined by phonemic alternation, tends to serve phonological optimization in closed class items (function words, affixes) while serving to express morphological distinctions in open class items. The key to demonstrating the correlations in question lies in the discernment of phonemic structure, which is therefore at the core of the article.
This article describes an English Zulu learners’ dictionary that is part of a larger set of information tools, namely an online Zulu course, an e-dictionary of possessives (which was implemented earlier) accompanied by training software offering translation tasks on several levels, and an ontology of morphemic items categorizing and describing all parts of speech of Zulu. The underlying lexicographic database contains the usual type of lexicographic data, such as translation equivalents and their respective morphosyntactic data, but its entries have been extended with data related to the lessons of the online course in order to enable the learner to link both tools autonomously. The ‘outer matter’ is integrated into the website in the form of several texts on additional web pages (how-to-use, typical outputs, grammar tables, information on morphosyntactic rules, etc.). The dictionary comprises a modular system, where each module fulfils one of the necessary functions.
Analepses with topic drop are highly frequent linguistic structures in interaction. Besides an interactional-linguistic investigation of the discourse functions, conditions and restrictions of analepses, this work focuses on discourse-semantic perspectives and questions, in particular a detailed description of the semantic relations between analepses and their preceding context. Analepsis resolution must be explained in a situated way, since the understanding of analepses depends on contextual embedding as well as on grammatical, semantic and pragmatic features of the utterance.
It is shown that cognitive attributions regarding the interaction participants are also possible with interactional-linguistic methods. The study further demonstrates that combining qualitative and quantitative methods is fruitful for working out specific usage preferences of analeptic as compared to anaphoric utterances.
This thesis consists of the following three papers that all have been published in international peer-reviewed journals:
Chapter 3: Koplenig, Alexander (2015c). The Impact of Lacking Metadata for the Measurement of Cultural and Linguistic Change Using the Google Ngram Data Sets—Reconstructing the Composition of the German Corpus in Times of WWII. Published in: Digital Scholarship in the Humanities. Oxford: Oxford University Press. [doi:10.1093/llc/fqv037]
Chapter 4: Koplenig, Alexander (2015b). Why the quantitative analysis of diachronic corpora that does not consider the temporal aspect of time-series can lead to wrong conclusions. Published in: Digital Scholarship in the Humanities. Oxford: Oxford University Press. [doi:10.1093/llc/fqv030]
Chapter 5: Koplenig, Alexander (2015a). Using the parameters of the Zipf–Mandelbrot law to measure diachronic lexical, syntactical and stylistic changes – a large-scale corpus analysis. Published in: Corpus Linguistics and Linguistic Theory. Berlin/Boston: de Gruyter. [doi:10.1515/cllt-2014-0049]
Chapter 1 introduces the topic by describing and discussing several basic concepts relevant to the statistical analysis of corpus linguistic data. Chapter 2 presents a method to analyze diachronic corpus data and a summary of the three publications. Chapters 3 to 5 each represent one of the three publications. All papers are printed in this thesis with the permission of the publishers.
Annotating Discourse Relations in Spoken Language: A Comparison of the PDTB and CCR Frameworks
(2016)
In discourse relation annotation, there is currently a variety of different frameworks in use, most of which have been developed and employed mostly on written data. This raises a number of questions regarding the interoperability of discourse relation annotation schemes, as well as regarding differences in discourse annotation for written vs. spoken domains. In this paper, we describe our experiences in annotating two spoken domains from the SPICE Ireland corpus (telephone conversations and broadcast interviews) according to two different discourse annotation schemes, PDTB 3.0 and CCR. We show that annotations in the two schemes can largely be mapped onto one another, and discuss differences in operationalisations of discourse relation schemes which present a challenge to automatic mapping. We also observe systematic differences in the prevalence of implicit discourse relations in spoken data compared to written texts, and find that there are also differences in the types of causal relations between the domains. Finally, we find that PDTB 3.0 addresses many shortcomings of PDTB 2.0 with respect to the annotation of spoken discourse, and suggest further extensions. The new corpus has roughly the size of the CoNLL 2015 Shared Task test set, and we hence hope that it will be a valuable resource for the evaluation of automatic discourse relation labellers.
In the course of the events in the Arab world since 2011, the term Arabischer Frühling ('Arab Spring') gained importance and advanced to become the key expression of the discourse. This contribution asks how the term Arab Spring has been realized linguistically in the German-speaking public sphere, with which linguistic means it has been constructed, and with which events (at times also catastrophes) it has been, and still is, identified. The symbolic function of spring is addressed both from the historical perspective of the Vormärz period and from today's point of view. The investigation also considers the season names winter, autumn and summer and their symbolic relation to the Arab revolutions.
In their analysis of methods that participants use to manage the realization of practical courses of action, Kendrick and Drew (2016/this issue) focus on cases of assistance, where the need to be addressed is Self’s, and Other lends a helping hand. In our commentary, we point to other forms of cooperative engagement that are ubiquitously recruited in interaction. Imperative requests characteristically expect compliance on the grounds of Other’s already established commitment to a wider and shared course of actions. Established commitments can also provide the engine behind recruitment sequences that proceed nonverbally. And forms of cooperative engagement that are well glossed as assistance can nevertheless be demonstrably oriented to established commitments. In sum, we find commitment to shared courses of action to be an important element in the design and progression of certain recruitment sequences, where the involvement of Other is best defined as contribution. The commentary highlights the importance of interdependent orientations in the organization of cooperation. Data are in German, Italian, and Polish.
Wiegand’s opus magnum „Wörterbuchforschung“ ends with a chapter on the state and the relevant tasks for research into dictionary use in the middle of the 1990s. This article aims at reflecting the state and the relevance of dictionary usage research 20 years later. I will argue that the fundamentally changed lexicographic landscape makes it necessary to shift the focus of research. In my view, the most important aim of research into dictionary use can no longer be limited to improving dictionaries. Research into dictionary use should also raise more awareness for user-orientation in general and should provide methodological reflection to enlighten the increasingly important usage statistics for online dictionaries. Another goal should be to look behind the scenes of collaborative dictionaries in order to provide background data to classify their relevance in relation to dictionaries elaborated by lexicographic experts. The crisis of lexicography also makes it necessary to broaden our view and concentrate on situations in which linguistic questions arise. In this context, we could examine in which of these situations the consultation of lexicographic data helps. In summary, the aim of research into dictionary use is to identify the fields where sound lexicographic work is really helpful for potential users.
In this paper, we describe preliminary results from an ongoing experiment wherein we classify two large unstructured text corpora—a web corpus and a newspaper corpus—by topic domain (or subject area). Our primary goal is to develop a method that allows for the reliable annotation of large crawled web corpora with meta data required by many corpus linguists. We are especially interested in designing an annotation scheme whose categories are both intuitively interpretable by linguists and firmly rooted in the distribution of lexical material in the documents. Since we use data from a web corpus and a more traditional corpus, we also contribute to the important field of corpus comparison and corpus evaluation. Technically, we use (unsupervised) topic modeling to automatically induce topic distributions over gold standard corpora that were manually annotated for 13 coarse-grained topic domains. In a second step, we apply supervised machine learning to learn the manually annotated topic domains using the previously induced topics as features. We achieve around 70% accuracy in 10-fold cross validations. An analysis of the errors clearly indicates, however, that a revised classification scheme and larger gold standard corpora will likely lead to a substantial increase in accuracy.
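The two-step design described above (unsupervised topic induction, then supervised learning on the induced topics, evaluated by cross-validation) can be sketched roughly as follows. The toy documents, domain labels, component choices, and fold count are illustrative assumptions, not the project's actual data or setup; scikit-learn is assumed to be installed.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# Toy stand-ins for the gold-standard documents and their topic domains.
docs = ["stocks fell sharply today", "the striker scored twice",
        "parliament passed the budget", "the team won the league",
        "markets rallied on the news", "voters went to the polls"] * 5
domains = ["economy", "sport", "politics",
           "sport", "economy", "politics"] * 5

# Step 1 (unsupervised): induce topic distributions over documents.
# Step 2 (supervised): learn the domain labels from those topic features.
pipeline = make_pipeline(
    CountVectorizer(),
    LatentDirichletAllocation(n_components=3, random_state=0),
    LogisticRegression(max_iter=1000),
)

# Cross-validation as in the paper's evaluation (the paper uses 10 folds).
scores = cross_val_score(pipeline, docs, domains, cv=5)
print(f"mean accuracy: {scores.mean():.2f}")
```

Fitting the topic model inside the pipeline keeps the induction step confined to each training fold, so the reported accuracy is not inflated by test-fold leakage.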
Smiling individuals are usually perceived more favorably than non-smiling ones—they are judged as happier, more attractive, competent, and friendly. These seemingly clear and obvious consequences of smiling are assumed to be culturally universal, however most of the psychological research is carried out in WEIRD societies (Western, Educated, Industrialized, Rich, and Democratic) and the influence of culture on social perception of nonverbal behavior is still understudied. Here we show that a smiling individual may be judged as less intelligent than the same non-smiling individual in cultures low on the GLOBE’s uncertainty avoidance dimension. Furthermore, we show that corruption at the societal level may undermine the prosocial perception of smiling—in societies with high corruption indicators, trust toward smiling individuals is reduced. This research fosters understanding of the cultural framework surrounding nonverbal communication processes and reveals that in some cultures smiling may lead to negative attributions.
Bericht über die 19. Arbeitstagung zur Gesprächsforschung vom 16. bis 18. März 2016 in Mannheim
(2016)
Image macros, also known as memes, are popular Internet phenomena which, in the course of the comprehensive multimodalization of media communication, are circulated and commented on as entertainment content on Facebook. This contribution examines these multimodal units, consisting of a combination of image and text, from a genre-analytic and conversation-analytic perspective, since image macros appear to exhibit sedimented patterns both in their formal and semantic design and in their interactive reception in the form of comments and replies. In this medially mediated interaction, various communicative patterns have emerged, both at the structural level of interaction sequences and within individual interaction units analyzed at the sequence-external and sequence-internal levels. Within these, social processes such as face-work and identity construction influence the interactive negotiation of the image macro.
Brown clustering has been used to help increase parsing performance for morphologically rich languages. However, much of the work has focused on using clustering techniques to replace terminal nodes or as a feature for parsing. Instead, we choose to examine how effective Brown clustering is for unlexicalized parsing by creating data-driven POS tagsets which are then used with the Berkeley parser. We investigate cluster sizes as well as which information (e.g. words vs. lemmas) to cluster on for the best parser performance. Our results approach the current state-of-the-art results for the German TüBa-D/Z treebank when using parser-internal tagging.
This paper presents C-WEP, the Collection of Writing Errors by Professional Writers of German. It currently consists of 245 sentences with grammatical errors. All sentences are taken from published texts. All authors are professional writers with high skill levels with respect to German, the genres, and the topics. The purpose of this collection is to provide seeds for more sophisticated writing support tools, as only a very small proportion of those errors can be detected by state-of-the-art checkers. C-WEP is annotated on various levels and freely available.
German research on collocation(s) focuses on many different aspects. A comprehensive documentation would be impossible in this short report. Accepting that we cannot do justice to all the contributions to this area, we just pick out some influential cornerstones. This selection does not claim to be representative or balanced, but it follows the idea of constituting the backbone of the story we want to tell: our ‘German’ view of the still ongoing evolution of a notion of ‘collocation’. Although our own work concerns the theoretical background of and the empirical rationale for collocations, lexicography occupies a large space. Some of the recent publications (Wahrig 2008, Häcki Buhofer et al. 2014) represent a turn to the empirical legitimation for the selection of typical expressions. Nevertheless, linking the empirical evidence to the needs of an abstract lexicographic description (or a didactic format) is still an open issue.
Comparaison de deux marqueurs d’affirmation dans des séquences de co-construction: voilà et genau
(2016)
This contribution investigates the German response particle genau and the French response particle voilà within collaborative turn sequences in videotaped ordinary conversations. Adopting a conversation analytic approach to cross-linguistic comparison, I will show that the basic epistemic value of both particles allows them to be used in similar sequential environments. When a co-participant formulates a candidate conclusion in environments where it can be easily inferred from previous talk, first speakers may confirm the adequacy of the pre-emptive completion by voilà or genau. These particles may then also be followed by self- or other-repeats. The analyses aim to illustrate that participants rely on a variety of practices in order to positively assess a pre-emptive completion, and to refute a supposed binary opposition of refusal vs. acceptance in the receipt slot.
Constructing a Corpus
(2016)
This paper is about the workflow for construction and dissemination of FOLK (Forschungs- und Lehrkorpus Gesprochenes Deutsch – Research and Teaching Corpus of Spoken German), a large corpus of authentic spoken interaction data, recorded on audio and video. Section 2 describes in detail the tools used in the individual steps of transcription, anonymization, orthographic normalization, lemmatization and POS tagging of the data, as well as some utilities used for corpus management. Section 3 deals with the DGD (Datenbank für Gesprochenes Deutsch – Database of Spoken German) as a tool for distributing completed data sets and making them available for qualitative and quantitative analysis. In section 4, some plans for further development are sketched.
Converting and Representing Social Media Corpora into TEI: Schema and best practices from CLARIN-D
(2016)
The paper presents results from a curation project within CLARIN-D, in which an existing 1M-word corpus of German chat communication has been integrated into the DeReKo and DWDS corpus infrastructures of the CLARIN-D centres at the Institute for the German Language (IDS, Mannheim) and at the Berlin-Brandenburg Academy of Sciences (BBAW, Berlin). The focus is on the solutions developed for converting and representing the corpus in a TEI format.
The present paper describes Corpus Query Lingua Franca (ISO CQLF), a specification designed at ISO Technical Committee 37 Subcommittee 4 “Language resource management” for the purpose of facilitating the comparison of properties of corpus query languages. We overview the motivation for this endeavour and present its aims and its general architecture. CQLF is intended as a multi-part specification; here, we concentrate on the basic metamodel that provides a frame that the other parts fit in.
This paper presents our model of ‘MultiWord Patterns’ (MWPs). MWPs are defined as recurrent frozen schemes with fixed lexical components and productive slots that have a holistic – but not necessarily idiomatic – meaning and/or function, sometimes only on an abstract level. These patterns can only be reconstructed with corpus-driven, iterative (qualitative-quantitative) methods. This methodology includes complex phrase searches, collocation analysis that not only detects significant word pairs but also significant syntagmatic cotext patterns, and slot analysis with our UWV Tool. This tool allows us to bundle KWICs in order to detect the nature of lexical fillers and to visualize MWP hierarchies.
In this paper, we present first results of training a classifier for discriminating Russian texts into different levels of difficulty. For the classification we considered both surface-oriented features adopted from readability assessments and more linguistically informed, positional features to classify texts into two levels of difficulty. This text classification is the main focus of our Levelled Study Corpus of Russian (LeStCoR), in which we aim to build a corpus adapted for language learning purposes – selecting simpler texts for beginner second language learners and more complex texts for advanced learners. The most discriminative feature in our pilot study was a lexical feature that approximates accessibility of the vocabulary by the second language learner in terms of the proportion of familiar words in the texts. The best feature setting achieved an accuracy of 0.91 on a pilot corpus of 209 texts.
The Component MetaData Infrastructure (CMDI) is a framework for the creation and usage of metadata formats to describe all kinds of resources in the CLARIN world. To better connect to the library world, and to allow librarians to enter metadata for linguistic resources into their catalogues, a crosswalk from CMDI-based formats to bibliographic standards is required. The general and rather fluid nature of CMDI, however, makes it hard to map arbitrary CMDI schemas to metadata standards such as Dublin Core (DC) or MARC 21, which have a mature, well-defined and fixed set of field descriptors. In this paper, we address the issue and propose crosswalks between CMDI-based profiles originating from the NaLiDa project and DC and MARC 21, respectively.
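A crosswalk of the kind proposed above can be pictured as a per-profile mapping table that routes component fields to DC elements. The CMDI field names below are invented placeholders for illustration, not actual NaLiDa profile components:

```python
# Minimal sketch of a CMDI-to-Dublin-Core crosswalk: a mapping
# table routes fields of one CMDI profile to DC elements.
# Field names on the left are hypothetical examples.

CROSSWALK = {
    "ResourceName": "dc:title",
    "Creator/Name": "dc:creator",
    "Description": "dc:description",
    "PublicationDate": "dc:date",
}

def to_dublin_core(cmdi_record: dict) -> dict:
    """Map a flat CMDI-like record onto Dublin Core elements."""
    dc: dict = {}
    for cmdi_field, value in cmdi_record.items():
        dc_element = CROSSWALK.get(cmdi_field)
        if dc_element:                      # unmapped fields are dropped
            dc.setdefault(dc_element, []).append(value)
    return dc
```

Because DC has a fixed, coarse element set while CMDI profiles vary freely, each profile needs its own table, and some CMDI fields inevitably have no DC target; this is the mapping difficulty the paper addresses.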
Bilingual neologism dictionaries, which record the new vocabulary of the source language for a given period and offer meaning explanations and/or equivalents in the target language, can be a great help to learners of German in language acquisition. They present vocabulary that is usually not yet recorded in general bilingual dictionaries and thus support learners in text reception. They are also suitable for text production, provided that adequate space is given to the presentation of meaning and usage. These possibilities are illustrated using the example of the Deutsch-russisches Neologismenwörterbuch. The dictionary covers the period 1991–2010. With its almost 2,000 headwords for new German vocabulary, it is designed primarily as a passive dictionary, i.e. it is aimed first and foremost at Russian-speaking users who are learning or already proficient in German. It offers two advantages: first, users find here the new vocabulary that they usually search for in vain in general bilingual dictionaries; second, the generally high demand for information is met by explicit description, since the space available is relatively generous owing to the smaller number of headwords compared with a general-language dictionary. The specifics of the entry structure, which are also determined by the particular character of a bilingual neologism dictionary, are explained in more detail. The authors expect that the bilingual neologism dictionary will awaken in learners of German the desire to look up new items in the German vocabulary, and that it will help to foster intercultural competence.
Datenmodellierung
(2016)
Starting from fundamental insights of conversation-analytic interaction research into the central importance that bodily co-presence and mutual perception have for the shaping of our interactive practices, this contribution examines deictic practices in face-to-face communication. Deixis – verbal and gestural pointing for another person – can be regarded phylo- and ontogenetically (Tomasello 2003, 2006, 2008) as a privileged interface between interaction and grammar, between language, human bodies, objects, perception, and space. On the basis of a broad video corpus of different genres, deictic pointing actions are analysed as situated, body-bound practices and systematically examined for trans-situational commonalities and differences. The results of the empirical analyses of the demonstratio ad oculos (pointing at the visible, Bühler 1965) and the Deixis am Phantasma (pointing at the invisible, ibid.) are integrated into an overarching theoretical model. In this multimodal model, deixis is understood as a situated practice of joint focusing of attention that mobilises the interactive, cognitive, and perceptual resources of all participants (Stukenbrock 2015b).
Dependens
(2016)
Dependenzrelation
(2016)
This paper examines the syntax and semantics of so-called postponers, i.e. conjunctional connectors whose subordinate clause always follows the main clause. Using sodass and zumal, the core properties of such connectors in German are presented. Using the Italian conjunctions cosicché, tanto più che, and perché as examples, it is discussed whether the notion of the postponer can be exploited for cross-linguistic comparison. In a next step, the German postponers are described more precisely, drawing on language-historical arguments, and located in the transitional zone between adverbial connectors and subordinators. It turns out that the connectors examined ultimately behave very differently, so that it seems questionable whether grouping them into a common class is justified.
Der lexikografische Prozess
(2016)
The lexicographic process has so far been investigated almost exclusively with respect to printed dictionaries. For internet dictionaries (and especially for those still under construction), this process takes a very different shape: there is no succession of individual production phases to describe, but rather a permanent juxtaposition and interlocking of individual work steps. This raises a whole series of questions, e.g. how sub-vocabularies for processing are to be selected, what influence the new possibilities of data extraction from electronic text corpora have on the lexicographic process, which software can be used to support lexicographic processes, and how all these changes affect dictionary users.
This contribution presents the "Deutsch-russisches Neologismenwörterbuch", published in 2014, which presents, especially to Russian-speaking users, the new vocabulary of German that they mostly search for in vain in general dictionaries. Some data types, i.e. types of lexicographic information, are discussed in more detail: the typical uses of the headwords, the various links between the headwords, the obligatory meaning explanation, and – at length – the Russian equivalents.
Languages vary in whether or not their future markers are compatible with non-future modal readings (Tonhauser, 2011b). The present paper proposes that this variation is determined by the aspectual architecture of a given language, more precisely by whether and how aspects can be stacked. Building on recent accounts of the temporal interpretation of modals (Matthewson, 2012, 2013; Kratzer, 2012; Chen et al., ta), the paper first sketches an analysis of the temporal readings of the English future marker will and then provides cross-linguistic comparison with a selected, typologically diverse set of languages (Medumba, Hausa, Gitksan, and Greek).
Deutsch-russisches Neologismenwörterbuch. Neuer Wortschatz im Deutschen, 1991-2010. Bd. 1 - 2 (A-Z)
(2016)
This dictionary, which is based on the first larger neologism dictionary for German, fills a gap in the German-Russian dictionary landscape: it presents users with the new German vocabulary that they mostly search for in vain in other dictionaries. It contains almost 2,000 new words (e.g. Kletterwald, scrollen), new fixed multi-word expressions (e.g. etw. in die Tonne treten, der Drops ist gelutscht), and new meanings of established words (e.g. halbrund, Stolperstein), of which around 1,350 are comprehensively described lexicographically. The many links between the headwords provide insights into the interconnectedness of the new vocabulary and thus make an important contribution to vocabulary acquisition.
Deutsches Fremdwörterbuch
(2016)
When the substandard German varieties brought along by resettlers of the immigrant generation from German language islands of the former Soviet Union come into contact with the standard language and the regional varieties spoken within Germany, changes of a specific kind arise that do not occur among native dialect speakers in the German-speaking area in the course of convergence resulting from standard/dialect variation. When speakers come from a language island, they activate, during their stay in Germany, their variation patterns on the basis of their prior dialectal knowledge of German and extend their repertoire into the standard and, in part, also the regional domain of German. This publication is devoted to this process and its consequences.
When collecting linguistic data using translation tasks, stimuli can be presented in written or in oral form. In doing so, a systematic source of error can arise that can be traced back to the chosen survey method and that can influence the results of the translation tasks. This contribution investigates whether and to what extent the two aforementioned survey methods lead to divergent results when translation tasks are used. For this investigation, 128 informants provided linguistic data; each informant had to translate 25 Wenker sentences from Standard German into East Swabian, Lechrain, or West Central Bavarian dialect. The results show two tendencies. First, written stimuli lead to a slightly higher number of dialectal translations of segmental variables. Second, when oral stimuli are used, syntactic and lexical variables are translated significantly more often in a manner that diverges from the template. The results can be explained in terms of varying cognitive processing operations and the constraints of human working memory. These tendencies should be taken into account in future data collection.
Linguistic Landscapes (LL) are on everyone's lips in international sociolinguistics and related disciplines. Since the mid-2000s, studies that see themselves as part of this approach have sprung up like mushrooms. Since 2008, well-attended conferences devoted exclusively to Linguistic Landscapes have been held at almost yearly intervals – dealing both with case studies from all over the world and with theoretical and methodological questions. Accordingly, not only have a large number of individual papers appeared; there have also been several edited collections, and since 2015 a dedicated journal has been published under the title "Linguistic Landscapes" (cf. Gorter 2013 for an overview of the development of the approach).
Although scholars working in the German-speaking area have also turned to Linguistic Landscapes in recent years, the method has so far played only a comparatively minor role in German-language publications. This contribution therefore aims, on the one hand, to do groundwork by introducing the idea of Linguistic Landscapes once again and tracing its development over the past years. On the other hand, in the context of this volume, the usefulness of the approach for the analysis of the languages of migrant groups is discussed. Finally, the contribution is rounded off by some remarks on the extent to which the study of LL can have a practical value reaching beyond academic circles. This contribution is based on international publications of recent years, but above all it draws on experiences from our own studies, which we have carried out since 2007, with various objectives, in the Baltic states and in Germany.
This contribution summarises the main statements and results of a workshop that brought together seven perspectives on investigating the role of German in public space. Some of the studies presented followed the 'Linguistic Landscapes' approach, which has gained rapidly in popularity since the beginning of the 2000s. Other contributions focused on practical considerations for finding examples of the German language in order to use them in the context of German as a foreign language and German studies abroad, as well as in promoting the German language. The aim of the workshop was to explore commonalities and perspectives between these studies, grouped under the keyword 'Spot German', and the Linguistic Landscape tradition. Countries from which studies were presented were Estonia, Latvia, Denmark, the Czech Republic, Germany, Cyprus, and Malta.
In this contribution we have taken the opportunity to fundamentally rethink the role of architecture for interaction and to approach it systematically. The following edited volume emerged from this. The contribution complements the perspective on 'interaction architecture' and 'social topography' developed in the contribution by Hausendorf/Schmitt (in this volume) with a text-linguistic perspective: the 'interaction-architectural implications' central to the analysis of interaction architecture can be characterised more precisely when, against the background of the 'readability cues' central to text analysis, they are profiled as usability cues.
Diskurs
(2016)
Linguistic discourse analysis investigates the linguistic constitution, the societal effects, and the knowledge and power relations of serial public communication. As a broad field of research open to neighbouring disciplines, it has theoretical and methodological interfaces with text linguistics, pragmatics, and cognitive semantics, as well as with the sociology of knowledge, literary studies, and history. This bibliography provides a concise introduction to discourse and discourse analysis and gives an overview of the interdisciplinary connections of the research literature.
Diskursive Historizität
(2016)
This paper introduces the recently started DRuKoLA-project that aims at providing mechanisms to flexibly draw virtual comparable corpora from the German Reference Corpus DeReKo and the Reference Corpus of Contemporary Romanian Language CoRoLa in order to use these virtual corpora as empirical basis for contrastive linguistic research.
Editorial
(2016)
Sentiment analysis has so far focused on the detection of explicit opinions. However, of late implicit opinions have received broader attention, the key idea being that the evaluation of an event type by a speaker depends on how the participants in the event are valued and how the event itself affects the participants. We present an annotation scheme for adding relevant information, couched in terms of so-called effect functors, to German lexical items. Our scheme synthesizes and extends previous proposals. We report on an inter-annotator agreement study. We also present results of a crowdsourcing experiment to test the utility of some known and some new functors for opinion inference where, unlike in previous work, subjects are asked to reason from event evaluation to participant evaluation.
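The opinion-inference idea in the abstract above can be made concrete with a toy lexicon of effect functors: given the speaker's evaluation of an event and the verb's effect on a participant, an evaluation of that participant is inferred. The lexicon entries and functor names below are invented for illustration, not the paper's actual German annotations.

```python
# Toy opinion inference via effect functors: a positively evaluated
# event whose verb benefits its object suggests the object is valued
# positively; a harming verb flips the polarity. Entries are
# hypothetical examples.

EFFECT_LEXICON = {
    "support": "+effect_on_object",   # event benefits the object
    "harm": "-effect_on_object",      # event hurts the object
}

def infer_object_evaluation(verb: str, event_evaluation: str) -> str:
    """Reason from the event evaluation to the object participant."""
    functor = EFFECT_LEXICON.get(verb)
    if functor is None:
        return "unknown"
    flip = functor.startswith("-")
    if event_evaluation == "positive":
        return "negative" if flip else "positive"
    return "positive" if flip else "negative"
```

The crowdsourcing experiment described above tests exactly this direction of reasoning – from event evaluation to participant evaluation – with human subjects rather than a rule table.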
This guide presents the Datenbank für Gesprochenes Deutsch (DGD) and, in particular, the Forschungs- und Lehrkorpus Gesprochenes Deutsch (FOLK) as instruments for conversation-analytic work. After a brief introductory overview, the resources and tools for systematic corpus- and database-driven searches and analyses are presented and illustrated step by step, using the example of "sprich" as a discourse marker or reformulation indicator.
This guide presents the Datenbank für Gesprochenes Deutsch (DGD) and, in particular, the Forschungs- und Lehrkorpus Gesprochenes Deutsch (FOLK) as instruments for conversation-analytic work. After a brief introductory overview, the resources and tools for systematic corpus- and database-driven searches and analyses are presented and illustrated step by step, using four different examples.
This guide presents the Datenbank für Gesprochenes Deutsch (DGD) and, in particular, the Forschungs- und Lehrkorpus Gesprochenes Deutsch (FOLK) as instruments for conversation-analytic work. After a brief introductory overview, the resources and tools for systematic corpus- and database-driven searches and analyses are presented and illustrated step by step, using the example of metapragmatic modalisations with the adverbs "sozusagen" and "gewissermaßen" and with the formula "in Anführungszeichen/-strichen".