"... durch Worte heilen" - Linguistik und Psychotherapie (2016)

Marciniak, Agnieszka ; Nikendei, Christoph ; Ehrenthal, Johannes C. ; Spranz-Fogasy, Thomas

"Interkulturelles Training und Sprachcoaching" für internationale Professor(inn)en an der Technischen Universität München (TUM) (2016)

Rattay-Förstl, Beate

"Kaum [...] da, wird' ich gedisst!" Funktionale Aspekte des Banter-Prinzips auf dem Online-Prüfstand (2016)

Marx, Konstanze

This article attempts to enrich the theoretical approach to the Banter Principle (Leech 1983) with an online perspective. Examples from Teamspeak conversations and from comments on the social network site Facebook reveal different user practices for making banter identifiable: in written language, nonverbal elements or emoticons ensure that banter is understood correctly; in oral communication, participants cope with assigned roles that depend on dynamic group-internal hierarchies. One question nevertheless remains: why disguise a cordial message as a rude one? The analysis shows two functions of online banter: first, to maximize the entertainment value of a conversation, and second, to establish an accepted online identity.

"Niessbrauch an einem Inbegriff von Sachen" - wie versteht der juristische Laie den Wortschatz des BGB? (2016)

Ulrich, Winfried

'Faction' im Fernsehen - Produktionsbeobachtung des Scripted Reality-Formats mieten, kaufen, wohnen (2016)

Klug, Daniel ; Schmidt, Axel

(Anti-)Control in German: evidence from comparative, corpus- and psycholinguistic studies (2016)

Brandt, Patrick ; Trawiński, Beata ; Wöllstein, Angelika

The present investigation targets the phenomenon commonly called control. Many languages including German and Polish employ non-finite clauses (besides finite clauses) as propositional complements. The subject of these complement clauses is left unexpressed and must generally be interpreted co-referentially with the subject or object of the matrix clause (subject or object control). However, there are also infinitive-selecting verbs that do not allow for a co-referential interpretation of the embedded subject; semantically, the embedded infinitives of these anti-control verbs are thus less dependent on or less unifiable with the matrix proposition. In Polish anti-control constructions, non-finite complements are overtly marked with the complementizer żeby, suggesting that they are structurally more complex (namely, containing a C-projection) than the non-finite complements in control constructions lacking żeby (modulo special contexts, viz. 'control switch'). In a comparative perspective, the paper brings corpus-linguistic and experimental evidence to bear on the question whether, surface appearances notwithstanding, the infinitival complements of anti-control verbs in German should similarly be analyzed as truly sentential, i.e., C-headed structures.

(Best) Practices for Annotating and Representing CMC and Social Media Corpora in CLARIN-D (2016)

Beißwenger, Michael ; Ehrhardt, Eric ; Herold, Axel ; Lüngen, Harald ; Storrer, Angelika

The paper reports the results of the curation project ChatCorpus2CLARIN. The goal of the project was to develop a workflow and resources for the integration of an existing chat corpus into the CLARIN-D research infrastructure for language resources and tools in the Humanities and the Social Sciences (http://clarin-d.de). The paper presents an overview of the resources and practices developed in the project, describes the added value of the resource after its integration and discusses, as an outlook, to what extent these practices can be considered best practices which may be useful for the annotation and representation of other CMC and social media corpora.

4th Workshop on Challenges in the Management of Large Corpora. (May 28th 2016, Portorož; part of the LREC-2016 workshop structure) / LREC 2016, CMLC-4. (2016)

A comparison between morphological complexity measures: typological data vs. language corpora (2016)

Bentz, Christian ; Soldatova, Tatjana ; Koplenig, Alexander ; Samardžić, Tanja

Language complexity is an intriguing phenomenon argued to play an important role in both language learning and processing. The need to compare languages with regard to their complexity resulted in a multitude of approaches and methods, ranging from accounts targeting specific structural features to global quantification of variation more generally. In this paper, we investigate the degree to which morphological complexity measures are mutually correlated in a sample of more than 500 languages of 101 language families. We use human expert judgements from the World Atlas of Language Structures (WALS), and compare them to four quantitative measures automatically calculated from language corpora. These consist of three previously defined corpus-derived measures, which are all monolingual, and one new measure based on automatic word-alignment across pairs of languages. We find strong correlations between all the measures, illustrating that both expert judgements and automated approaches converge to similar complexity ratings, and can be used interchangeably.

A CUP of CoFee: A Large Collection of Feedback Utterances Provided with Communicative Function Annotations (2016)

Prévot, Laurent ; Gorisch, Jan ; Bertrand, Roxane

There have been several previous attempts to annotate utterances of verbal feedback in English with communicative functions. Here, we suggest an annotation scheme for verbal and non-verbal feedback utterances in French including the categories base, attitude, previous and visual. The data comprises conversations, map tasks and negotiations from which we extracted ca. 13,000 candidate feedback utterances and gestures. Twelve students were recruited for the annotation campaign of ca. 9,500 instances. Each instance was annotated by between 2 and 7 raters. The evaluation of the annotation agreement resulted in an average best-pair kappa of 0.6. While the base category with the values acknowledgement, evaluation, answer, elicit and other achieves good agreement, this is not the case for the other main categories. The data sets, which also include automatic extractions of lexical, positional and acoustic features, are freely available and will further be used for machine learning classification experiments to analyse the form-function relationship of feedback.

A Discourse-structured Blog Corpus for German: Challenges of Compilation and Annotation (2016)

Grumt Suárez, Holger ; Karlova-Bourbonus, Natali ; Lobin, Henning

The present paper reports the first results of the compilation and annotation of a blog corpus for German. The main aim of the project is the representation of the blog discourse structure and relations between its elements (blog posts, comments) and participants (bloggers, commentators). The data included in the corpus were manually collected from the scientific blog portal SciLogs. The feature catalogue for the corpus annotation includes three types of information, which are directly or indirectly provided in the blog or can be construed by means of statistical analysis or computational tools. At this point, only directly available information (e.g. title of the blog post, name of the blogger, etc.) has been annotated. We believe our blog corpus can be of interest for the general study of blog structure or related research questions as well as for the development of NLP methods and techniques (e.g. for authorship detection).

A range of uses of negative epistemic constructions in German: ICH WEIß NICHT as a resource for dispreferred actions (2016)

Helmer, Henrike ; Reineke, Silke ; Deppermann, Arnulf

The paper deals with the use of ICH WEIß NICHT (‘I don’t know’) in German talk-in-interaction. Pursuing an Interactional Linguistics approach, we identify different interactional uses of ICH WEIß NICHT and discuss their relationship to variation in argument structure (SV (O), (O)VS, V-only). After ICH WEIß NICHT with full complementation, speakers emphasize their lack of knowledge or display reluctance to answer. After variants without an object complement, in contrast, speakers display uncertainty about the truth of the following proposition or about its sufficiency as an answer. Thus, while uses with both subject and object tend to close a sequence or display lack of knowledge, responses without an object function as a prepositioned epistemic hedge or a pragmatic marker framing the following TCU. When ICH WEIß NICHT is used in response to a statement, it indexes disagreement (independently of complementation pattern).

A range of uses of negative epistemic constructions in German: ICH WEIß NICHT as a resource for dispreferred actions (2016)

Helmer, Henrike ; Reineke, Silke ; Deppermann, Arnulf

Our paper deals with the use of ICH WEIß NICHT (‘I don’t know’) in German talk-in-interaction. Pursuing an Interactional Linguistics approach, we identify different interactional uses of ICH WEIß NICHT and discuss their relationship to variation in argument structure (SV (O), (O)VS, V-only). After ICH WEIß NICHT with full complementation, speakers emphasize their lack of knowledge or display reluctance to answer. After variants without an object complement, in contrast, speakers display uncertainty about the truth of the following proposition or about its sufficiency as an answer. Thus, while uses with both subject and object tend to close a sequence or display lack of knowledge, responses without an object function as a prepositioned epistemic hedge or a pragmatic marker framing the following TCU. When ICH WEIß NICHT is used in response to a statement, it indexes disagreement (independently of complementation pattern).

Acoustic analysis of high vowels in the Louisiana French of Terrebonne Parish (2016)

Kasper-Cushmann, Kelly ; Dakota, Daniel

This study investigates high vowel laxing in the Louisiana French of the Lafourche Basin. Unlike Canadian French, in which the high vowels /i, y, u/ are traditionally described as undergoing laxing (to [ɪ, ʏ, ʊ]) in word-final syllables closed by any consonant other than a voiced fricative (see Poliquin 2006), Oukada (1977) states that in the Louisiana French of Lafourche Parish, any coda consonant will trigger high vowel laxing of /i/; he excludes both /y/ and /u/ from his discussion of high vowel laxing. The current study analyzes tokens of /i, y, u/ from pre-recorded interviews with three older male speakers from Terrebonne Parish. We measured the first and second formants and duration for high vowel tokens produced in four phonetic environments, crossing syllable type (open vs. closed) by consonant type (voiced fricative vs. any consonant other than a voiced fricative). Results of the acoustic analysis show optional laxing for /i/ and /y/ and corroborate the finding that high vowels undergo laxing in word-final closed syllables, regardless of consonant type. Data for /u/ show that the results vary widely by speaker, with the dominant pattern (shown by two out of three speakers) that of lowering and backing in the vowel space of closed syllable tokens. Duration data prove inconclusive, likely due to the effects of stress. The formant data published here constitute the first acoustic description of high vowels for any variety of Louisiana French and lay the groundwork for future study on these endangered varieties.

AI vs. AU in American English compared to German (2016)

Raffelsiefen, Renate ; Geumann, Anja

American English and German AI, AU observed in cognates such as Wein, wine, Haus, house are usually treated on a par, represented with the same initial vowel (cf. [ai], [au] for Am. Engl. and German [1]). Yet acoustic measurements indicate differences, as the relevant trajectories characteristically cross in Am. Engl. but not in German. These data may indicate consistency with the same initial target for these diphthongs in German, supporting the choice of the same symbol /a/ in phonemic representation, as opposed to distinct targets (and distinct initial phonemes) in American English.

Aktuelle Änderungen des Rats für deutsche Rechtschreibung 2016 - Hintergründe und Begründungen (2016)

Güthert, Kerstin

All Your Data Are Belong to us. European Perspectives on Privacy Issues in ‘Free’ Online Machine Translation Services (2016)

Kamocki, Paweł ; Stauch, Marc ; O'Regan, Jim

The English language has taken advantage of the Digital Revolution to establish itself as the global language; however, only 28.6% of Internet users speak English as their native language. Machine Translation (MT) is a powerful technology that can bridge this gap. In development since the mid-20th century, MT has become available to every Internet user in the last decade, due to free online MT services. This paper aims to discuss the implications that these tools may have for the privacy of their users and how they are addressed by EU data protection law. It examines the data flows in respect of the initial processing (both from the perspective of the user and the MT service provider) and potential further processing that may be undertaken by the MT service provider.

Allomorphy and the question of abstractness: evidence from German (2016)

Raffelsiefen, Renate

A model of grammar needs to reconcile the undesirability inherent to allomorphy, the apparent extra burden on learning and memory, with its occurrence and possible stability. OT approaches this task by positing an anti-allomorphy constraint, henceforth referred to as "OO-correspondence", which requires leveling (i.e. sameness of sound structure) in related word forms (Benua 1997). The occurrence of allomorphy then indicates crucial domination of OO-correspondence by other constraints. To assess the adequacy of this proposal it is necessary to establish the level of abstractness at which OO-correspondence applies and to examine the consequences of this decision for ranking order. While proponents of OT tacitly assume the level in question to be rather concrete, the notion of allomorphy as originally envisioned in Structuralism was defined by distinctness at a more abstract level referred to as "phonemic" (Harris 1942; Nida 1944). The basic intuition here is that the defining property of subphonemic sound properties, their conditionedness by context, entails that whatever burden they put on learning and memory is of a fundamentally different nature than that entailed by phonemic distinctness. The evidence from German supports that intuition in that leveling can be shown to target phonemic sound structure to the exclusion of subphonemic properties. Allomorphy, defined by phonemic alternation, tends to serve phonological optimization in closed class items (function words, affixes) while serving to express morphological distinctions in open class items. The key to demonstrating the correlations in question lies in the discernment of phonemic structure, which is therefore at the core of the article.

An integrated e-dictionary application – the case of an open educational trainer for Zulu (2016)

Faaß, Gertrud ; Bosch, Sonja

This article describes an English Zulu learners’ dictionary that is part of a larger set of information tools, namely an online Zulu course, an e-dictionary of possessives (which was implemented earlier) accompanied by training software offering translation tasks on several levels, and an ontology of morphemic items categorizing and describing all parts of speech of Zulu. The underlying lexicographic database contains the usual type of lexicographic data, such as translation equivalents and their respective morphosyntactic data, but its entries have been extended with data related to the lessons of the online course in order to enable the learner to link both tools autonomously. The ‘outer matter’ is integrated into the website in the form of several texts on additional web pages (how-to-use, typical outputs, grammar tables, information on morphosyntactic rules, etc.). The dictionary comprises a modular system, where each module fulfils one of the necessary functions.

Analepsen in der Interaktion. Semantische und sequenzielle Eigenschaften von Topik-Drop im gesprochenen Deutsch (2016)

Helmer, Henrike

Analepses with topic drop are highly frequent linguistic structures in interaction. Alongside the interactional-linguistic investigation of the discourse functions, conditions, and restrictions of analepses, this study centers on discourse-semantic perspectives and questions, in particular the detailed description of the semantic relations between analepses and their preceding context. The resolution of analepses must be explained in a situated way, since understanding analepses depends on contextual embedding as well as on grammatical, semantic, and pragmatic features of the utterance. It is shown that cognitive ascriptions concerning the participants in an interaction are also possible with interactional-linguistic methods. The study further demonstrates that combining qualitative and quantitative methods is productive for working out specific usage preferences of analeptic as compared with anaphoric utterances.

Annotating Discourse Relations in Spoken Language: A Comparison of the PDTB and CCR Frameworks (2016)

Rehbein, Ines ; Scholman, Merel ; Demberg, Vera

In discourse relation annotation, there is currently a variety of different frameworks being used, and most of them have been developed and employed mostly on written data. This raises a number of questions regarding interoperability of discourse relation annotation schemes, as well as regarding differences in discourse annotation for written vs. spoken domains. In this paper, we describe our work on annotating two spoken domains from the SPICE Ireland corpus (telephone conversations and broadcast interviews) according to two different discourse annotation schemes, PDTB 3.0 and CCR. We show that annotations in the two schemes can largely be mapped onto one another, and discuss differences in operationalisations of discourse relation schemes which present a challenge to automatic mapping. We also observe systematic differences in the prevalence of implicit discourse relations in spoken data compared to written texts, and find that there are also differences in the types of causal relations between the domains. Finally, we find that PDTB 3.0 addresses many shortcomings of PDTB 2.0 with respect to the annotation of spoken discourse, and suggest further extensions. The new corpus has roughly the size of the CoNLL 2015 Shared Task test set, and we hence hope that it will be a valuable resource for the evaluation of automatic discourse relation labellers.

Arabischer Frühling oder islamisches Unwetter? Zur Sprachthematisierung des Arabischen Frühlings im öffentlichen Sprachgebrauch (2016)

Saif, Mohammed

In the course of events in the Arab world since 2011, the term Arabischer Frühling ('Arab Spring') gained in importance and advanced to become the key expression of the discourse. The article investigates how the term has been realized linguistically in the German-speaking public sphere, with which linguistic means it has been constructed, and with which events (at times also catastrophes) it was and is identified. It addresses the symbolic function of spring both from the historical perspective of the Vormärz period and from a present-day point of view. The study also considers the season names Winter, Herbst, and Sommer ('winter', 'autumn', 'summer') and their symbolic relation to the Arab revolutions.

Asking 'what about' questions in chronic illness self-management meetings (2016)

Fasulo, Alessandra ; Zinken, Jörg ; Zinken, Katarzyna

This study investigates ‘What about’ questions asked by patients in the course of diabetes self-management groups led by nurses, and explores their functions in these empowerment-informed settings.

Assistance and other forms of cooperative engagement (2016)

Zinken, Jörg ; Rossi, Giovanni

In their analysis of methods that participants use to manage the realization of practical courses of action, Kendrick and Drew (2016/this issue) focus on cases of assistance, where the need to be addressed is Self’s, and Other lends a helping hand. In our commentary, we point to other forms of cooperative engagement that are ubiquitously recruited in interaction. Imperative requests characteristically expect compliance on the grounds of Other’s already established commitment to a wider and shared course of actions. Established commitments can also provide the engine behind recruitment sequences that proceed nonverbally. And forms of cooperative engagement that are well glossed as assistance can nevertheless be demonstrably oriented to established commitments. In sum, we find commitment to shared courses of action to be an important element in the design and progression of certain recruitment sequences, where the involvement of Other is best defined as contribution. The commentary highlights the importance of interdependent orientations in the organization of cooperation. Data are in German, Italian, and Polish.

Aufbau einer Korpusinfrastruktur für die Beobachtung des Schreibgebrauchs (2016)

Fischer, Peter M. ; Diewald, Nils ; Kupietz, Marc ; Witt, Andreas

Aufgabenorientierung Jungen - Küchenschatz [Transkript 5.1] (2016)

Torres Cajo, Sarah

Backte oder buk, haute oder hieb? - Schwache oder starke Flexion (Aus: Grammatik in Fragen und Antworten) (2016)

Kubczak, Jacqueline

Be Careful Where You Smile: Culture Shapes Judgments of Intelligence and Honesty of Smiling Individuals (2016)

Smiling individuals are usually perceived more favorably than non-smiling ones: they are judged as happier, more attractive, competent, and friendly. These seemingly clear and obvious consequences of smiling are assumed to be culturally universal; however, most of the psychological research is carried out in WEIRD societies (Western, Educated, Industrialized, Rich, and Democratic) and the influence of culture on social perception of nonverbal behavior is still understudied. Here we show that a smiling individual may be judged as less intelligent than the same non-smiling individual in cultures low on the GLOBE’s uncertainty avoidance dimension. Furthermore, we show that corruption at the societal level may undermine the prosocial perception of smiling: in societies with high corruption indicators, trust toward smiling individuals is reduced. This research fosters understanding of the cultural framework surrounding nonverbal communication processes and reveals that in some cultures smiling may lead to negative attributions.

Bericht über die 19. Arbeitstagung zur Gesprächsforschung am Institut für Deutsche Sprache (Mannheim) vom 16.-18. März 2016, Rahmenthema: Diskursmarker (2016)

Koblischke, Kristina

Bericht über die 19. Arbeitstagung zur Gesprächsforschung vom 16. bis 18. März 2016 in Mannheim (2016)

Koblischke, Kristina

Bild-Makros als Motor der Facebook-Interaktion – Eine formale und interaktionale Betrachtung multimodaler Kommunikate (2016)

Arens, Katja

Image macros, also known as memes, are popular internet phenomena that, in the course of the extensive multimodalization of media communication, are disseminated and commented on as entertainment on Facebook. This article examines these multimodal communicative artifacts, which combine image and text, from a genre-analytic and conversation-analytic perspective, since image macros appear to exhibit consolidated patterns both in their formal and semantic design and in their interactive reception in the form of comments and replies. In this technically mediated interaction, various communicative patterns have emerged both at the structural level of the interaction sequences and within individual interaction units analyzed at the sequence-external and sequence-internal levels. Within these, social processes such as face-work and identity construction influence the interactive negotiation of the communicative artifact.

Braucht man Sprachgeschichte?: Vortrag anlässlich der Verleihung des Konrad-Duden-Preises der Stadt Mannheim am 11. März 2015. Laudatio von Peter Schlobinski (2016)

Nübling, Damaris ; Schlobinski, Peter

Brown clustering for unlexicalized parsing (2016)

Dakota, Daniel

Brown clustering has been used to help increase parsing performance for morphologically rich languages. However, much of the work has focused on using clustering techniques to replace terminal nodes or as a feature for parsing. Instead, we choose to examine how effective Brown clustering is for unlexicalized parsing by creating data-driven POS tagsets which are then used with the Berkeley parser. We investigate cluster sizes as well as which information (e.g. words vs. lemmas) to cluster on for the best parser performance. Our results approach the current state-of-the-art results for the German TüBa-D/Z treebank when using parser-internal tagging.

Ciliegia, Noemi: Abkürzungen und Kurzwörter in der DDR. Eine sprachliche Wiedererinnerung [Rezension] (2016)

Hellmann, Manfred W.

CLARIN: Forschungsinfrastruktur für die Geistes- und Sozialwissenschaften (2016)

Trippel, Thorsten

Did Ernst Jünger cultivate a nationalist language? Questions of this kind, which rest on the analysis of large amounts of data, can now be answered with suitable research infrastructures.

Comparaison de deux marqueurs d’affirmation dans des séquences de co-construction: voilà et genau (2016)

Oloff, Florence

This contribution investigates the German response particle genau and the French response particle voilà within collaborative turn sequences in videotaped ordinary conversations. Adopting a conversation analytic approach to cross-linguistic comparison, I will show that the basic epistemic value of both particles allows them to be used in similar sequential environments. When a co-participant formulates a candidate conclusion in environments where it can be easily inferred from previous talk, first speakers may confirm the adequacy of the pre-emptive completion by voilà or genau. These particles may then also be followed by self- or other-repeats. The analyses aim to illustrate that participants rely on a variety of practices in order to positively assess a pre-emptive completion, and to refute a supposed binary opposition of refusal vs. acceptance in the receipt slot.

Compilation and Annotation of the Discourse-structured Blog Corpus for German (2016)

Grumt Suárez, Holger ; Karlova-Bourbonus, Natali ; Lobin, Henning

The present paper reports the first results of the compilation and annotation of a blog corpus for German. The main aim of the project is the representation of the blog discourse structure and relations between its elements (blog posts, comments) and participants (bloggers, commentators). The data included in the corpus were manually collected from the scientific blog portal SciLogs. The feature catalogue for the corpus annotation includes three types of information, which are directly or indirectly provided in the blog or can be construed by means of statistical analysis or computational tools. At this point, only directly available information (e.g. title of the blog post, name of the blogger, etc.) has been annotated. We believe our blog corpus can be of interest for the general study of blog structure or related research questions as well as for the development of NLP methods and techniques (e.g. for authorship detection).

Constructing a Corpus (2016)

Kupietz, Marc

Construction and dissemination of a corpus of spoken interaction - tools and workflows in the FOLK project (2016)

Schmidt, Thomas

This paper is about the workflow for construction and dissemination of FOLK (Forschungs- und Lehrkorpus Gesprochenes Deutsch – Research and Teaching Corpus of Spoken German), a large corpus of authentic spoken interaction data, recorded on audio and video. Section 2 describes in detail the tools used in the individual steps of transcription, anonymization, orthographic normalization, lemmatization and POS tagging of the data, as well as some utilities used for corpus management. Section 3 deals with the DGD (Datenbank für Gesprochenes Deutsch – Database of Spoken German) as a tool for distributing completed data sets and making them available for qualitative and quantitative analysis. In section 4, some plans for further development are sketched.

Converting and Representing Social Media Corpora into TEI: Schema and best practices from CLARIN-D (2016)

Beißwenger, Michael ; Ehrhardt, Eric ; Herold, Axel ; Lüngen, Harald ; Storrer, Angelika

The paper presents results from a curation project within CLARIN-D, in which an existing 1M word corpus of German chat communication has been integrated into the DEREKO and DWDS corpus infrastructures of the CLARIN-D centres at the Institute for the German Language (IDS, Mannheim) and at the Berlin-Brandenburg Academy of Sciences (BBAW, Berlin). The focus is on the solutions developed for converting and representing the corpus in a TEI format.

Corpus-driven description of multi-word patterns (2016)

Steyer, Kathrin

This paper presents our model of ‘MultiWord Patterns’ (MWPs). MWPs are defined as recurrent frozen schemes with fixed lexical components and productive slots that have a holistic – but not necessarily idiomatic – meaning and/or function, sometimes only on an abstract level. These patterns can only be reconstructed with corpus-driven, iterative (qualitative-quantitative) methods. This methodology includes complex phrase searches, collocation analysis that detects not only significant word pairs but also significant syntagmatic cotext patterns, and slot analysis with our UWV Tool. This tool allows us to bundle KWICs in order to detect the nature of lexical fillers and to visualize MWP hierarchies.

Creating an extensible, levelled study corpus of Russian (2016)

Batinić, Dolores ; Birzer, Sandra ; Zinsmeister, Heike

In this paper, we present first results of training a classifier that assigns Russian texts to different levels of difficulty. For the classification, we considered both surface-oriented features adopted from readability assessments and more linguistically informed, positional features to classify texts into two levels of difficulty. This text classification is the main focus of our Levelled Study Corpus of Russian (LeStCoR), in which we aim to build a corpus adapted for language learning purposes, selecting simpler texts for beginner second language learners and more complex texts for advanced learners. The most discriminative feature in our pilot study was a lexical feature that approximates accessibility of the vocabulary by the second language learner in terms of the proportion of familiar words in the texts. The best feature setting achieved an accuracy of 0.91 on a pilot corpus of 209 texts.

Crosswalking from CMDI to Dublin Core and MARC 21 (2016)

Zinn, Claus ; Trippel, Thorsten ; Kaminski, Steve ; Dima, Emanuel

The Component MetaData Infrastructure (CMDI) is a framework for the creation and usage of metadata formats to describe all kinds of resources in the CLARIN world. To better connect to the library world, and to allow librarians to enter metadata for linguistic resources into their catalogues, a crosswalk from CMDI-based formats to bibliographic standards is required. The general and rather fluid nature of CMDI, however, makes it hard to map arbitrary CMDI schemas to metadata standards such as Dublin Core (DC) or MARC 21, which have a mature, well-defined and fixed set of field descriptors. In this paper, we address the issue and propose crosswalks between CMDI-based profiles originating from the NaLiDa project and DC and MARC 21, respectively.

Das Deutsch-russische Neologismenwörterbuch ist da. Zu den Spezifika des Wortartikelaufbaus (2016)

Nikitina, Olga ; Steffens, Doris

Bilingual dictionaries of neologisms, which record the new vocabulary of the source language for a particular period and offer meaning explanations and/or equivalents in the target language, can be a great help to learners of German in acquiring the language. They present vocabulary that is generally not yet recorded in bilingual general dictionaries and thus support learners in text reception. They are also suitable for text production if adequate space is given to the description of meaning and usage. These possibilities are illustrated using the Deutsch-russisches Neologismenwörterbuch (German-Russian dictionary of neologisms). The dictionary covers the period 1991 to 2010. With just under 2,000 headwords for new vocabulary in German, it is designed primarily as a passive dictionary, i.e. it is aimed first and foremost at Russian-speaking users who are learning or have a command of German. It offers two advantages. First, users find here the new vocabulary that they generally look for in vain in general bilingual dictionaries. Second, the generally high demand for information is met by explicit description, because the space available is relatively generous owing to the smaller number of headwords compared with a general-language comprehensive dictionary. The specifics of the entry structure, which are also determined by the special character of the bilingual dictionary of neologisms, are explained in more detail. The authors expect that the bilingual dictionary of neologisms will awaken in learners of German the desire to look up what is new in German vocabulary and that it will help to foster intercultural competence.

Das Dortmunder Chat-Korpus in CLARIN-D: Modellierung und Mehrwerte (2016)

Beißwenger, Michael ; Herold, Axel ; Lüngen, Harald ; Storrer, Angelika

Das Institut für Deutsche Sprache im Jahr 2015 : Jahresbericht (2016)

Datenbank für Gesprochenes Deutsch (DGD) (2016)

Schmidt, Thomas

Datenmodellierung (2016)

Herold, Axel ; Meyer, Peter ; Müller-Spitzer, Carolin

Dependens (2016)

Lobin, Henning