This paper aims at showing how quantitative corpus linguistic analysis can inform qualitative analysis of digital media discourse with respect to the mediality of language in use. Using the example of protest discourse in Twitter, in the field of anti-Islamic ‘Pegida’ demonstrations, a three-step method of collecting, reducing and interpreting salient data is proposed. Each step is aligned with operative medial features of the microblog: hashtags, retweets and @-interactions. The exemplary analysis reveals the importance of discussions of attendance numbers in protest discourse and the asymmetry between administrative (i.e. the police) and non-administrative discourse agents. Furthermore, it exemplifies how frequency analysis and sequence analysis can be combined for research in media linguistics.
We investigate whether non-configurational languages, which display more word order variation than configurational ones, require more training data for a phenomenon to be parsed successfully. We perform a tightly controlled study comparing the dative alternation for English (a configurational language), German, and Russian (both non-configurational). More specifically, we compare the performance of a dependency parser when only canonical word order is present with its performance on data sets in which all word orders are present. Our results show that, for all languages, canonical data is easier to parse, and that there is no direct correspondence between the size of training sets containing free(er) word order variation and performance.
Some 25 years ago, a large-scale repatriation of Russian Germans began. As a result, more than 2.5 million people who grew up in the USSR, Russia, or other post-Soviet states, and who had native or near-native command of the Russian language, became German citizens. The uncomfortable differences they exhibited in comparison to those who were supposed to accept them as equals, yet failed to do so, compelled them to search for self-designations that would accommodate their new identity and to bond together to form a new minority. The authors examine the attempts of Soviet/Russian Germans to redefine their ethnic identity in terms of not just blood but also language and culture, focusing on two particular cases: the use of the name Rusak in the internet forums of the repatriated immigrants; and the linguistic-cultural practices of the older generation of immigrants.
Prosodic constructions used to compete for the speaking turn in conversation have been widely studied (French & Local 1983; Kurtić et al. 2013). Usually, turn competition arises in overlapping talk between at least two speakers. Coordination between participants in their prosodic design of talk (Szczepek-Reed 2006) and social action (Gorisch et al. 2012), as well as entrainment in more general terms (Levitan et al. 2011), is well established in the literature. Nevertheless, previous studies on turn competition and overlap do not investigate the prosodic design of turn-competitive incomings in reference to the orientation of the speakers to each other. Rather, they assume that prosodic constructions are used for turn competition regardless of the co-participants’ design of the turn. In this paper, we ask whether the prosodic design of turn-competitive talk is co-constructed between two participants talking in overlap. More specifically, we investigate whether the prosodic design of one participant’s in-overlap talk is developed with respect to the interlocutor’s prosodic features during the same portion of overlapped talk, and whether this prosodic matching can discriminate between the overlaps that are competitive and those that are not. Our analyses are based on two-speaker overlaps drawn from a corpus of multi-party face-to-face conversation between four friends recorded in British English (Kurtić et al. 2012). 3407 instances of two-speaker overlaps have been extracted from 4 hours of talk. Two independent conversation analysts performed the interactional categorisation of overlaps into competitive and non-competitive for all these two-speaker overlap instances and achieved a good agreement of alpha=0.807 (Krippendorff 2004) as measured on a subset of 808 overlaps selected for our initial analysis.
For the analysis of prosodic features we focus on F0-related features: mean, slope, span and contour, all of which have previously been shown to be used by each overlapping speaker separately for turn competition (Kurtić et al. 2009; Oertel et al. 2012). We investigate the similarity in F0 mean, slope and span by correlating these features across the two participants. For F0 contour, a similarity coefficient is computed using the dynamic programming method described in Gorisch et al. (2012). We consider the difference in F0 contour similarity in competitive and non-competitive overlaps as an indication of intonational matching being a turn-competitive resource. We conduct these analyses for overlaps that are clearly competitive or non-competitive as indicated by inter-annotator agreement. In addition, we qualitatively explore those cases that annotators disagree on in order to investigate whether they reveal further important interactional or prosodic features of in-overlap talk. Our preliminary results suggest that conversational participants attend and adapt to the interlocutor during overlap depending on whether they return competition or not. We explain our findings in relation to previous work on turn competition in overlap, discuss the quantitative method employed and also address the possible consequences of our results for the study of prosodic realization of other social actions in conversation.
Feedback utterances are among the most frequent in dialogue. Feedback is also a crucial aspect of all linguistic theories that take social interaction involving language into account. However, determining communicative functions is a notoriously difficult task both for human interpreters and for systems. It involves an interpretative process that integrates various sources of information. Existing work on communicative function classification either comes from dialogue act tagging, where it is generally coarse-grained with respect to feedback phenomena, or is token-based and does not address the variety of forms that feedback utterances can take. This paper introduces an annotation framework, the dataset and the related annotation campaign (involving 7 raters who annotated nearly 6000 utterances). We present its evaluation not merely in terms of inter-rater agreement but also in terms of the usability of the resulting reference dataset, both from a linguistic research perspective and from a more applied viewpoint.
Feedback utterances are among the most frequent in dialogue. Feedback is also a crucial aspect of linguistic theories that take social interaction, involving language, into account. This paper introduces the corpora and datasets of a project scrutinizing this kind of feedback utterances in French. We present the genesis of the corpora (for a total of about 16 hours of transcribed and phone force-aligned speech) involved in the project. We introduce the resulting datasets and discuss how they are being used in on-going work with focus on the form-function relationship of conversational feedback. All the corpora created and the datasets produced in the framework of this project will be made available for research purposes.
Precise multimodal studies require precise synchronisation between audio and video signals. However, raw audio and audio from video recordings can be out of sync for several reasons. In order to re-synchronise them, a dynamic programming (DP) approach is presented here. Traditionally, DP is performed on the rectangular distance matrix comparing each value in signal A with each value in signal B. Previous work limited the search space using, for example, the Sakoe-Chiba band (Sakoe and Chiba, 1978). However, the overall space of the distance matrix remains identical. Here, a tunnel matrix and the corresponding DP algorithm are presented. The matrix contains only the distances between the two signals that fall within a pre-specified bandwidth, and the computational cost is reduced accordingly. An example implementation demonstrates the functionality on artificial data and on data from real audio and video recordings.
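The idea of storing only a band of the distance matrix can be sketched as follows. This is a minimal illustration and not the implementation described in the paper: the function name `banded_dtw`, the absolute-difference local distance and the standard three-way DP recurrence are assumptions made for the example.

```python
import math


def banded_dtw(a, b, bandwidth):
    """Dynamic-programming alignment cost of a and b, limited to |i - j| <= bandwidth.

    Hypothetical sketch: only a len(a) x (2 * bandwidth + 1) "tunnel" is
    stored instead of the full len(a) x len(b) matrix.
    """
    n, m = len(a), len(b)
    if n == 0 or m == 0:
        raise ValueError("both signals must be non-empty")
    if abs(n - m) > bandwidth:
        raise ValueError("bandwidth too small to reach the final cell")
    width = 2 * bandwidth + 1
    tunnel = [[math.inf] * width for _ in range(n)]

    def get(i, j):
        # Map full-matrix cell (i, j) to tunnel column j - i + bandwidth.
        k = j - i + bandwidth
        if i < 0 or j < 0 or k < 0 or k >= width:
            return math.inf
        return tunnel[i][k]

    for i in range(n):
        # Visit only the columns of row i that lie inside the band.
        for j in range(max(0, i - bandwidth), min(m, i + bandwidth + 1)):
            local = abs(a[i] - b[j])  # assumed local distance
            if i == 0 and j == 0:
                best = 0.0
            else:
                best = min(get(i - 1, j), get(i, j - 1), get(i - 1, j - 1))
            tunnel[i][j - i + bandwidth] = local + best
    return tunnel[n - 1][(m - 1) - (n - 1) + bandwidth]
```

Memory grows with `len(a) * (2 * bandwidth + 1)` rather than `len(a) * len(b)`, which is the saving the abstract refers to; a bandwidth of 0 reduces the alignment to the plain diagonal.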
Lesen und lesen lassen
(2015)
Centering on German self-motion verbs, this paper demonstrates the advantages of free-sorting over creating and delineating word fields with more traditional methods. In particular, I draw a comparison to Snell-Hornby’s (1983) work on German descriptive verbs, which produces lexical fields with the help of dictionary entries, a thesaurus, a small corpus of written text and limited speaker feedback. While these methods have benefits, they are limited in their ability to represent the average organization of semantic fields in the mind of everyday speakers. Free-sorting, by contrast, does not rely on academic resources, corpora or singular speaker judgments. In sorting, a group of informants creates visible sets of items according to perceived similarity. Psycholinguists have used the method to quantitatively explore the perception of color terms across cultures (cf. Roberson et al. 2005). With a sufficiently large number of informants, one can generate lexical sorting data that is apt for cluster analysis, the results of which are represented by dendrograms. The experiment I conducted involved 33 school children from a middle-class neighborhood in Braunschweig, Northern Germany. My experiment shows that Snell-Hornby’s (1983) representation of the self-motion field can be improved by integrating further dimensions of meaning, such as body-space relations and sound, that young speakers find salient in the grouping procedure.
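One common way to prepare free-sorting data for cluster analysis is to count how often two items land in the same pile. The sketch below illustrates that step only; it is not taken from the paper, and the function name, the three example verbs and the four invented informants are assumptions.

```python
from itertools import combinations


def sorting_dissimilarity(sorts):
    """Turn free-sorting data into pairwise dissimilarities.

    sorts: one entry per informant, each a list of piles (sets of items).
    Assumes every informant sorted the same set of items.
    Returns {(item_a, item_b): 1 - share of informants who grouped them}.
    """
    items = sorted({item for piles in sorts for pile in piles for item in pile})
    together = {pair: 0 for pair in combinations(items, 2)}
    for piles in sorts:
        for pile in piles:
            # Sorting the pile keeps pair order consistent with the keys above.
            for pair in combinations(sorted(pile), 2):
                together[pair] += 1
    n = len(sorts)
    return {pair: 1 - count / n for pair, count in together.items()}


# Invented example data: four informants sorting three verbs.
example_sorts = [
    [{"gehen", "laufen"}, {"kriechen"}],
    [{"gehen", "laufen", "kriechen"}],
    [{"gehen"}, {"laufen"}, {"kriechen"}],
    [{"gehen", "laufen"}, {"kriechen"}],
]
example_result = sorting_dissimilarity(example_sorts)
```

The resulting dissimilarities can then be passed to any hierarchical clustering routine to draw a dendrogram.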
This paper presents newly developed guidelines for prosodic annotation of German as a consensus system agreed upon by German intonologists. The DIMA system is rooted in the framework of autosegmental-metrical phonology. One important goal of the consensus is to make exchanging data between groups easier since German intonation is currently annotated according to different models. To this end, we aim to provide guidelines that are easy to learn. The guidelines were evaluated running an inter-annotator reliability study on three different speech styles (read speech, monologue and dialogue). The overall high κ between 0.76 and 0.89 (depending on the speech style) shows that the DIMA conventions can be applied successfully.
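For readers unfamiliar with κ, the following sketch computes Cohen's kappa for two annotators. The abstract does not say which kappa variant was used, so the choice of Cohen's kappa is an assumption, and the labels in the usage note are invented placeholders rather than DIMA labels.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two annotators (Cohen's kappa)."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(labels_a)
    # Share of items on which both annotators chose the same label.
    observed = sum(x == y for x, y in zip(labels_a, labels_b)) / n
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    # Agreement expected by chance from each annotator's label frequencies.
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (observed - expected) / (1 - expected)
```

With the invented sequences `["accent", "none", "accent", "none"]` and `["accent", "none", "none", "none"]`, observed agreement is 0.75 and chance agreement is 0.5, giving κ = 0.5.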
In my article I argue for the need for a grammar of spoken language. It would have the same functions as the grammar of written language: describing and explaining the fundamental units of spoken language and their features, and describing the composition of those units and their conjunction. The basic units in the grammar of spoken language are the sound, the word, the functional unit, the conversational turn and the conversation itself. Furthermore, the central characteristics of spoken language and their impact on grammar have to be taken into account. They are interactivity, multimodality, processability and great variability. After presenting my concepts, I discuss three alternative concepts of a grammar of spoken language: online-syntax, construction grammar and multimodal grammar. The article concludes by discussing the role of spoken language grammar in language and foreign language teaching.
Digressions
(2015)
Bruno Strecker's contribution, Digressions, is written in French (the native language of Jacqueline Kubczak) and deals with different kinds of digression. It brings out the connection between the communicative situation and types of digression and offers a typology of digressions based on it. In a second step, it presents the formal means of introducing and formulating a digression (e.g. appositions, parentheses, fixed expressions such as A propos xxx or Ça me rappelle, or non-embedded phrases). Finally, it shows how to get “back on track” after a digression.
Ph@ttSessionz and Deutsch heute are two large German speech databases. They were created for different purposes: Ph@ttSessionz to test Internet-based recordings and to adapt speech recognizers to the voices of adolescent speakers, Deutsch heute to document regional variation of German. The databases differ in their recording technique, the selection of recording locations and speakers, elicitation mode, and data processing.
In this paper, we outline how the recordings were performed, how the data was processed and annotated, and how the two databases were imported into a single relational database system. We present acoustical measurements on the digit items of both databases. Our results confirm that the elicitation technique affects the speech produced, that f0 is quite comparable despite different recording procedures, and that large speech technology databases with suitable metadata may well be used for the analysis of regional variation of speech.
Based on specific linguistic landmarks in the speech signal, this study investigates pitch level and pitch span differences in English, German, Bulgarian and Polish. The analysis is based on 22 speakers per language (11 males and 11 females). Linear mixed models were computed that include various linguistic measures of pitch level and span, revealing characteristic differences across languages and between language groups. Pitch level appeared to have significantly higher values for the female speakers in the Slavic than the Germanic group. The male speakers showed slightly different results, with only the Polish speakers displaying significantly higher mean values for pitch level than the German males. Overall, the results show that the Slavic speakers tend to have a wider pitch span than the German speakers. But for the linguistic measure, namely for span between the initial peaks and the non-prominent valleys, we only find the difference between Polish and German speakers. We found a flatter intonation contour in German than in Polish, Bulgarian and English male and female speakers and differences in the frequency of the landmarks between languages. Concerning “speaker liveliness” we found that the speakers from the Slavic group are significantly livelier than the speakers from the Germanic group.
Modern theories of grammar are static, i.e. scripticist and synchronicist. This means that their descriptive apparatus is tailored to the structures of present-day written and standard languages. The contribution argues for a dynamic, i.e. non-scripticist and non-synchronicist, change of perspective in grammar research, based on the following empirically grounded considerations:
1. Literalization is a cultural universal that is cognitively anchored.
2. Different phases of literalization must be distinguished.
3. Literalization in general, and the phases of literalization in particular, have consequences for grammatical structure.
4. Grammatical structures can only be interpreted against the background of the respective phase of literalization.
5. A dynamic model of grammar must also capture the historical relationship conceptually. This is illustrated with central grammatical concepts such as aggregation vs. integration and word group vs. phrase, and with word order (verbal bracket, topological field model, sentence-peripheral constituents).
6. Historically, a dynamic relationship between online and offline syntax, between syntactic temporality and syntactic spatiality, is to be assumed. What counts as an online structure at a particular time and in a particular variety depends on the respective historical relationship between online and offline structures.
This contribution deals with complex noun phrases in German, which may seem monstrous when viewed from outside. So-called extended prenominal attributes pose a particular, well-known problem. The complexities give rise to the following questions, among others: To what extent can the ‘sprawl’ of the German noun phrase be motivated functionally? If there is a rationale behind the complexities, how do languages that lack the corresponding possibilities of expansion handle the relevant functional tasks? The first question is the primary focus here. It is discussed on the basis of authentic texts and text excerpts that illustrate the interplay between prenominal and postnominal ‘expansions’ of the noun phrase (relative clauses included) as well as the discourse function of so-called non-restrictive attributes; the second question is taken into account where relevant.
Der Tanz um das Verb
(2015)
Satz - oberflächlich
(2015)
The surface-oriented concept of the sentence presented here follows the definition of the IDS grammar: sentences are construction forms that consist of at least a finite verb and its complements. The semantic correlate of the sentence is the proposition, consisting of predicate and arguments. The distinction drawn in the English-language tradition between sentence and clause, and the corresponding French distinction between proposition and phrase, is captured in this approach by the opposition between ‘Vollsatz’ (full sentence) and ‘Teilsatz’ (clause). In contrast to the internal-syntactic definition advocated here, surface-oriented definitions of the sentence can also rest on features that are external in syntactic terms, namely orthographic-prosodic features or the criterion of syntactic independence in Bloomfield's well-known definition of the sentence. From a typological perspective, sentences are characterized by a “sentence-constituting act” (Sasse 1991, 77), or by a specific morphosyntactic constellation that must be added to the expression of the state of affairs. From a pragmatic perspective, the sentence is the prototypical unit of communication. It can be decontextualized, whereas other forms of communication can only be interpreted in their respective context. In terms of their semiotic status, sentences are complex linguistic signs. The rules or constructions underlying them, by contrast, have no sign character.
The effect of manipulation of a speaker’s voice as well as exposure to a native speaker’s utterance was investigated regarding the pronunciation of stops by German learners of French. Three subject groups, a Control (CG), a Manipulation (MG), and a Native Speaker (NG) Group, were recorded on two subsequent days. The MG was presented with a manipulation of their voice on the second day and the NG listened to a native French speaker, while the CG did not receive any feedback. Results show that speakers of the MG and NG were able to extract useful information from the respective feedback and successfully adapted to it. Participants were able to reduce their voice onset time values, although speakers of the NG reduced them to a greater extent.
Language is never homogeneous; it exhibits variation. There are many reasons for this diversity, and most have already been described very well (and are therefore not the focus of the present contribution). The counterpart of variation is the more or less explicit norms, which are meant to ensure that variation does not exceed a certain measure. This immediately raises the question of how (and by whom) that “measure” is defined. In assessing these questions, not only sociolinguistic but also structural aspects play a role; the present contribution pursues the latter, using examples from morphophonology, morphosyntax and orthography.
The article aims to show how it is possible to use the idea of constructions in Construction Grammar for the purpose of capturing discourse phenomena within communication in the sciences. First, I present an analysis of three grammatical examples in order to account for them as constructions. This attempt is based on their specific features relating to the role they play in scientific articles. It is then argued that the pragmatic properties described in connection with specific grammatical phenomena can be embedded in a general framework to account for text units as discourse-level constructions.
This chapter analyses the impact of political decentralization in a state on the position of ethnic and linguistic minorities, in particular with regard to the role of parliamentary assemblies in the political system. It relates a number of typical functions of parliaments to the specific needs of minorities and their languages. The most important of these functions are the representation of the minority and responsiveness to the minority’s needs. The chapter then discusses six examples from the European Union (and Norway) which prototypically represent different types of parliamentary decentralization: the ethnically defined Sameting in Norway and its importance for the Sámi population, the Scottish Parliament and its role for speakers of Scottish Gaelic, the German regional parliaments of the Länder of Schleswig-Holstein and Saxony and their impact on the Frisian and Sorbian minorities respectively, the autonomy of predominantly German-speaking South Tyrol within the Italian state, and finally the situation of the speakers of Latgalian in Latvia, where a decentralized parliament is missing. The chapter also makes suggestions on comparisons of these situations with minorities in Russia. It finally argues that political decentralization may indeed empower minorities to gain a greater voice in their states, even if much ultimately depends on individual factors in each situation and the attitudes of the majority population and the political center.
Preface
(2015)
Russia, its languages and its ethnic groups are for many readers of English surprisingly unknown territory. Even among academics and researchers familiar with many ethnolinguistic situations around the globe, there prevails rather unsystematic and fragmented knowledge about Russia. This relates to both the micro level such as the individual situations of specific ethnic or linguistic groups, and to the macro level with regard to the entire interplay of linguistic practices, ideologies, laws, and other policies in Russia. In total, this lack of information about Russia stands in sharp contrast to the abundance of literature on ethnolinguistic situations, minority languages, language revitalization, and ideologies toward languages and multilingualism which has been published throughout the past decades.
This is the first comprehensive volume to compare the sociolinguistic situations of minorities in Russia and in Western Europe. As such, it provides insight into language policies, the ethnolinguistic vitality and the struggle for reversal of language shift, language revitalization and empowerment of minorities in Russia and the European Union. The volume shows that, even though largely unknown to a broader English-reading audience, the linguistic composition of Russia is by no means less diverse than multilingualism in the EU. It is therefore a valuable introduction into the historical backgrounds and current linguistic, social and legal affairs with regard to Russia’s manifold ethnic and linguistic minorities, mirrored on the discussion of recent issues in a number of well-known Western European minority situations.
This contribution shows how connectives can be used as linguistic means of realizing two kinds of conversational activity, namely intersubjective and conversation-organizing procedures. A speaker draws on intersubjective procedures in order to establish, in cooperation with the interlocutor, a shared background of knowledge (common ground). Through conversation-organizing procedures, the speaker intervenes in the thematic structure of the interaction. The realization of these two conversational procedures is examined using the communicative genre of the autobiographical interview. In my view, this genre is particularly suited to such an analysis because it is characterized by a relatively sharp separation of conversational roles, which makes the interaction easier to follow. Two parties take part in an autobiographical interview: the interviewee, who counts as the bearer of knowledge, and the interviewer, whose role as moderator is to facilitate the transfer of knowledge. The interviewer thus faces a twofold task: balancing out the initial asymmetry of knowledge while also being responsible for organizing the conversation. Using the coordinating conjunction und (‘and’) as an example, the contribution illustrates how the use of connectives can help to accomplish these two communicative tasks.
The present study introduces articulography, the measurement of the position of tongue and lips during speech, as a promising method to the study of dialect variation. By using generalized additive modeling to analyze articulatory trajectories, we are able to reliably detect aggregate group differences, while simultaneously taking into account the individual variation across dozens of speakers. Our results on the basis of Dutch dialect data show clear differences between the southern and the northern dialect with respect to tongue position, with a more frontal tongue position in the dialect from Ubbergen (in the southern half of the Netherlands) than in the dialect of Ter Apel (in the northern half of the Netherlands). Thus articulography appears to be a suitable tool to investigate structural differences in pronunciation at the dialect level.
KoralQuery 0.3
(2015)
KoralQuery is a general corpus query protocol (i.e. independent of research tasks and corpus formats), serialized in JSON-LD [1]. KoralQuery focuses on simplicity of implementation rather than human readability and writability. Support for a growing number of query languages is provided by the Koral serialization processor.
With the cGAT handbook, the FOLK project provides guidelines for computer-assisted transcription according to GAT 2. The handbook was developed on the basis of transcription practice in FOLK and contains a large number of authentic examples, which can also be retrieved together with the corresponding audio via the Datenbank für Gesprochenes Deutsch (DGD).
This paper deals with the spread of the lexeme Nerd in German. The DeReKo database was examined with regard to the frequency of the word and its co-textual environments. These data were compared with a corpus of possible translations of the lexeme compiled from US television series (‘Scrubs’, ‘The Big Bang Theory’, ‘Family Guy’ and ‘American Dad’). From the synopsis of the findings and the historical-linguistic analysis of the lexeme it can be inferred that dubbed versions reflect contemporary language use and are therefore also a constant source of language change. With regard to the lexeme Nerd, the conclusion is that it has attained the status of an assimilated foreign word and that only its adjectival derivation is not yet fully integrated. Translating it with German lexemes does not appear sensible in this context.
The web portal Lehnwortportal Deutsch <lwp.ids-mannheim.de>, developed at the Institute for the German Language (IDS), aims to provide unified access to a growing number of lexicographical resources on German loanwords in other languages. This paper discusses different possibilities of creating an onomasiological access structure for portal users. We critically examine the meaning list of the “World Loanword Database” project (Haspelmath/Tadmor 2009a) as well as WordNet-based taxonomies and propose a new way of inductively creating a semantic classification scheme that takes both hyperonymic relations and semantic fields into account. We show how such a classification can be integrated into the underlying graph-based data representation of the Lehnwortportal and thus be exploited for advanced onomasiological search options.
Neologismen
(2015)
Phrasenkomposita im Deutschen. Empirische Untersuchung und konstruktionsgrammatische Modellierung
(2015)
Phrasal compounds such as Heile-Welt-Gerede or "Ich-kann-Golf-Ski-und-Wandern-und-bin-schöner-als-die-andern"-Franz are being used in German with increasing frequency. They pose a challenge for linguistic description.
This volume presents the first comprehensive study of phrasal compounds. Its particular achievement is that it offers both a grammar-theoretical model and a broad corpus-linguistic investigation of the phenomenon. The theoretical framework is a usage-based construction grammar approach. The basis for the inductive data collection is the ‘Deutsches Referenzkorpus’ of the Institut für Deutsche Sprache, Mannheim. The results show, on the one hand, how the construction grammar approach can be used profitably to describe word-formation phenomena. On the other hand, innovative methods (an analysis model, a search-query strategy for inductive corpus retrieval) are developed that are needed to apply construction grammar to authentic language data.
"Hey, was geht?". Beobachtungen zum Wandel und zur Differenzierung von Begrüßungsformen Jugendlicher
(2015)
The contribution shows the extent to which linguistic playfulness can lead to the formation of large groups of compounds around a stimulating model word, and how this playful impulse also produces a wealth of variants of interesting literary or political formulations. To trace such formulation games across current newspapers in Germany, Austria and Switzerland, it is advisable to use a large digital retrieval system such as COSMAS II from the Institut für Deutsche Sprache in Mannheim.
The IMS Open Corpus Workbench (CWB) software currently uses a simple tabular data model with proven limitations. We outline and justify the need for a new data model to underlie the next major version of CWB. This data model, dubbed Ziggurat, defines a series of types of data layer to represent different structures and relations within an annotated corpus; each such layer may contain variables of different types. Ziggurat will allow us to gradually extend and enhance CWB’s existing CQP-syntax for corpus queries, and also make possible more radical departures relative not only to the current version of CWB but also to other contemporary corpus-analysis software.
With an increasing amount of text data available it is possible to automatically extract a variety of information about language. One way to obtain knowledge about subtle relations and analogies between words is to observe words which are used in the same context. Recently, Mikolov et al. proposed a method to efficiently compute Euclidean word representations which seem to capture subtle relations and analogies between words in the English language. We demonstrate that this method also captures analogies in the German language. Furthermore, we show that we can transfer information extracted from large non-annotated corpora into small annotated corpora, which are then, in turn, used for training NLP systems.
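The kind of analogy test referred to here is usually run as vector arithmetic over word representations: the offset between two related words is added to a third, and the nearest remaining word is taken as the answer. The sketch below illustrates that idea on hand-made three-dimensional toy vectors; the vectors, the function names and the word list are invented for illustration and are not data or code from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


def analogy(vectors, a, b, c):
    """Answer 'a is to b as c is to ?' by the vector offset b - a + c."""
    target = [vb - va + vc for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    # The three query words themselves are excluded from the candidates.
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(target, vectors[w]))


# Invented toy vectors; real representations have hundreds of dimensions.
toy_vectors = {
    "Mann": [1.0, 0.0, 0.0],
    "Frau": [1.0, 1.0, 0.0],
    "König": [1.0, 0.0, 1.0],
    "Königin": [1.0, 1.0, 1.0],
    "Apfel": [0.0, 0.2, -1.0],
}
answer = analogy(toy_vectors, "Mann", "Frau", "König")
```

On the toy data the offset Frau - Mann + König equals the vector assigned to Königin, so that word is returned.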
This article reports on the ongoing CoRoLa project, which aims to create a reference corpus of contemporary Romanian (from 1945 onwards) that is open for free online use by researchers in linguistics and language processing, teachers of Romanian, and students. We invest serious efforts in persuading large publishing houses and other owners of IPR on relevant language data to join us and contribute selections of their text and speech repositories to the project. The CoRoLa project is coordinated by two Computer Science institutes of the Romanian Academy, but enjoys the cooperation and advice of professional linguists from other institutes of the Romanian Academy. We foresee a written component of the corpus of more than 500 million word forms, and a speech component of about 300 hours of recordings. The entire collection of texts (covering all functional styles of the language) will be pre-processed and annotated at several levels, and also documented with standardized metadata. The pre-processing includes cleaning the data and harmonising the diacritics, sentence splitting and tokenization. Annotation will include morpho-lexical tagging and lemmatization in the first stage, followed by syntactic, semantic and discourse annotation in a later stage.
In this paper, I present the COW14 tool chain, which comprises a web corpus creation tool called texrex, wrappers for existing linguistic annotation tools as well as an online query software called Colibri2. By detailed descriptions of the implementation and systematic evaluations of the performance of the software on different types of systems, I show that the COW14 architecture is capable of handling the creation of corpora of up to at least 100 billion tokens. I also introduce our running demo system which currently serves corpora of up to roughly 20 billion tokens in Dutch, English, French, German, Spanish, and Swedish.
In a project called "A Library of a Billion Words" we needed an implementation of the CTS protocol that is capable of handling a text collection containing at least 1 billion words. Because the existing solutions did not work at this scale or were still in development, I started an implementation of the CTS protocol using methods that MySQL provides. Last year we published a paper that introduced a prototype with the core functionalities without being compliant with the specifications of CTS (Tiepmar et al., 2013). The purpose of this paper is to describe and evaluate the MySQL-based implementation now that it fulfils the specifications version 5.0 rc.1, and to mark it as finished and ready to use. Further information, online instances of CTS for all described datasets and binaries can be accessed via the project's website.
The availability of large multi-parallel corpora offers an enormous wealth of material to contrastive corpus linguists, translators and language learners, if we can exploit the data properly. Necessary preparation steps include sentence and word alignment across multiple languages. Additionally, linguistic annotation such as part-of-speech tagging, lemmatisation, chunking, and dependency parsing facilitates precise querying of linguistic properties and can be used to extend word alignment to sub-sentential groups. Such highly interconnected data is stored in a relational database to allow for efficient retrieval and linguistic data mining, which may include the statistics-based selection of good example sentences. The varying information needs of contrastive linguists require a flexible linguistic query language for ad hoc searches. Such queries in the format of generalised treebank query languages will be automatically translated into SQL queries.
The Czech National Corpus (CNC) is a long-term project striving for extensive and continuous mapping of the Czech language. This effort results mostly in compiling, maintaining and providing free public access to a range of corpora, with the aim of offering diverse, representative, and high-quality data for empirical research, mainly in linguistics. Since 2012, the CNC has been officially recognized as a research infrastructure funded by the Czech Ministry of Education, Youth and Sports, which has caused a recent shift towards user service-oriented operation of the project. All project-related resources are now integrated into the CNC research portal at http://www.korpus.cz/. Currently, the CNC has an established and growing user community of more than 4,500 active users in the Czech Republic and abroad who submit almost 1,900 queries per day using one of the user interfaces. The paper discusses the main CNC objectives for each particular domain, aiming at an overview of the current situation supplemented by an outline of future plans.
Synonymie und Antonymie
(2015)
Synonymy (for example ‘essen’, to eat, and ‘speisen’, to dine) and antonymy (for example ‘heiß’, hot, and ‘kalt’, cold), that is, similarity and oppositeness of meaning, are phenomena at the centre of linguistic research. The second volume of the series ‘Literaturhinweise zur Linguistik’ offers a concise introduction to the topic of synonymy and antonymy and a structured select bibliography of current specialist literature and established reference works. It takes into account various strands of modern linguistics, such as cognitive science, corpus and computational linguistics, and German as a foreign language.
It is a commonplace of both scholarly and popular thinking about human beings that language is what distinguishes them from all other living creatures. The obvious conclusion that linguistics is therefore always also an anthropological discipline is nevertheless rarely drawn. This is so even though it is practically impossible to reflect theoretically on the ‘nature’ of language or on central questions of linguistics without at least implicitly sketching an image of the human being as well. The article traces, from Humboldt via Benveniste to more recent conversation analysis, those traditions in the theory of language that conceive of the linguistic human being as fundamentally oriented towards a counterpart (a constellation that also produces the figure of the ‘third party’) and that understand linguisticality as a formative force shaping human sociality. Accordingly, language is regarded not only as a medium of referential ‘aboutness’ but equally of performative ‘withness’. On the horizon of these considerations there is also the question of how the interaction-oriented reorientation of linguistics towards spoken language in the second half of the 20th century opens up a new view of written language and its contribution to the self-formation of human beings.
The article, written for the 50th anniversary of the IDS, gives an overview of the emergence and development of sentence semantics, the study of complex linguistic expressions oriented towards the truth value of statements. It does so using the example of negation, in particular the syntactic realisation of negation with the negative article ‘kein’, negative polarity items such as ‘jemals’, double negation as in ‘nicht unglücklich’, and pleonastic negation after ‘bevor’. Negation in questions and answer particles such as ‘nein’ are also discussed.
The article is intended as a first step towards a historiographical reconstruction of sociolinguistics in the Federal Republic of Germany. It shows how, in a deliberate break with older research in German studies on language and society, the new discipline of sociolinguistics emerged in the late 1960s through engagement with Bernstein's theories, how sociolinguistics subsequently became professionalised and broadened its range of topics, and how it finally reconnected with older theories, particularly in dialectology.
Interaktionslinguistik
(2015)
In this article, interaction is understood as a realisation of communication whose constitutive criterion is not linguisticality but co-presence. Co-presence is not an external condition of interaction; rather, it is first established by interaction itself, in the medium of the perception of perception. Decisive for the role of language in the constitution of interaction are the minima of speaking and listening, which are presented under the headings materiality, sequentiality and mediality. On the basis of these minima, the qualities of language as a resource for dealing with problems constitutive of interaction (such as turn-taking, topic organisation or situating) can be captured. That the production of interaction requires, besides language, further resources that have so far been less well studied is discussed at the end of the article using the example of the contribution of architecture to solving the problem of situating.
In recent decades, the perspective on the object of linguistics has changed repeatedly and, above all, has been broadened against inherited reductionisms. This article addresses developments connected with the emergence of a ‘media linguistics’ which (more generally) deals with the mediality of language, including its interplay with other types of signs, and (more specifically) with the role of language in (technical) media. Of the very numerous ways of combining modalities and codalities, two very different focal points are considered here: the visuality of language, in and of texts, and secondary audiovisuality.
The article examines different conceptions of space that have shaped dialectology as ‘spatial linguistics’ over the last half century. Space as the physical-material space of the earth still plays a central role in dialectology and is understood as the framework of conditions for diatopic linguistic variation. Spaces of an entirely different nature are those that result from dialect-geographical processes of abstraction and arise from the distribution of linguistic features in physical-material space. To explain diatopic variation in extralinguistic terms, such linguistic-spatial distributions are compared with geographical conditions, political territories or cultural-spatial distributions. Because the dialectal variables selected for the comparison were arbitrary, this procedure was long held in some disrepute; today, however, dialectometric methods protect it from arbitrary selection and it is being relaunched.
Space as an immaterial ordering structure is used, not only in linguistics, as a proven instrument for ordering thought metaphorically. In particular, social or communicative dialectology, which for some decades has been breaking up the one-dimensional dialectology of base dialects, has used concepts such as ‘variant space’ or ‘social space’ to make its subject area tangible and measurable.
For some time now, ‘experienced space’ has been attracting lively interest within so-called perceptual dialectology. This strand of dialectology explores everyday concepts relating to linguistic space and the perception of linguistic features, and hopes, among other things, to learn whether ideas about linguistic space can be regarded as factors steering dialectal stability or dialectal change. Using examples from an ongoing research project on a region in Central Switzerland, ethnodialectal conceptions of space are presented and related to objective linguistic findings.
Documenting the vocabulary of a language to a high standard and describing it in all its properties is as important as it is difficult. Various reasons have led to the current collapse of the tradition of large dictionaries. In the future, flexibly usable digital lexical systems will take their place.
In this article, we explore the feasibility of extracting suitable and unsuitable food items for particular health conditions from natural language text. We refer to this task as conditional healthiness classification. For that purpose, we annotate a corpus extracted from forum entries of a food-related website. We identify different relation types that hold between food items and health conditions going beyond a binary distinction of suitability and unsuitability and devise various supervised classifiers using different types of features. We examine the impact of different task-specific resources, such as a healthiness lexicon that lists the healthiness status of a food item and a sentiment lexicon. Moreover, we also consider task-specific linguistic features that disambiguate a context in which mentions of a food item and a health condition co-occur and compare them with standard features using bag of words, part-of-speech information and syntactic parses. We also investigate to what extent individual food items and health conditions correlate with specific relation types and try to harness this information for classification.
Social perception studies have revealed that smiling individuals are perceived more favourably on many communion dimensions in comparison to nonsmiling individuals. Research on gender differences in smiling habits showed that women smile more than men. In our study, we investigated this phenomenon further and hypothesised that women perceive smiling individuals as more honest than men. An experiment conducted in seven countries (China, Germany, Mexico, Norway, Poland, Republic of South Africa and USA) revealed that gender may influence the perception of honesty in smiling individuals. We compared ratings of honesty made by male and female participants who viewed photos of smiling and nonsmiling people. While men and women did not differ on ratings of honesty in nonsmiling individuals, women assessed smiling individuals as more honest than men did. We discuss these results from a social norms perspective.
The puzzle we consider in this paper is that Merchant (2004) judges certain elliptical utterances in context to be ungrammatical, while Culicover and Jackendoff (2005) judge similar examples to be grammatical. The main difference between the examples appears to be that Merchant’s are introduced by no, while Culicover and Jackendoff’s are introduced by yes. We propose that the different judgments do not reflect grammaticality, but complexity associated with ambiguity. First, there is an ambiguity with respect to the reference of noun phrases in discourse: the relationship of the fragment to the preceding discourse is ambiguous. Second, there is an ambiguity with respect to the discourse function of an utterance, and in particular, whether it is an affirmation triggered by yes or a denial triggered by no. In the case of the denial, it needs to be established which part of the preceding statement has to be corrected, while in the case of the affirmation, no such ambiguity arises. The interactions between these two interpretive functions may under certain circumstances render particular sentences in discourse difficult to interpret. Interpretive difficulty has the subjective flavor of ‘ungrammaticality’; in the case that we discuss here, these judgments form the basis for a particular linguistic analysis. But, we argue, manipulation of the discourse context can simplify discourse interpretation by resolving the ambiguity, which removes the interpretive difficulty. The conclusion that we draw is that the phenomenon in question is not a matter of linguistic structure, but of discourse interpretation.
Pogled u e-leksikografiju
(2015)
The paper gives an overview of basic concepts and classifications in the field of e-lexicography. It presents a classification of e-dictionaries, describes the lexicographic process of compiling an e-dictionary, and surveys the most widespread dictionary writing systems (DWS) and corpus query systems (CQS). As an example of good practice, the online dictionary elexiko (Institut für Deutsche Sprache in Mannheim) is described in more detail: its aims and purpose, theoretical and methodological foundations, modules, and possibilities of use. As a possible basis for compiling a corpus-based e-dictionary of Croatian in line with the most recent lexicographic (and, more generally, linguistic) theories and practices, the paper presents work on an online lexical-semantic repository of Croatian (a database of semantic frames, image schemas, cognitive primitives and lexical units) within the project Repozitorij metafora hrvatskoga jezika.
In this contribution, we report on an effort to annotate German data with information relevant to opinion inference. Such information has previously been referred to as effect or couched in terms of event evaluation functors. We extend the theory and present an extensive scheme that combines both approaches and thus extends the set of inference-relevant predicates. Using these guidelines to annotate 726 German synsets, we achieve good inter-annotator agreement.
In the German language, there are two central ways of integrating spatial and temporal information by means of word-formation. Firstly, this type of information is typically located in the verbal phrase of sentences. As a consequence, it plays a major role in the area of word-formation of verbs too. The two major classes of such verbs found in German (“Partikelverben” and “Doppelpartikelverben”) are located in the transition zone between syntax and word-formation. The same adverbial relation is found in one type of nominal compounds (“Rektionskomposita”). On the other hand, space and time are prominent among the relations constituting the patterns of the prototypical type of noun compounds (“N+N-Komposita”). The integration of these relations into compounds involves some kind of functional interpretation.
Familienähnlichkeiten deutscher Argumentstrukturmuster. Definitionen und grundlegende Annahmen
(2015)
Sprichwörter im Gebrauch
(2015)
The article presents the theoretical and methodological foundations of the learner's dictionary project DICONALE with the help of some sample analyses. It is a bilingual, bidirectional verb dictionary with an onomasiological-conceptual orientation, intended to serve both for consultation for production purposes from level B2 upwards in the fields of DaF (German as a foreign language) and ELE (Spanish as a foreign language) and for translation into the respective foreign language. It is based on frequency-based data from comparable electronically available corpora of both languages and is to be made accessible to users online. The dictionary is divided into different conceptual (sub)fields, to which lexical-semantic (mini)paradigms can be assigned. It is based on a modular, multilateral lexicological descriptive model which presents corpus-based information on form, meaning and use that is relevant to each individual language and to language comparison, by means of information on various paradigmatic and syntagmatic relations of verbal and deverbal lexemes.
This paper presents some theoretical and methodological foundations of the research project DICONALE, which concerns the development of an online dictionary of verbal lexemes with a special conceptual-onomasiological access and a paradigmatic structure, in response to studies which have shown that conventional dictionaries (both monolingual and bilingual) do not satisfy the specific needs of users involved in the production of texts in a foreign language.
Preface
(2015)
Linguistic usage patterns are not just coincidental phenomena on the textual surface but constitute a fundamental constructional principle of language. At the same time, however, linguistic patterns are highly idiosyncratic in the sense that they tend to be item-specific and unpredictable, thus defying all attempts at capturing them by general abstract rules. […] What all these approaches [that deal with constructions, collocations, patterns, etc. K.S.] share, in addition to their interest in recurrent patterns, is a strong commitment to the value of usage, be it in the wider sense of usage as an empirical basis for sound linguistic analysis and description or in the narrower sense of usage as constituting the basis for the emergence and consolidation of linguistic knowledge. (Herbst et al. 2014: 1)
Now that language data can be studied in new quantitative dimensions, phraseology faces a paradigm shift. The traditional focus on strongly lexicalized, often idiomatic multi-word expressions (MWE) has led to an overestimation of their unique status in the mental lexicon. The majority of MWEs are typical lexical realisations of templates (‘MW patterns’) that emerged from repeated usage and can be instantiated with ever-changing lexical elements. The (primarily functional) pattern restrictions cannot always be predicted with rules, but are the result of recurring context factors. This article first shows the nature and the interrelations of MW patterns that are reconstructed with complex corpus-driven methods. It then discusses a vision of a new phraseography of MW patterns that describes their hierarchies and functions based on authentic corpus data such as KWIC bundles, slot filler tables and collocation profiles.
We present an approach for opinion role induction for verbal predicates. Our model rests on the assumption that opinion verbs can be divided into three different types where each type is associated with a characteristic mapping between semantic roles and opinion holders and targets. In several experiments, we demonstrate the relevance of those three categories for the task. We show that verbs can easily be categorized with semi-supervised graph-based clustering and some appropriate similarity metric. The seeds are obtained through linguistic diagnostics. We evaluate our approach against a new manually-compiled opinion role lexicon and perform in-context classification.
We present a quantitative approach to disambiguating flat morphological analyses and producing more deeply structured analyses. Based on existing morphological segmentations, possible combinations of resulting word trees for the next level are filtered first by criteria of linguistic plausibility and then by weighting procedures based on the geometric mean. The frequencies for weighting are derived from three different sources (counts of morphs in a lexicon, counts of largest constituents in a lexicon, counts of token frequencies in a corpus) and can be used either to find the best analysis on the level of morphs or on the next higher constituent level. The evaluation shows that for this task corpus-based frequency counts are slightly superior to counts of lexical data.
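The weighting step described in this abstract can be sketched as follows. This is a minimal illustration, assuming each candidate analysis is scored by the geometric mean of the frequencies of its constituents and the highest score wins. The example word, the candidate bracketings and all frequency counts are invented for illustration and do not come from the paper.

```python
import math

def geometric_mean(values):
    """Geometric mean of positive numbers; 0.0 if any value is missing or zero."""
    if not values or any(v <= 0 for v in values):
        return 0.0
    return math.exp(sum(math.log(v) for v in values) / len(values))

def best_analysis(analyses, freq):
    """Return the candidate whose constituents have the highest geometric mean frequency."""
    return max(
        analyses,
        key=lambda parts: geometric_mean([freq.get(p, 0) for p in parts]),
    )

# Hypothetical frequency counts (assumption: made-up numbers).
FREQ = {
    "Haus": 400,
    "schlüssel": 50,
    "Haustür": 25,
    "türschlüssel": 1,
}

# Two candidate bracketings of the flat segmentation Haus + tür + schlüssel.
CANDIDATES = [
    ("Haustür", "schlüssel"),   # [[Haus tür] schlüssel]
    ("Haus", "türschlüssel"),   # [Haus [tür schlüssel]]
]

best = best_analysis(CANDIDATES, FREQ)
print(best)
```

The geometric mean penalises an analysis that contains one very rare constituent more strongly than the arithmetic mean would, which is one plausible reason for preferring it in this kind of ranking.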
Opinion Holder and Target Extraction for Verb-based Opinion Predicates – The Problem is Not Solved
(2015)
We offer a critical review of the current state of opinion role extraction involving opinion verbs. We argue that neither the currently available lexical resources nor the manually annotated text corpora are sufficient to appropriately study this task. We introduce a new corpus focusing on opinion roles of opinion verbs from the Subjectivity Lexicon and show potential benefits of this corpus. We also demonstrate that state-of-the-art classifiers perform rather poorly on this new dataset compared to the standard dataset for the task showing that there still remains significant research to be done.
The legal-linguistic approach to legal texts sheds light on the options for interpreting contested technical concepts. This text- and discourse-oriented approach is particularly illuminating for the analysis of communication between international and national courts, since here the linguistic constitution of facticity is often pursued with heightened intensity. The study examines the processes of negotiating nation-state sovereignty and shifts in competence on the basis of a typology of linguistic acts. It highlights lines of conflict, solidified in language, in the harmonisation of national and international law, whose description as semantic struggles is at the core of the study. The custody dispute ‘Görgülü’ serves as an example. By examining the specialist discourse and its transformation into media texts, problems of mediation can be uncovered, which contributes to transparency in the establishment of facticity under the rule of law.
Scales and Scores. An evaluation of methods to determine the intensity of subjective expressions
(2015)
In this contribution, we present a survey of several methods that have been applied to the ordering of various types of subjective expressions (e.g. good < great), in particular adjectives and adverbs. Some of these methods use linguistic regularities that can be observed in large text corpora while others rely on external grounding in metadata, in particular the star ratings associated with product reviews. We discuss why these methods do not work uniformly across all types of expressions. We also present the first application of some of these methods to the intensity ordering of nouns (e.g. moron < dummy).