This paper investigates synchronic variation in the lexical and grammatical environments of the German lexical verb verdienen ‘earn’, ‘deserve’. In its lexical uses, verdienen co-occurs with an object noun phrase whose head is either concrete (e.g. Geld ‘money’) or, more commonly, abstract (e.g. Beachtung ‘attention’). When it is used more grammatically with deontic modal meaning, verdienen is followed by a passive or active infinitive. This paper uses collostructional analyses to contrast lexical and grammatical uses in terms of the most strongly attracted lexical items, which are grouped into semantic classes. The results reflect different degrees of host-class expansion (cf. Himmelmann 2004), whereby the collexemes of verdienen expand from concrete to abstract and their morpho-syntactic contexts from nominal to infinitival complement and subsequently from passive to active. Synchronic distribution can thus serve as a window on diachronic development (Kuteva 2001), in this case the rise of a deontic modality marker.
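Collexeme analysis, one form of collostructional analysis, ranks the words that fill a construction's slot by how strongly they are attracted to it. A common way to measure this is a one-tailed Fisher exact test on a 2×2 frequency table, reported as collostruction strength = -log10(p). The sketch below assumes that convention. The function name and all counts are hypothetical and are not taken from the study.

```python
from math import comb, log10


def collostruction_strength(a: int, cxn_total: int, word_total: int,
                            corpus_size: int) -> float:
    """-log10 of the one-tailed Fisher exact p-value for attraction.

    a           -- co-occurrences of the word with the construction
    cxn_total   -- total tokens of the construction (all slot fillers)
    word_total  -- total tokens of the word in the corpus
    corpus_size -- total number of units (e.g. object slots) in the corpus
    """
    upper = min(cxn_total, word_total)
    if not 0 <= a <= upper:
        raise ValueError("a must lie between 0 and min(cxn_total, word_total)")
    # Hypergeometric upper tail P(X >= a), computed as an exact integer ratio
    # so that very small p-values cannot underflow to 0.0 before the log.
    tail = sum(comb(word_total, k) * comb(corpus_size - word_total, cxn_total - k)
               for k in range(a, upper + 1))
    total = comb(corpus_size, cxn_total)
    return log10(total) - log10(tail)


# Hypothetical counts: a noun such as Beachtung occurring 3 times as the object
# of verdienen (4 verdienen objects in total), 3 times overall, corpus of 10.
print(round(collostruction_strength(3, 4, 3, 10), 4))
```

Sorting the candidate collexemes by this value gives the "most strongly attracted" items, which can then be grouped into semantic classes by hand.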
Our paper discusses family language policies among multilingual families in Latvia with Russian as the home language. The presentation is based on three case studies, i.e. interviews conducted with Russophones who have chosen to send their children to Latvian-medium pre-schools and schools. The main aim is to understand practices and views among such families “from below,” i.e. which family-internal and family-external factors influenced the choice of Latvian-medium education and what impact this choice has on linguistic practices.
The paper shows that there have been critical events that both encouraged and discouraged the choice of Latvian-medium education. The wish to integrate into mainstream society has met with obstacles from both ethnic Russians and ethnic Latvians. Yet the three families consider their choices to be the right ones for the future development of their children in a multiethnic Latvia in which Latvian serves as the unifying language of society.
Nonnative-accented speakers face prevalent discrimination. The assumption that people freely express negative sentiments toward nonnative speakers has also guided common research methods. However, recent studies have not consistently found downgrading, so that prejudice against nonnative accents might even be questioned at first sight. The present theoretical article bridges these contradictory findings in three ways: (a) We illustrate that nonnative speakers with foreign accents frequently may not be downgraded in commonly used first-impression and employment scenario paradigms. It appears that relatively controlled responding may be influenced by norms and motivations to respond without prejudice, whereas negative biases emerge in spontaneous responding. (b) We present an integrative view, based on knowledge about modern forms of prejudice, to develop modern notions of accent-ism that allow predictions of when accent biases are (not) likely to surface. (c) We conclude with implications for interventions and a tailored research agenda.
Ancient Chinese poetry is constituted by structured language that deviates from ordinary language usage; its poetic genres impose unique combinatory constraints on linguistic elements. How does the constrained poetic structure facilitate speech segmentation when common linguistic and statistical cues are unreliable to listeners in poems? We generated artificial Jueju, which arguably has the most constrained structure in ancient Chinese poetry, and presented each poem twice as an isochronous sequence of syllables to native Mandarin speakers while conducting magnetoencephalography (MEG) recording. We found that listeners deployed their prior knowledge of Jueju to build the line structure and to establish the conceptual flow of Jueju. Unprecedentedly, we found a phase precession phenomenon indicating predictive processes of speech segmentation—the neural phase advanced faster after listeners acquired knowledge of incoming speech. The statistical co-occurrence of monosyllabic words in Jueju negatively correlated with speech segmentation, which provides an alternative perspective on how statistical cues facilitate speech segmentation. Our findings suggest that constrained poetic structures serve as a temporal map for listeners to group speech contents and to predict incoming speech signals. Listeners can parse speech streams by using not only grammatical and statistical cues but also their prior knowledge of the form of language.
The present research unites two emergent trends in the area of language attitudes: (a) research on perceptions of nonnative speakers by nonnative listeners and (b) the search for general, basic mechanisms underlying the evaluation of nonnative accented speakers. In three experiments featuring an employment situation, German participants listened to a presentation given in English by a German speaker with a strong versus native-like accent (in Studies 1–3) versus a native speaker of English (in Study 1). They evaluated candidates with a strong accent worse than candidates with a native(-like) pronunciation—even to the degree that the quality of arguments was of no relevance (Study 1). Study 2 introduces an effective intervention to reduce these discriminatory tendencies. Across studies, affect and competence emerged as major mediators of hirability evaluations. Study 3 further revealed sequential indirect influences, which advance our understanding of previous inconsistent findings regarding disfluency and warmth perceptions.
Studies of linguistic landscapes (LLs) investigate the frequencies, functions, and power relations between languages and their speakers in public space. Research on the LL thereby aims to understand how the production and perception of signs reflect and simultaneously shape realities. In this sense, the LL is one of the most dynamic places where processes of minoritization take place: the (in)visibility of minority languages and their functional and symbolic relationships to majority languages are directly related to negotiations of minorities’ place in society. This chapter looks at minority languages in the LL from two major perspectives. First, it discusses language policies, focussing on which policy categories and which domains of language use are particularly relevant for understanding minority languages in the LL. Then, it turns to issues of conflict, contestation, and exclusion, providing examples from a range of geographically and typologically prototypical case studies, including Israel, Canada, Belgium, the Basque Country, and Friesland.
This chapter introduces readers to the context and concept of this volume. It starts by providing an historical overview of languages and multilingualism in Lithuania, Estonia and Latvia, highlighting the 100th anniversary of statehood which the three Baltic states are celebrating in 2018. Then, the chapter briefly presents important strands of research on multilingualism in the region throughout the past decades; in particular, questions about language policies and the status of the national languages (Estonian, Latvian and Lithuanian) and Russian. It also touches on debates about languages in education and the roles of other languages such as the regional languages of Latgalian and Võro and the changing roles of international languages such as English and German. The chapter concludes by providing short summaries of the contributions to this book.
German subjectively veridical sicher sein ‘be certain’ can embed ob-clauses in negative contexts, while subjectively veridical glauben ‘believe’ and nonveridical möglich sein ‘be possible’ cannot. The Logical Form of F isn’t certain if M is in Rome is regarded as the negated disjunction of two sentences ¬(cf σ ∨ cf ¬σ) or ¬cf σ ∧ ¬cf ¬σ. Be certain can have this LF because ¬cf σ and ¬cf ¬σ are compatible and nonveridical. Believe excludes this LF because ¬bf σ and ¬bf ¬σ are incompatible in a question-under-discussion context. It follows from this incompatibility and from the incompatibility of bf σ and bf ¬σ that bf ¬σ and ¬bf σ are equivalent. Therefore believe cannot be nonveridical. Be possible doesn’t allow the LF either. Similar to believe, ¬pf σ and ¬pf ¬σ are incompatible. But unlike believe, pf σ and pf ¬σ are compatible.
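The compatibility argument above is essentially propositional. As a sketch, treat "x Vs σ" and "x Vs ¬σ" as two independent atoms A and B (an assumption that deliberately abstracts away from the modal semantics), encode each stated incompatibility as a constraint, and enumerate the four valuations. The predicate-to-constraint mapping follows the abstract; the function names are illustrative only.

```python
from itertools import product
from typing import Callable, List, Tuple

# A = "x Vs sigma", B = "x Vs not-sigma" (e.g. bf σ and bf ¬σ for believe).
Constraint = Callable[[bool, bool], bool]


def incompatible_positives(a: bool, b: bool) -> bool:
    return not (a and b)  # A and B cannot both hold


def incompatible_negations(a: bool, b: bool) -> bool:
    return not ((not a) and (not b))  # ¬A and ¬B cannot both hold


def valuations(constraints: List[Constraint]) -> List[Tuple[bool, bool]]:
    return [(a, b) for a, b in product([True, False], repeat=2)
            if all(c(a, b) for c in constraints)]


def lf_satisfiable(constraints: List[Constraint]) -> bool:
    """Can ¬A ∧ ¬B, the LF of 'x doesn't V if σ', be true?"""
    return any((not a) and (not b) for a, b in valuations(constraints))


def b_equivalent_to_not_a(constraints: List[Constraint]) -> bool:
    """Do B and ¬A have the same truth value in every admissible valuation?"""
    return all(b == (not a) for a, b in valuations(constraints))


PREDICATES = {
    # The abstract only says ¬cf σ and ¬cf ¬σ are compatible, so no constraint.
    "sicher sein (be certain)": [],
    "glauben (believe)": [incompatible_positives, incompatible_negations],
    "möglich sein (be possible)": [incompatible_negations],
}

for name, cs in PREDICATES.items():
    print(name, lf_satisfiable(cs), b_equivalent_to_not_a(cs))
```

Only be certain admits the negated-disjunction LF. For believe, the two incompatibilities together force bf ¬σ ≡ ¬bf σ, while for be possible the equivalence fails because pf σ and pf ¬σ can both hold.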
The demo presents a minimalist, off-the-shelf author name disambiguation (AND) tool that provides a fundamental AND operation, the comparison of two publications with ambiguous authors, via an easily accessible HTTP interface. The tool implements this operation using standard AND functionality but puts particular emphasis on advanced methods from natural language processing (NLP) for comparing the semantics of publication titles.
The use of digital resources and tools across humanities disciplines is steadily increasing, giving rise to new research paradigms and associated methods that are commonly subsumed under the term digital humanities. Digital humanities does not constitute a new discipline in itself, but rather a new approach to humanities research that cuts across different existing humanities disciplines. While digital humanities extends well beyond language-based research, textual resources and spoken language materials play a central role in most humanities disciplines.
The transfer of research data management from one institution to another infrastructural partner is anything but trivial, but it can be required, for instance, when an institution faces reorganization or closure. In a case study, we describe the migration of all research data, identify the challenges we encountered, and discuss how we addressed them. The case study shows that moving research data management to another institution is a feasible but potentially costly enterprise. Being able to demonstrate the feasibility of research data migration supports the stance of data archives that users can expect high levels of trust and reliability when it comes to data safety and sustainability.
Are borrowed neologisms accepted more slowly into the German language than German words resulting from the application of word formation rules? This study addresses this question by focusing on two possible indicators of the acceptance of neologisms: a) the frequency development of 239 German neologisms from the 1990s (loanwords as well as new words resulting from the application of word formation rules) in the German reference corpus DEREKO and b) the frequency development of pragmatic markers (‘flags’, namely quotation marks and phrases such as sogenannt ‘so-called’) used with these words. The second part of the article outlines a psycholinguistic approach to evaluating the (psychological) status of different neologisms and non-words in an experimentally controlled study, as well as plans to carry out interviews in a field test to collect speakers’ opinions on the acceptance of the analysed neologisms. Finally, implications for the lexicographic treatment of both types of neologisms are discussed.
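One simple way to operationalise indicator b) is the share of a word's corpus tokens that carry a flag in each year, plus a least-squares trend over years. A falling share would be consistent with growing acceptance. This is a sketch under that assumption, not the study's procedure. The function names and the yearly counts below are hypothetical.

```python
from typing import Dict, Tuple


def flag_shares(counts: Dict[int, Tuple[int, int]]) -> Dict[int, float]:
    """Map year -> flagged / total for one neologism; years with 0 tokens are skipped."""
    return {year: flagged / total
            for year, (total, flagged) in sorted(counts.items()) if total > 0}


def trend_slope(series: Dict[int, float]) -> float:
    """Ordinary least-squares slope of the share on the year."""
    if len(series) < 2:
        raise ValueError("need at least two years to estimate a trend")
    xs = list(series)
    ys = [series[x] for x in xs]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


# Hypothetical counts: year -> (tokens of the word, tokens with a flag such as
# quotation marks or "sogenannt").
counts = {1995: (40, 20), 2000: (200, 40), 2005: (500, 25)}
shares = flag_shares(counts)
print(shares, trend_slope(shares))
```

Comparing the slopes for loanwords against those for native word formations would be one way to approach the study's question.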
Latvia
(2019)
This chapter deals with current issues in bilingual education in the framework of language and educational policies in Latvia, and also outlines similarities or common tendencies in the two other Baltic states, Estonia and Lithuania. As commonly understood in the 21st century, the term ‘bilingual education’ includes ‘multilingual education, as the umbrella term to cover a wide spectrum of practice and policy’ (García, 2009: 9).
Just like most varieties of West Germanic, virtually all varieties of German use a construction in which a cognate of the English verb 'do' (standard German 'tun') functions as an auxiliary and selects another verb in the bare infinitive, a construction known as 'do'-periphrasis or 'do'-support. The present paper provides an Optimality Theoretic (OT) analysis of this phenomenon. It builds on a previous analysis by Bader and Schmid (An OT-analysis of 'do'-support in Modern German, 2006) but (i) extends it from root clauses to subordinate clauses and (ii) aims to capture all of the major distributional patterns found across (mostly non-standard) varieties of German. In so doing, the data are used as a testing ground for different models of German clause structure. At first sight, the occurrence of 'do' in subordinate clauses, as found in many varieties, appears to support the standard CP-IP-VP analysis of German. In actual fact, however, the full range of data turn out to challenge, rather than support, this model. Instead, I propose an analysis within the IP-less model by Haider (Deutsche Syntax - generativ. Vorstudien zur Theorie einer projektiven Grammatik, Narr, Tübingen, 1993 et seq.). In sum, the 'do'-support data will be shown to have implications not only for the analysis of clause structure but also for the OT constraints commonly assumed to govern the distribution of 'do', for the theory of non-projecting words (Toivonen in Non-projecting words, Kluwer, Dordrecht, 2003) as well as research on grammaticalization.
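For readers unfamiliar with OT, the core evaluation step can be sketched in a few lines. Each candidate has a violation count per constraint, and the winner is the candidate whose violation profile, read from the highest-ranked constraint down, is lexicographically smallest (strict domination). The constraint names AVOID-DO and FILL-HEAD and the tableau below are purely hypothetical placeholders. They are not the constraints of Bader and Schmid or of the present analysis.

```python
from typing import Dict, List

Tableau = Dict[str, Dict[str, int]]


def ot_eval(candidates: Tableau, ranking: List[str]) -> str:
    """Return the optimal candidate under strict constraint domination.

    candidates -- maps each candidate to its violation counts per constraint
    ranking    -- constraint names, highest-ranked first
    """
    def profile(cand: str) -> tuple:
        # Violations ordered by rank; a constraint not listed counts as 0.
        return tuple(candidates[cand].get(c, 0) for c in ranking)

    # Tuples compare lexicographically, so one violation of a higher-ranked
    # constraint outweighs any number of violations of lower-ranked ones.
    # Ties are resolved in favour of the first candidate listed.
    return min(candidates, key=profile)


# Hypothetical tableau for illustration only.
TABLEAU: Tableau = {
    "with do": {"AVOID-DO": 1, "FILL-HEAD": 0},
    "without do": {"AVOID-DO": 0, "FILL-HEAD": 1},
}
print(ot_eval(TABLEAU, ["AVOID-DO", "FILL-HEAD"]))
print(ot_eval(TABLEAU, ["FILL-HEAD", "AVOID-DO"]))
```

Re-ranking the same constraints flips the winner, which is how OT analyses typically model variation across varieties.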
Several studies have examined effects of explicit task demands on eye movements in reading. However, there is relatively little prior research investigating the influence of implicit processing demands. In this study, processing demands were manipulated by means of a between-subject manipulation of comprehension question difficulty. Consistent with previous results from Wotschack and Kliegl, the question difficulty manipulation influenced the probability of regressing from late in sentences and re-reading earlier regions; readers who expected difficult comprehension questions were more likely to re-read. However, this manipulation had no reliable influence on eye movements during first-pass reading of earlier sentence regions. Moreover, for the subset of sentences that contained a plausibility manipulation, the disruption induced by implausibility was not modulated by the question manipulation. We interpret these results as suggesting that comprehension demands influence reading behavior primarily by modulating a criterion for comprehension that readers apply after completing first-pass processing.
Since Lerner coined the notion of delayed completion in 1989, this recurrent social practice of continuing one’s speaking turn while disregarding a co-participant’s intervening utterance has not been investigated with regard to embodied displays and actions. A sequential analysis of videotaped mundane conversations in German explains the occurrence and use of delayed completions. First, especially in multi-party and multi-activity settings, delayed completions can result from reduced monitoring and coordinating activities. Second, recipients can use intra-turn response slots for more extended responsive actions than the current speaker initially projected, leading to delayed completion sequences. Finally, delayed completions are used for blocking possibly misaligned co-participant actions. The investigation of visible action illustrates that delayed completions are a basic practice for retrospectively managing co-participant response slots.
Psychological research has neglected people whose accent does not match their appearance. Most research on person perception has focused on appearance, overlooking accents, which are equally important social cues. When accents were studied, this was often done in isolation (i.e., detached from appearance). We examine how varying accent and appearance information about people affects evaluations. We show that evaluations of expectancy-violating people shift in the direction of the added information. When a job candidate looked foreign but later spoke with a native accent, his evaluations rose and he was evaluated best of all candidates (Experiment 1a). However, the sequence in which information was presented mattered: When he was heard first and then seen, his evaluations dropped (Experiment 1b). Findings demonstrate the importance of studying the combination and sequence of different types of information in impression formation. They also allow predictions about reactions to ethnically mixed people, who are increasingly present in modern societies.
Aversion to loanwords may express itself in various ways: deliberate and motivated by ideology of linguistic purism or more implicit and motivated by the strength of one’s national identification and ethnolinguistic vitality. A study of Polish philology students assessed their tendency to choose loanwords versus synonymous native words. The results supported a two-path model of linguistic purism. Social identity (strength of identification) directly predicted avoidance of loanwords, whereas ideological concerns (conservative political views) predicted it indirectly, through purist ideology.
As open class repair initiators (OCRIs, e.g., “what” or “huh”) do not specify the type of repairable, choosing an adequate repair format in the next turn becomes a practical problem for the participants. Whereas in monolingual/L1 speaker conversations participants typically orient towards troubles caused by reduced acoustic intelligibility or by topical/sequential disjunction, in multilingual/L2 interactions possible problems regarding asymmetric language choices and skills can be added – and might be responded to accordingly. Based on videotaped international business meetings and interactions at a customs post, this paper investigates various open class and embodied other-initiations of repair. By means of a conversation analytical and multimodal approach to social interaction, this contribution focuses first on instances of audible OCRIs and illustrates that they are accompanied by embodied conduct. Second, two types of embodied other-initiation of repair are scrutinized: a lifted eyebrows/head display and a freeze display in which movements are suspended. The analysis shows that participants treat these as referring either to troubles in hearing (display 1) or to troubles in understanding the linguistic format (display 2). This leads to the formulation of further desiderata and analytical challenges regarding the multimodal other-initiation of repair in general and in professional international settings in particular.
Nonnative accents are prevalent in our globalized world and constitute highly salient cues in social perception. Whereas previous literature has commonly assumed that they cue specific social group stereotypes, we propose that nonnative accents generally trigger spontaneous negatively biased associations (due to a general nonnative accent category and perceptual influences). Accordingly, Study 1 demonstrates negative biases with conceptual IATs, targeting the general concepts of accent versus native speech, on the dimensions affect, trust, and competence, but not on sociability. Study 2 attests to negative, largely enhanced biases on all dimensions with auditory IATs comprising matched native–nonnative speaker pairs for four accent types. Biases emerged irrespective of the accent types that differed in attractiveness, recognizability of origin, and origin-linked national associations. Study 3 replicates general IAT biases with an affect IAT and a conventional evaluative IAT. These findings corroborate our hypotheses and assist in understanding general negativity toward nonnative accents.
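IAT biases like those reported above are usually expressed as a D score. The sketch below is a simplified version of the D measure of Greenwald, Nosek and Banaji (2003): the difference between the mean latencies of the incompatible and compatible blocks, divided by the standard deviation of all trials from both blocks. It omits error penalties and the separate practice/test block computation. The block labels (e.g. "native + positive" as the compatible pairing) and the latencies are hypothetical.

```python
from statistics import mean, stdev
from typing import Sequence


def iat_d_score(compatible_ms: Sequence[float], incompatible_ms: Sequence[float],
                max_latency_ms: float = 10_000) -> float:
    """Simplified IAT D score (no error penalties, single block pair).

    Positive values mean slower responses in the incompatible block, i.e. an
    association in the direction of the compatible pairing.
    """
    # Trials above the latency cutoff are discarded.
    comp = [t for t in compatible_ms if t <= max_latency_ms]
    incomp = [t for t in incompatible_ms if t <= max_latency_ms]
    if not comp or not incomp:
        raise ValueError("each block needs at least one valid trial")
    pooled_sd = stdev(comp + incomp)  # SD over all trials of both blocks
    if pooled_sd == 0:
        raise ValueError("latencies have zero variance")
    return (mean(incomp) - mean(comp)) / pooled_sd


# Hypothetical latencies in ms, e.g. compatible = native + positive /
# accent + negative, incompatible = the reversed pairing.
print(round(iat_d_score([600, 700], [800, 900]), 3))
```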
This chapter investigates policies that shape the role of the German language in contemporary Estonia. Whereas German played an important role for many centuries as the language of the economic and cultural elite in Estonia, it declined severely in importance throughout the twentieth century. Against this historical background, the paper provides an overview of the current functions of German and attitudes towards it, and it discusses how these functions and attitudes are influenced by the policies of various actors inside and outside Estonia. The paper argues that German continues to play a significant role: while German is no longer a lingua franca, it still fulfils a number of functions and enjoys prestige in clearly defined niches involving communication within German-speaking circles or between Estonians and Germans. The interplay of language policies by the Estonian and German-speaking states, as well as by semi-state and private institutions, succeeds in maintaining German as an additional language in contemporary Estonia.
Most research on ethnicity has focused on visual cues. However, accents are strong social cues that can match or contradict visual cues. We examined understudied reactions to people for whom one cue suggests one ethnicity while the other cue contradicts it. In an experiment conducted in Germany, job candidates spoke with an accent either congruent or incongruent with their (German or Turkish) appearance. Based on ethnolinguistic identity theory, we predicted that accents would be strong cues for categorization and evaluation. Based on expectancy violations theory, we expected that incongruent targets would be evaluated more extremely than congruent targets. Both predictions were confirmed: accents strongly influenced perceptions, and Turkish-looking German-accented targets were perceived as the most competent of all targets (and additionally the most warm). The findings show that bringing together visual and auditory information yields a more complete picture of the processes underlying impression formation.
Telicity and agentivity are semantic factors that split intransitive verbs into (at least two) different classes. Clear-cut unergative verbs, which select the auxiliary HAVE, are assumed to be atelic and agent-selecting; unequivocally unaccusative verbs, which select the auxiliary BE, are analyzed as telic and patient-selecting. Thus, agentivity and telicity are assumed to be inversely correlated in split intransitivity. We present semantic and experimental evidence from German and Mandarin Chinese that casts doubt on this widely held assumption. The focus of our experimental investigation lies on variation with respect to agentivity (specifically motion control, manipulated via animacy), telicity (tested via a locative vs. goal adverbial), and BE/HAVE-selection with semantically flexible intransitive verbs of motion. Our experimental methods are acceptability ratings for German and Chinese (Experiments 1 and 2) and event-related potential (ERP) measures for German (Experiment 3). Our findings contradict the above-mentioned assumption that agentivity and telicity are generally inversely correlated and suggest that, for the verbs under study, agentivity and telicity harmonize with each other. Furthermore, the ERP measures reveal that the impact of the interaction under discussion is more pronounced on the verb lexeme than on the auxiliary. We also found differences between Chinese and German that relate to the influence of telicity on BE/HAVE-selection. They seem to confirm the claim in previous research that the weight of the telicity factor locomotion (or internal motion) is cross-linguistically variable.
In this paper we present work in developing a computerized grammar for the Latin language. It demonstrates the principles and challenges in developing a grammar for a natural language in a modern grammar formalism. The grammar presented here provides a useful resource for natural language processing applications in different fields. It can be easily adopted for language learning and use in language technology for Cultural Heritage like translation applications or to support post-correction of document digitization.
We present a supervised machine learning AND system which tackles semantic similarity between publication titles by means of word embeddings. Word embeddings are integrated as external components, which keeps the model small and efficient, while allowing for easy extensibility and domain adaptation. Initial experiments show that word embeddings can improve the Recall and F score of the binary classification sub-task of AND. Results for the clustering sub-task are less clear, but also promising and overall show the feasibility of the approach.
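A common baseline for embedding-based title similarity is to average the vectors of a title's tokens and take the cosine of the two averages. The result could then serve as one feature of a pair classifier. The sketch below assumes exactly that baseline. It is not necessarily how the described system integrates embeddings, and the two-dimensional toy vectors are invented.

```python
import math
from typing import Dict, List, Optional

Embeddings = Dict[str, List[float]]


def title_vector(title: str, emb: Embeddings) -> Optional[List[float]]:
    """Average the vectors of known tokens; out-of-vocabulary tokens are skipped."""
    vecs = [emb[t] for t in title.lower().split() if t in emb]
    if not vecs:
        return None
    dim = len(vecs[0])
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]


def cosine(u: List[float], v: List[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return 0.0 if nu == 0 or nv == 0 else dot / (nu * nv)


def title_similarity(t1: str, t2: str, emb: Embeddings) -> float:
    """Cosine of averaged title vectors; 0.0 if either title has no known token."""
    v1, v2 = title_vector(t1, emb), title_vector(t2, emb)
    if v1 is None or v2 is None:
        return 0.0
    return cosine(v1, v2)


# Hypothetical toy embeddings for illustration only.
TOY: Embeddings = {"neural": [1.0, 0.0], "network": [0.8, 0.2],
                   "parsing": [0.1, 0.9], "syntax": [0.0, 1.0]}
print(title_similarity("Neural Network", "network", TOY))
print(title_similarity("Neural Network", "syntax", TOY))
```

Keeping the embeddings in an external lookup table, as here, means they can be swapped for domain-specific vectors without retraining anything else in the pipeline.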
Basic grammatical categories may carry social meanings irrespective of their semantic content. In a set of four studies, we demonstrate that verbs—a basic linguistic category present and distinguishable in most languages—are related to the perception of agency, a fundamental dimension of social perception. In an archival analysis of actual language use in Polish and German, we found that targets stereotypically associated with high agency (men and young people) are presented in the immediate neighborhood of a verb more often than non-agentic social targets (women and older people). Moreover, in three experiments using a pseudo-word paradigm, verbs (but not adjectives and nouns) were consistently associated with agency (but not with communion). These results provide consistent evidence that verbs, as grammatical vehicles of action, are linguistic markers of agency. In demonstrating meta-semantic effects of language, these studies corroborate the view of language as a social tool and an integral part of social perception.
Objective: Discrimination against nonnative speakers is widespread and largely socially acceptable. Nonnative speakers are evaluated negatively because accent is a sign that they belong to an outgroup and because understanding their speech requires unusual effort from listeners. The present research investigated intergroup bias, based on stronger support for hierarchical relations between groups (social dominance orientation [SDO]), as a predictor of hiring recommendations of nonnative speakers.
Method: In an online experiment using an adaptation of the thin-slices methodology, 65 U.S. adults (54% women; 80% White; M[age] = 35.91, range = 18–67) heard a recording of a job applicant speaking with an Asian (Mandarin Chinese) or a Latino (Spanish) accent. Participants indicated how likely they would be to recommend hiring the speaker, answered questions about the text, and indicated how difficult it was to understand the applicant.
Results: Independent of objective comprehension, participants high in SDO reported that it was more difficult to understand a Latino speaker than an Asian speaker. SDO predicted hiring recommendations of the speakers, but this relationship was mediated by the perception that nonnative speakers were difficult to understand. This effect was stronger for speakers from lower status groups (Latinos relative to Asians) and was not related to objective comprehension.
Conclusions: These findings suggest a cycle of prejudice toward nonnative speakers: Not only do perceptions of difficulty in understanding cause prejudice toward them, but also prejudice toward low-status groups can lead to perceived difficulty in understanding members of these groups.
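The mediation claim (SDO → perceived difficulty of understanding → hiring recommendation) can be illustrated with the product-of-coefficients approach: a is the slope of the mediator on the predictor, b is the slope of the outcome on the mediator controlling for the predictor, and the indirect effect is a·b. This is a point-estimate sketch only. The study may have used a different estimator or bootstrapped confidence intervals, and the data in the example are hypothetical.

```python
from typing import Sequence, Tuple


def _cross(u: Sequence[float], v: Sequence[float]) -> float:
    """Sum of cross-products of the centred variables."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v))


def simple_mediation(x: Sequence[float], m: Sequence[float],
                     y: Sequence[float]) -> Tuple[float, float, float, float]:
    """OLS estimates (a, b, c_prime, c) for the model X -> M -> Y.

    a       -- slope of M on X
    b       -- slope of Y on M, controlling for X
    c_prime -- direct effect of X on Y, controlling for M
    c       -- total effect of X on Y; in OLS, c == c_prime + a * b
    """
    sxx, smm, sxm = _cross(x, x), _cross(m, m), _cross(x, m)
    sxy, smy = _cross(x, y), _cross(m, y)
    det = sxx * smm - sxm ** 2
    if sxx == 0 or det == 0:
        raise ValueError("X is constant or X and M are collinear")
    a = sxm / sxx
    b = (sxx * smy - sxm * sxy) / det
    c_prime = (smm * sxy - sxm * smy) / det
    c = sxy / sxx
    return a, b, c_prime, c


# Hypothetical data: X = SDO, M = perceived difficulty, Y = hiring score.
a, b, c_prime, c = simple_mediation([1, 2, 3, 4], [1, 3, 2, 5], [3, 9, 6, 15])
print("indirect effect a*b =", a * b, "direct effect =", c_prime)
```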
A polarity-sensitive item (PSI), as traditionally defined, is an expression that is restricted to either an affirmative or a negative context. PSIs like ‘lift a finger’ and ‘all the time in the world’ subserve discourse routines like understatement and emphasis. Lexical–semantic classes are increasingly invoked in descriptions of the properties of PSIs. Here, we use English corpus data and the tools of Frame Semantics (Fillmore, 1982, 1985) to explore Israel’s (2011) observation that the semantic role of a PSI determines how the expression fits into a contextually constructed scalar model. We focus on a class of exceptions implied by Israel’s model: cases in which a given PSI displays two countervailing patterns of polarity sensitivity, with attendant differences in scalar entailments. We offer a set of case studies of polarity-sensitive expressions (including verbs of attraction and aversion like ‘can live without’, monetary units like ‘a red cent’, comparative adjectives and time-span adverbials) that demonstrate that the interpretation of a given PSI in a given polar context is based on multiple factors. These factors include the speaker’s perspective on and affective stance towards the described event, available inferences about causality and, perhaps most critically, particulars of the predication, including the verb or adjective’s frame membership, the presence or absence of an ability modal like can, the grammatical construction used and the range of contingencies evoked by the utterance.
The English language has taken advantage of the Digital Revolution to establish itself as the global language; however, only 28.6% of Internet users speak English as their native language. Machine Translation (MT) is a powerful technology that can bridge this gap. In development since the mid-20th century, MT has become available to every Internet user in the last decade, due to free online MT services. This paper aims to discuss the implications that these tools may have for the privacy of their users and how they are addressed by EU data protection law. It examines the data flows in respect of the initial processing (both from the perspective of the user and the MT service provider) and potential further processing that may be undertaken by the MT service provider.
Our paper deals with the use of ICH WEIß NICHT (‘I don’t know’) in German talk-in-interaction. Pursuing an Interactional Linguistics approach, we identify different interactional uses of ICH WEIß NICHT and discuss their relationship to variation in argument structure (SV (O), (O)VS, V-only). After ICH WEIß NICHT with full complementation, speakers emphasize their lack of knowledge or display reluctance to answer. After variants without an object complement, in contrast, speakers display uncertainty about the truth of the following proposition or about its sufficiency as an answer. Thus, while uses with both subject and object tend to close a sequence or display lack of knowledge, responses without an object function as a prepositioned epistemic hedge or a pragmatic marker framing the following TCU. When ICH WEIß NICHT is used in response to a statement, it indexes disagreement (regardless of the complementation pattern).
This article describes an English Zulu learners’ dictionary that is part of a larger set of information tools, namely an online Zulu course, an e-dictionary of possessives (which was implemented earlier) accompanied by training software offering translation tasks on several levels, and an ontology of morphemic items categorizing and describing all parts of speech of Zulu. The underlying lexicographic database contains the usual type of lexicographic data, such as translation equivalents and their respective morphosyntactic data, but its entries have been extended with data related to the lessons of the online course in order to enable the learner to link both tools autonomously. The ‘outer matter’ is integrated into the website in the form of several texts on additional web pages (how-to-use, typical outputs, grammar tables, information on morphosyntactic rules, etc.). The dictionary comprises a modular system, where each module fulfils one of the necessary functions.
Precise multimodal studies require precise synchronisation between audio and video signals. However, raw audio and audio from video recordings can be out of sync for several reasons. In order to re-synchronise them, a dynamic programming (DP) approach is presented here. Traditionally, DP is performed on the rectangular distance matrix comparing each value in signal A with each value in signal B. Previous work has limited the search space using, for example, the Sakoe-Chiba band (Sakoe and Chiba, 1978). However, the overall size of the distance matrix remains the same. Here, a tunnel matrix and its corresponding DP algorithm are presented. The matrix contains only the distances between the two signals that fall within a pre-specified bandwidth, and the computational cost is reduced accordingly. An example implementation demonstrates the functionality on artificial data and on data from real audio and video recordings.
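The idea of a tunnel matrix can be sketched as dynamic time warping in which row i stores only the 2w+1 cells with |i − j| ≤ w. Memory therefore grows with len(a)·(2w+1) rather than len(a)·len(b). The sketch assumes a band centred on the main diagonal and an absolute-difference local cost; the actual tunnel matrix and DP algorithm described here may differ (for instance, in band placement). Recovering the lag for re-synchronisation would additionally require backtracking through the stored cells, which is not shown.

```python
import math
from typing import List, Sequence


def tunnel_dtw(a: Sequence[float], b: Sequence[float], w: int) -> float:
    """DTW distance between a and b restricted to the band |i - j| <= w.

    Row i of the tunnel stores only the cells j in [i - w, i + w], at
    column k = j - i + w, so out-of-band cells are never allocated.
    """
    n, m = len(a), len(b)
    if abs(n - m) > w:
        raise ValueError("band too narrow to reach the final cell")
    width = 2 * w + 1
    tunnel: List[List[float]] = [[math.inf] * width for _ in range(n)]

    def get(i: int, j: int) -> float:
        # Accumulated cost of cell (i, j); infinity outside the band or matrix.
        k = j - i + w
        if i < 0 or j < 0 or j >= m or not 0 <= k < width:
            return math.inf
        return tunnel[i][k]

    for i in range(n):
        for j in range(max(0, i - w), min(m, i + w + 1)):
            cost = abs(a[i] - b[j])
            if i == 0 and j == 0:
                prev = 0.0
            else:
                # Standard DTW steps: vertical, horizontal, diagonal.
                prev = min(get(i - 1, j), get(i, j - 1), get(i - 1, j - 1))
            tunnel[i][j - i + w] = cost + prev
    return get(n - 1, m - 1)


# A signal and a copy delayed by one sample align perfectly inside a band of 1.
print(tunnel_dtw([0, 1, 2, 3], [0, 0, 1, 2, 3], 1))
```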
This chapter analyses the impact of political decentralization in a state on the position of ethnic and linguistic minorities, in particular with regard to the role of parliamentary assemblies in the political system. It relates a number of typical functions of parliaments to the specific needs of minorities and their languages. The most important of these functions are the representation of the minority and responsiveness to the minority’s needs. The chapter then discusses six examples from the European Union (and Norway) which prototypically represent different types of parliamentary decentralization: the ethnically defined Sameting in Norway and its importance for the Sámi population, the Scottish Parliament and its role for speakers of Scottish Gaelic, the German regional parliaments of the Länder of Schleswig-Holstein and Saxony and their impact on the Frisian and Sorbian minorities respectively, the autonomy of predominantly German-speaking South Tyrol within the Italian state, and finally the situation of the speakers of Latgalian in Latvia, where a decentralized parliament is missing. The chapter also suggests how these situations might be compared with those of minorities in Russia. It finally argues that political decentralization may indeed empower minorities to gain a greater voice in their states, even if much ultimately depends on individual factors in each situation and on the attitudes of the majority population and the political center.
In this article, we explore the feasibility of extracting suitable and unsuitable food items for particular health conditions from natural language text. We refer to this task as conditional healthiness classification. For that purpose, we annotate a corpus extracted from forum entries of a food-related website. We identify different relation types that hold between food items and health conditions, going beyond a binary distinction between suitability and unsuitability, and devise various supervised classifiers using different types of features. We examine the impact of different task-specific resources, such as a healthiness lexicon that lists the healthiness status of a food item and a sentiment lexicon. Moreover, we also consider task-specific linguistic features that disambiguate a context in which mentions of a food item and a health condition co-occur and compare them with standard features using bag of words, part-of-speech information and syntactic parses. We also investigate to what extent individual food items and health conditions correlate with specific relation types and try to harness this information for classification.
Social perception studies have revealed that smiling individuals are perceived more favourably on many communion dimensions than nonsmiling individuals. Research on gender differences in smiling habits has shown that women smile more than men. In our study, we investigated this phenomenon further and hypothesised that women perceive smiling individuals as more honest than men do. An experiment conducted in seven countries (China, Germany, Mexico, Norway, Poland, Republic of South Africa and USA) revealed that gender may influence the perception of honesty in smiling individuals. We compared ratings of honesty made by male and female participants who viewed photos of smiling and nonsmiling people. While men and women did not differ in their ratings of honesty of nonsmiling individuals, women assessed smiling individuals as more honest than men did. We discuss these results from a social norms perspective.
Two very reliable influences on eye fixation durations in reading are word frequency, as measured by corpus counts, and word predictability, as measured by cloze norming. Several studies have reported strictly additive effects of these two variables. Predictability also reliably influences the amplitude of the N400 component in event-related potential studies. However, previous research suggests that while frequency affects the N400 in single-word tasks, it may have little or no effect on the N400 when a word is presented with a preceding sentence context. The present study assessed this apparent dissociation between the results from the two methods using a coregistration paradigm in which the frequency and predictability of a target word were manipulated while readers’ eye movements and electroencephalograms were simultaneously recorded. We replicated the pattern of significant, and additive, effects of the two manipulations on eye fixation durations. We also replicated the predictability effect on the N400, time-locked to the onset of the reader’s first fixation on the target word. However, there was no indication of a frequency effect in the electroencephalogram record. We suggest that this pattern has implications both for the interpretation of the N400 and for the interpretation of frequency and predictability effects in language comprehension.
Content analysis provides a useful and multifaceted methodological framework for Twitter analysis. CAQDAS tools support the structuring of textual data by enabling categorising and coding. Depending on the research objective, it may be appropriate to choose a mixed-methods approach that combines quantitative and qualitative elements of analysis and exploits their respective advantages to the greatest possible extent while minimising their shortcomings. In this chapter, we discuss CAQDAS speech act analysis of tweets as an example of software-assisted content analysis. We start with some elementary thoughts on the challenges of collecting and evaluating Twitter data before giving a brief description of the potential and limitations of using the software QDA Miner (as one typical example of possible analysis programmes). Our focus is on analytical features that can be particularly helpful in speech act analysis of tweets.
We continue the study of the reproducibility of Propp’s annotations from Bod et al. (2012). We present four experiments in which test subjects were taught Propp’s annotation system; we conclude that Propp’s system needs a significant amount of training, but that with sufficient time investment, it can be reliably trained for simple tales.
We present a technique called event mapping that makes it possible to project text representations into event lists, produce an event table, and derive quantitative conclusions to compare the text representations. The main application of the technique is the case where two classes of text representations have been collected in two different settings (e.g., as annotations in two different formal frameworks), so that the two classes can be compared with respect to their systematic differences in the event table. We illustrate how the technique works by applying it to data collected in two experiments (one using annotations in Vladimir Propp’s framework, the other using natural language summaries).
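As an illustration only, the event-table step might look like the following Python sketch. It assumes that each text representation has already been projected onto a list of event labels and tabulates, for each event, the share of representations in each class that contain it; the function name `event_table`, the share-based measure and the toy event labels (loosely modelled on Propp-style functions) are hypothetical and need not match the table used in the paper.

```python
from collections import Counter


def event_table(class_a, class_b):
    """Map each event to the share of representations in each class containing it.

    Each class is a list of text representations; each representation is
    assumed to be already projected onto a list of event labels.
    """

    def shares(representations):
        counts = Counter()
        for events in representations:
            counts.update(set(events))  # count an event at most once per representation
        return {event: n / len(representations) for event, n in counts.items()}

    share_a, share_b = shares(class_a), shares(class_b)
    return {
        event: (share_a.get(event, 0.0), share_b.get(event, 0.0))
        for event in sorted(share_a.keys() | share_b.keys())
    }


# Hypothetical event lists: formal annotations vs. summaries of the same tales.
annotations = [["villainy", "departure"], ["villainy"]]
summaries = [["departure"], ["departure", "return"]]

for event, (a, b) in event_table(annotations, summaries).items():
    print(f"{event:10s} {a:.2f} {b:.2f} diff={a - b:+.2f}")
```

Large differences in a row would then point to events that one setting systematically records and the other omits.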
Communication of stereotypes in the classroom: biased language use of German and Turkish adolescents
(2014)
Little is known about the linguistic transmission and maintenance of mutual stereotypes in interethnic contexts. This field study therefore investigated the linguistic expectancy bias (LEB) and the linguistic intergroup bias (LIB) among German and Turkish adolescents (aged 13 to 20 years) in the school context. The LEB refers to the general tendency to describe stereotype-consistent behaviour more abstractly than stereotype-inconsistent behaviour. The LIB is the tendency to use language abstraction for in-group-protective reasons. Results revealed an unmoderated LEB, whereas the LIB only occurred when foreigners were in the numerical majority, the classroom composition was perceived as a learning disadvantage, or the frequency of interethnic conflict was high. These findings provide first evidence for the use of both LEB and LIB in an interethnic classroom setting.
This paper analyses paramedic emergency interaction as multimodal multiactivity. Based on a corpus of video recordings of emergency drills performed by professional paramedics during advanced training, the focus is on paramedics’ participation in multiple joint projects which become simultaneously relevant. Simultaneity and fast succession of multiactivity characterise not only work on the team level but also the work profile of the individual paramedic. Participants have to coordinate their own participation in more than one joint project intrapersonally. In the data studied, three patterns of allocating multimodal resources stood out as routine ways of coordinating participation in two simultaneous projects intrapersonally:
1. Talk and hearing vs. manual action monitored by gaze,
2. Talk and hearing vs. gazing (and pointing),
3. Manual action vs. gaze (and talk and hearing).
Prejudice against a social group may lead to discrimination against members of this group. One very strong cue of group membership is a (non)standard accent in speech. Surprisingly, hardly any interventions against accent-based discrimination have been tested. In the current article, we introduce an intervention in which participants’ own experience unobtrusively changes their evaluations of others. In the present experiment, participants in the experimental condition talked to a confederate in a foreign language before the experiment, whereas those in the control condition received no treatment. Replicating previous research, participants in the control condition discriminated against Turkish-accented job candidates. In contrast, those in the experimental condition evaluated Turkish- and standard-accented candidates as similarly competent. We discuss potential mediating and moderating factors of this effect.
Growing globalisation draws attention to cultural differences between people from different countries or from different cultures within countries. Notwithstanding the diversity of people’s worldviews, current cross-cultural research still faces the challenge of avoiding ethnocentrism: comparing Western-driven phenomena with similar variables across countries without clearly checking their conceptual equivalence is highly problematic. In the present article we argue that simple comparison of measurements (in the quantitative domain) or of semantic interpretations (in the qualitative domain) across cultures easily leads to inadequate results. Questionnaire items or text produced in interviews or via open-ended questions have culturally laden meanings and cannot be mapped onto the same semantic metric. We call the culture-specific space and relationship between variables or meanings a ‘cultural metric’, that is, a set of notions that are inter-related and that mutually specify each other’s meaning. We illustrate the problems and their possible solutions with examples from quantitative and qualitative research. The suggested methods allow researchers to respect the semantic space of notions in cultures and language groups, so that the resulting similarities or differences between cultures can be better understood and interpreted.
Feminine forms of job titles raise great interest in many countries. However, it is still unknown how they shape stereotypical impressions on the warmth and competence dimensions among female and male listeners. In an experiment with fictitious job titles, men perceived women described with feminine job titles as significantly less warm and marginally less competent than women with masculine job titles, which led to lower willingness to employ them. No such effects were observed among women.
"Standard language" is a contested concept, ideologically, empirically and theoretically. This is particularly true for a language such as German, where the standardization of the spoken language was based on the written standard and was established with respect to a communicative situation, i.e. public speech on stage (Bühnenaussprache), which most speakers never come across. As a consequence, the norms of the oral standard exhibit many features which are infrequent even in the everyday speech of educated speakers. This paper discusses ways to arrive at a more realistic conception of (spoken) standard German, which will be termed "standard usage". Such a conception must be founded on empirical observations of speakers’ linguistic choices in everyday situations. Arguments in favor of a corpus-based notion of standard have to consider sociolinguistic, political, and didactic concerns. We report on the design of a large study of linguistic variation conducted at the Institute for the German Language (project "Variation in Spoken German", Variation des gesprochenen Deutsch) with the aim of arriving at a representative picture of "standard usage" in contemporary German. It systematically takes into account both diatopic variation, covering the multi-national space in which German is an official language, and diastratic variation in terms of varying degrees of formality. Results of the study of phonetic and morphosyntactic variation are discussed. At least for German, a corpus-based notion of "standard usage" inevitably includes some degree of pluralism concerning areal variation, and it needs to do justice to register-based variation as well.
This article advocates an understanding of ‘positioning’ as a key to the analysis of identities in interaction within the methodological framework of conversation analysis. Building on research by Bamberg, Georgakopoulou and others, a performative, interaction-based approach to positioning is outlined and compared to membership categorization analysis. An interactional episode involving mock stories to reveal and reproach an inadequate identity-claim of a co-participant is analysed both in terms of practices of membership categorization and in terms of positioning. It is concluded that membership categorization is a core element of positioning. Still, positioning goes beyond membership categorization (a) by revealing biographical dimensions accomplished by narration and (b) by uncovering implicit performative claims of identity, which are not established by categorization or description.
Contemporary studies on the characteristics of natural language benefit enormously from the increasing amount of linguistic corpora. Aside from text and speech corpora, corpora of computer-mediated communication (CMC) position themselves between orality and literacy, and beyond that provide insight into the impact of "new", mainly internet-based media on language behaviour. In this paper, we present an empirical attempt to work with annotated CMC corpora for the explanation of linguistic phenomena. In concrete terms, we implement machine learning algorithms to produce decision trees that reveal rules and tendencies about the use of genitive markers in German.
Sexual harassment severely impacts the educational system in the West African country Benin and the progress of women in this society, which is characterized by great gender inequality. Knowledge of the belief systems rooted in the sociocultural context is crucial to the understanding of sexual harassment. However, no study has yet investigated how sexual harassment is related to fundamental beliefs in Benin or other West African countries. We conducted a field study on 265 female and male students from several high schools in Benin to investigate the link between sexual harassment and measures of ambivalent sexism, gender identity, and rape myth acceptance. Almost half of the sample reported having experienced sexual harassment personally or among peers. Levels of sexism and rape myth acceptance were very high compared to other studies. These attitudes appeared to converge in a sexist belief system that was linked to personal experiences, to the perceived probability of experiencing sexual harassment, and to fear of it. Results suggest that sexual harassment is a societal problem and that interventions need to address fundamental attitudes held in societies low in gender equality.
Automatic recognition of speech, thought, and writing representation in German narrative texts
(2013)
This article presents the main results of a project that explored ways of automatically recognizing and classifying a narrative feature, namely speech, thought, and writing representation (ST&WR), using surface information and methods of computational linguistics. The task was to detect and distinguish four types (direct, free indirect, indirect, and reported ST&WR) in a corpus of manually annotated German narrative texts. Rule-based as well as machine-learning methods were tested and compared. The results were best for recognizing direct ST&WR (best F1 score: 0.87), followed by indirect (0.71), reported (0.58), and finally free indirect ST&WR (0.40). The rule-based approach worked best for ST&WR types with clear patterns, like indirect and marked direct ST&WR, and often gave the most accurate results. Machine learning was most successful for types without clear indicators, like free indirect ST&WR, and proved more stable. When looking at the percentage of ST&WR in a text, the results of machine-learning methods always correlated best with the results of manual annotation. Creating a union or intersection of the results of the two approaches did not lead to striking improvements. A stricter definition of ST&WR, which excluded borderline cases, made the task harder and led to worse results for both approaches.
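How a union or intersection of the two approaches' outputs is scored can be shown with a small, hypothetical Python sketch. It assumes that predictions and gold annotations are sets of instance identifiers (for example, sentence IDs) for a single ST&WR type and uses the standard F1 measure; the toy sets are invented and do not reproduce the project's data or its evaluation unit.

```python
def f1(predicted, gold):
    """Harmonic mean of precision and recall for two sets of instance IDs."""
    tp = len(predicted & gold)  # true positives: predicted and annotated
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data: IDs of sentences containing, e.g., indirect ST&WR.
gold = {1, 2, 3, 4, 5, 6}
rule_based = {1, 2, 3, 7}
machine_learning = {2, 3, 4, 5, 8, 9}

for name, pred in [
    ("rule-based", rule_based),
    ("machine learning", machine_learning),
    ("union", rule_based | machine_learning),
    ("intersection", rule_based & machine_learning),
]:
    print(f"{name:17s} F1 = {f1(pred, gold):.2f}")
```

A union tends to raise recall at the expense of precision, and an intersection does the reverse, so whether either combination helps depends on how much the two approaches' errors overlap.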
In this article, we examine the effectiveness of bootstrapping supervised machine-learning polarity classifiers with the help of a domain-independent rule-based classifier that relies on a lexical resource, i.e., a polarity lexicon and a set of linguistic rules. The benefit of this method is that, although no labeled training data are required, it allows a classifier to capture in-domain knowledge by training a supervised classifier with in-domain features, such as bag of words, on instances labeled by a rule-based classifier. Thus, this approach can be considered a simple and effective method for domain adaptation. Among the components of this approach, we investigate how important the quality of the rule-based classifier is and which features are useful for the supervised classifier. In particular, the former addresses the question of how far linguistic modeling is relevant for this task. We not only examine how this method performs under more difficult settings, in which classes are not balanced and mixed reviews are included in the data set, but also compare how this linguistically driven method relates to state-of-the-art statistical domain adaptation.
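A toy version of this bootstrapping pipeline is sketched below in Python. Everything concrete in it is an assumption for illustration: a four-word polarity lexicon with a single negation rule stands in for the rule-based classifier, and a multinomial naive Bayes over bag-of-words features stands in for the supervised classifier; the lexicon, rules and learner examined in the article are not reproduced here. Documents on which the rules abstain are simply left out of the training data.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy lexicon and negators standing in for the rule-based classifier.
POLARITY_LEXICON = {"good": 1, "great": 1, "bad": -1, "awful": -1}
NEGATORS = {"not", "never"}


def rule_label(tokens):
    """Label a document 'pos'/'neg' from lexicon scores, or None to abstain."""
    score = 0
    for i, tok in enumerate(tokens):
        polarity = POLARITY_LEXICON.get(tok, 0)
        if polarity and i > 0 and tokens[i - 1] in NEGATORS:
            polarity = -polarity  # flip polarity directly after a negator
        score += polarity
    if score > 0:
        return "pos"
    if score < 0:
        return "neg"
    return None


class NaiveBayes:
    """Multinomial naive Bayes over bag-of-words features with add-one smoothing."""

    def fit(self, docs, labels):
        self.vocab = set()
        self.word_counts = defaultdict(Counter)
        self.class_counts = Counter(labels)
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, tokens):
        total = sum(self.class_counts.values())
        best_label, best_logp = None, -math.inf
        for label, count in self.class_counts.items():
            logp = math.log(count / total)  # class prior
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for tok in tokens:
                if tok in self.vocab:  # ignore words unseen in training
                    logp += math.log((self.word_counts[label][tok] + 1) / denom)
            if logp > best_logp:
                best_label, best_logp = label, logp
        return best_label


def bootstrap(unlabeled_docs):
    """Train the supervised classifier on documents labeled by the rules."""
    docs, labels = [], []
    for tokens in unlabeled_docs:
        label = rule_label(tokens)
        if label is not None:
            docs.append(tokens)
            labels.append(label)
    return NaiveBayes().fit(docs, labels)


# Unlabeled in-domain data (hypothetical product reviews, already tokenized).
corpus = [
    ["great", "battery"],
    ["good", "battery", "life"],
    ["awful", "screen"],
    ["bad", "screen", "glare"],
    ["not", "good", "screen"],
]
clf = bootstrap(corpus)
print(rule_label(["battery"]), clf.predict(["battery"]), clf.predict(["screen"]))
# None pos neg
```

Because the in-domain words `battery` and `screen` are not in the lexicon, the rule-based classifier abstains on them, whereas the bootstrapped classifier has picked up their polarity from the automatically labeled data.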
Reformulating place
(2013)
This report examines what can be accomplished in conversation by reformulating a reference to a place using the practices of repair. It is based on an analysis of a collection of place references situated in second pair parts of adjacency pairs taken from a wide range of field recordings of talk-in-interaction. Not surprisingly, place references are sometimes reformulated so as to indicate a misspeaking or in pursuit of recipient recognition. At other times, however, we show that place references can be reformulated to more adequately implement the action of a turn in prosecuting the course of action of which it is a part. In these cases repairing a place reference can target a source of trouble associated with implementing the action of a turn at talk, and thus reformulating place can serve as a practical resource for accomplishing a range of interactional tasks. We conclude with a more complex case in which two reformulations are deployed in responding to a so-called ‘double-barrelled’ initiating action.
Pseudoclefts in Hungarian
(2013)
Based on novel data from Hungarian, this paper makes the case that in at least some languages specificational pseudocleft sentences must receive a ‘what-you-see-is-what-you-get’ syntactic analysis. More specifically, it is argued that the clefted constituent is the subject of predication (underlyingly base-generated in Spec, Pr), whereas the cleft clause acts as a predicate in the structure. Alongside connectivity effects characteristic of specificational pseudoclefts, we also discuss a range of anti-connectivity effects, which we show to receive a straightforward explanation under the proposed analysis. It follows that attested connectivity effects, in turn, require a semantic, rather than a syntactic account, along the lines of Jacobson (1994) and Sharvit (1999).
The authors compare the use of two formats for requesting an object in informal everyday interaction: imperatives, common in our Polish data, and second-person polar questions, common in our English data. Imperatives and polar questions are selected in the same interactional “home environments” across the languages, in which they enact two social actions: drawing on shared responsibility and enlisting assistance, respectively. Speakers across the languages differ in their choice of request format in “mixed” interactional environments that support either. The findings shed light on the orderly ways in which cultural diversity is grounded in invariants of action formation.
Drawing on naturalistic video and audio recordings of international meetings, and within the framework of conversation analysis, ethnomethodology and interactional linguistics, this chapter studies how multilingual resources are mobilized in social interactions among professionals, how available linguistic and embodied resources are identified and used by the participants, which solutions are locally elaborated by them when they are confronted with various languages spoken but not shared among them, and which definition of multilingualism they adopt for all practical purposes. Focusing on the multilingual solutions emically elaborated in international professional meetings, we show that the participants orient to a double principle: on the one hand, they orient to the progressivity of the interaction, adopting all the possible resources that enable them to go on within the current activity; on the other hand, they orient to the intersubjectivity of the interaction, treating, preventing and repairing possible troubles and problems of understanding. Specific multilingual solutions can be adopted to keep this difficult balance between progressivity and intersubjectivity; they vary according to the settings, the competences at hand, the linguistic and embodied resources locally defined by the participants as publicly available, the multilingual resources treated as totally or partially shared, as transparent or opaque, and as needing repair or not. The paper begins by sketching the analytical framework, including the methodology and the data collected; it then presents some general findings, before offering an analysis of various ways in which participants keep the balance between progressivity and intersubjectivity in different multilingual interactional contexts.
Based on spoken German data from various activity types, the range of multimodal resources used to construct turn-beginnings is reviewed. It is claimed that participants in talk-in-interaction need to deal with four tasks in order to construct a turn which precisely fits the interactional moment of its production:
1. Achieve joint orientation: The accomplishment of the socio-spatial prerequisites necessary for producing a turn which is to become part of the participants’ common ground.
2. Display uptake: Next speaker needs to display his/her understanding of the interaction so far as the backdrop on which the production of the upcoming turn is based.
3. Deal with projections from prior talk: The speaker has to deal with projections which have been established by (the) previous turn(s) with respect to the upcoming turn.
4. Project properties of turn-in-progress: The speaker needs to orient the recipient to properties of the turn s/he is about to produce.
Turn-design thus can be seen to be informed by tasks related to the multimodal, embodied, and interactive contingencies of online-construction of turns. The four tasks are ordered in terms of prior tasks providing the prerequisite for accomplishing a later task.
Dropping out of overlap is a frequent practice for overlap resolution (Schegloff, 2000; Jefferson, 2004) in interaction, as it re-establishes the “one-at-a-time” principle of the turn-taking system (Sacks et al., 1974). While it is appropriate to analyze dropping out of overlap as a verbal and thus audible phenomenon, a close look at video data reveals that withdrawing from an action trajectory is also an embodied practice. Based on a fine-grained multimodal analysis (C. Goodwin, 1981; Mondada, 2007a, 2007b) of videotaped interactions in French, this paper illustrates how overlapped speakers organize the momentary suspension of their action trajectory in visible ways. Indeed, participants do not instantly withdraw from their action trajectory when they stop talking. By using bodily resources, they are able to display continuous monitoring of the availability of their co-participants and of the next possible slot for resuming their suspended action. I therefore suggest analyzing dropping out of overlap as the first step of withdrawal, as definitive, embodied withdrawal can occur later or, in case of resumption, not at all. Consequently, my paper treats withdrawal as a case that strengthens the analytic concept of embodiment with regard to turn-taking practices in interaction.
In Spoken Egyptian, the form of a linguistic sign is restricted by rules of root structure and consonant compatibility as well as by word-formation patterns. Hieroglyphic Egyptian, however, displays additional principles of sign formation. Iconicity is a crucial feature of part of its sign inventory. In this article, hieroglyphic iconicity is investigated by means of a preliminary comparative typology originally developed for German Sign Language (Kutscher 2010). The authors argue that patterns found in Egyptian hieroglyphic sign formation are systematically comparable to patterns of German Sign Language (DGS). These patterns determine what types of lexical meaning can be inferred from iconic linguistic signs.
The ISOcat registry reloaded
(2012)
The linguistics community is building a metadata-based infrastructure for the description of its research data and tools. At its core is the ISOcat registry, a collaborative platform to hold a (to be standardized) set of data categories (i.e., field descriptors). Descriptors have definitions in natural language and few explicit interrelations. With the registry growing to many hundreds of entries, authored by many contributors, it is becoming increasingly apparent that the rather informal definitions and their glossary-like design make it hard for users to grasp, exploit and manage the registry’s content. In this paper, we take a large subset of the ISOcat term set and reconstruct from it a tree structure, following in the footsteps of schema.org. Our ontological re-engineering yields a representation that gives users a hierarchical view of linguistic, metadata-related terminology. The new representation adds to the precision of all definitions by making explicit information which is only implicitly given in the ISOcat registry. It also helps uncover and address potential inconsistencies in term definitions as well as gaps and redundancies in the overall ISOcat term set. The new representation can serve as a complement to the existing ISOcat model, providing additional support for authors and users in browsing, (re-)using, maintaining, and further extending the community’s terminological metadata repertoire.
Integrated Linguistic Annotation Models and Their Application in the Domain of Antecedent Detection
(2011)
Seamless integration of various linguistic resources, which are often heterogeneous in terms of their output formats, and a combined analysis of the respective annotation layers are crucial tasks for linguistic research. After a decade of concentration on the development of formats to structure single annotations for specific linguistic issues, a variety of specifications for storing multiple annotations over the same primary data has been developed in recent years. The paper focuses on integrating logical document structure information, as a knowledge resource, into a text document in order to enhance automatic anaphora resolution, both for candidate detection and for antecedent selection. The paper investigates the data structures necessary for knowledge integration and retrieval.
Researchers in many disciplines, sometimes working in close cooperation, have been concerned with modeling textual data in order to account for texts as the prime information unit of written communication. The list of disciplines includes computer science and linguistics as well as more specialized disciplines like computational linguistics and text technology. What many of these efforts have in common is the aim to model textual data by means of abstract data types or data structures that support at least the semi-automatic processing of texts in any area of written communication.
Linguistic variation and linguistic virtuosity of young “Ghetto”-migrants in Mannheim, Germany
(2011)
In this paper, we provide an insight into the life world and social experiences of young Turkish migrants who are categorised by German society as “social problem cases”. Based on natural conversational data, we describe the communicative repertoire of one migrant adolescent and that of his friends. Our aims are (a) to isolate those linguistic features that convey the impression of “foreignness” and stand out among other German speakers’ features, and (b) to analyse the variability in our informants’ discursive practices (i.e. code- or style-switching, as it is commonly referred to in the literature) in order to show how variation serves as a communicative resource. Our findings show that these adolescents’ remarkable linguistic proficiency and communicative competence contrast markedly with their low educational and professional status.
Discourse parsing of complex text types such as scientific research articles requires the analysis of an input document on linguistic and structural levels that go beyond traditionally employed lexical discourse markers. This chapter describes a text-technological approach to discourse parsing. Discourse parsing with the aim of providing a discourse structure is seen as the addition of a new annotation layer for input documents marked up on several linguistic annotation levels. The discourse parser generates discourse structures according to the Rhetorical Structure Theory. An overview of the knowledge sources and components for parsing scientific journal articles is given. The parser’s core consists of cascaded applications of the GAP, a Generic Annotation Parser. Details of the chart parsing algorithm are provided, as well as a short evaluation in terms of comparisons with reference annotations from our corpus and with recently developed systems with a similar task.
This paper provides a unified semantic and discourse pragmatic analysis of the German particle nämlich, traditionally described as having a specificational and an explanative reading. Our claim is that nämlich is a discourse marker which signals that the expression it is attached to is a short (elliptic) answer to a salient implicit question about the previous utterance. We show how both the explanative and the specificational reading can be derived from this more general semantic contribution. In addition, we discuss some cross-linguistic consequences of our analysis.
How to propose an action as an objective necessity. The case of Polish trzeba x (‘one needs to x’)
(2011)
The present study demonstrates that language-specific grammatical resources can afford speakers language-specific ways of organizing cooperative practical action. On the basis of video recordings of Polish families in their homes, we describe action affordances of the Polish impersonal modal declarative construction trzeba x (“one needs to x”) in the accomplishment of everyday domestic activities, such as cutting bread, bringing recalcitrant children back to the dinner table, or making phone calls. Trzeba-x turns in first position are regularly chosen by speakers to point to a possible action as an evident necessity for the furthering of some broader ongoing activity. Such turns in first position provide an environment in which recipients can enact shared responsibility by actively involving themselves in the relevant action. Treating the necessity as not restricted to any particular subject, aligning responsive actions are oriented to when the relevant action will be done, not whether it will be done. We show that such sequences are absent from English interactions by analyzing (a) grammatically similar turn formats in English interaction (“we need to x,” “the x needs to y”), and (b) similar interactive environments in English interactions. We discuss the potential of this research to point to a new avenue for researchers interested in the relationship between language diversity and diversity in human action and cognition.
Within cognitive linguistics, there is an increasing awareness that the study of linguistic phenomena needs to be grounded in usage. Ideally, research in cognitive linguistics should be based on authentic language use, its results should be replicable, and its claims falsifiable. Consequently, more and more studies now turn to corpora as a source of data. While corpus-based methodologies have increased in sophistication, the use of corpus data is also associated with a number of unresolved problems. The study of cognition through off-line linguistic data is, arguably, indirect, even if such data have desirable qualities such as being natural, representative and plentiful. Several topics in this context stand out as particularly pressing. This discussion note addresses (1) converging evidence from corpora and experimentation, (2) whether corpora mirror psychological reality, (3) the theoretical value of corpus linguistic studies of ‘alternations’, (4) the relation of corpus linguistics and grammaticality judgments, and, lastly, (5) the nature of explanations in cognitive corpus linguistics. We do not claim to resolve these issues or to cover all possible angles; instead, we strongly encourage reactions and further discussion.
This study examines what kinds of cues and constraints for discourse interpretation can be derived from the logical and generic document structure of complex texts, using scientific journal articles as an example. We performed a statistical analysis of a corpus of scientific articles annotated on different annotation layers within the framework of XML-based multi-layer annotation. We introduce different discourse segment types that constrain the textual domains in which rhetorical relation spans are to be identified, and we show how a canonical sequence of text type structure categories is derived from the corpus annotations. Finally, we demonstrate which text type structure categories assigned to complex discourse segments of the type “block” statistically constrain the occurrence of rhetorical relation types, and how they do so.
This paper discusses the semi-formal language of mathematics and presents the Naproche CNL, a controlled natural language for mathematical authoring. Proof Representation Structures, an adaptation of Discourse Representation Structures, are used to represent the semantics of texts written in the Naproche CNL. We discuss how the Naproche CNL can be used in formal mathematics, and present our prototypical Naproche system, a computer program for parsing texts in the Naproche CNL and checking the proofs in them for logical correctness.
In her overview, Margret Selting makes the case for the claim that dealing with authentic conversation necessarily lies at the heart of an interactional-linguistic approach to prosody (see Selting this volume, Section 3.3). However, collecting and transcribing corpora of authentic interaction is a time-consuming enterprise. This fact often severely restricts the analyses that individual researchers can carry out with their resources. Still, many of the desiderata Margret Selting points out in Section 5 of her extensive overview seem to require the use of larger corpora. In this commentary, I argue that future progress in research on prosody in interaction will essentially rest on the availability and use of large public corpora. After reviewing arguments for and against the use of public corpora, I discuss some consequences for corpus design and for the transcription of public corpora.
Different Views on Markup
(2010)
In this chapter, two different ways of grouping information represented in document markup are examined: annotation levels, referring to conceptual levels of description, and annotation layers, referring to the technical realisation of markup using, e.g., document grammars. In many current XML annotation projects, multiple levels are integrated into one layer, which often leads to the problem of overlapping hierarchies. As a solution, we propose a framework for multiple, independent XML annotation layers for one text, based on an abstract representation of XML documents with logical predicates. Two realisations of the abstract representation are presented: a Prolog fact base format together with an application architecture, and a specification for native XML databases. We conclude with a discussion of projects that currently use this framework.
Consistency of reference structures is an important issue in lexicography and dictionary research, especially with respect to information on sense-related items. In this paper, the systematic challenges in this area (e.g. ‘non-reversed reference’, where bidirectional links are realised as unidirectional structures) will be outlined, and the problems these challenges can cause for both lexicographers and dictionary users will be discussed. The paper also discusses how text-technological solutions may help to support the consistency of sense-related pairings during the process of compiling a dictionary.
This chapter addresses the requirements and linguistic foundations of automatic relational discourse analysis of complex text types such as scientific journal articles. It is argued that besides lexical and grammatical discourse markers, which have traditionally been employed in discourse parsing, cues derived from the logical and generic document structure and from the thematic structure of a text must be taken into account. An approach to modelling such types of linguistic information in terms of XML-based multi-layer annotations, and to a text-technological representation of additional knowledge sources, is presented. By means of quantitative and qualitative corpus analyses, cues and constraints for automatic discourse analysis can be derived. Furthermore, the proposed representations are used as the input sources for discourse parsing. A short overview of the projected parsing architecture is given.
Authors like Fillmore 1986 and Goldberg 2006 have made a strong case for regarding argument omission in English as a lexical and construction-based affordance rather than one based on general semantico-pragmatic constraints. They do not, however, address the question of how grammatical restrictions on null complementation might interact with broader narrative conventions, in particular those of genre. In this paper, we attempt to remedy this oversight by presenting a comprehensive overview of genre-based argument omissions and offering a construction-based analysis of genre-based omission conventions. We consider five genre-based omission types: instructional imperatives (Culy 1996, Bender 1999), labelese, diary style (Haegeman 1990), match reports (Ruppenhofer 2004) and quotative clauses. We show that these omission types share important traits; all, for example, have anaphoric rather than indefinite construals. We also show, however, that the omission types differ from each other in idiosyncratic ways. We then address several interrelated representational problems posed by the grammatical treatment of genre-based omissions. For example, the constructions that represent genre-based omission conventions must interact with the lexical entries of verbs, many of which do not generally permit omitted arguments. Accordingly, we offer constructional analyses of genre-based omissions that allow constructions to override lexical valence constraints.
Introduction
(2010)
Preface
(2010)
As the nature of negative polarity items (NPIs) and their licensing contexts is still much debated, a broad empirical basis is an important cornerstone for further insights in this area of research. The work discussed in this paper is intended as a contribution to this objective. We briefly introduce the phenomenon of NPIs and outline major theories about their licensing as well as various licensing contexts before discussing our two main topics. First, we describe a corpus-based retrieval method for NPI candidates that ranks the candidates according to their distributional dependence on the licensing contexts. The method extracts single-word candidates and is extended to capture multi-word candidates as well. The basic idea behind automatically collecting NPI candidates from a large corpus is that an NPI behaves like a kind of collocate of its licensing contexts. Manual inspection and interpretation of the candidate lists identify the actual NPIs. Second, we present an online repository for NPIs and other items that show distributional idiosyncrasies, which offers a sustainable empirical database for further (theoretical) research on these items.
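The idea of ranking candidates by their distributional dependence on licensing contexts can be illustrated with a minimal sketch. It assumes a corpus in which each sentence is already flagged as containing a licensing context or not, and it scores each word by the share of its occurrences that fall inside licensed sentences. The function name, the `exclude` and `min_freq` parameters, and this simple relative-frequency score are hypothetical choices for illustration; the abstract does not specify the measure the authors actually use.

```python
from collections import Counter


def rank_npi_candidates(sentences, exclude=frozenset(), min_freq=2):
    """Rank words by the share of their occurrences inside licensing contexts.

    `sentences` is a list of (tokens, licensed) pairs, where `licensed` marks a
    sentence containing a licensing context (e.g. negation). Both this flag and
    the relative-frequency score are illustrative assumptions.
    """
    total = Counter()
    licensed_count = Counter()
    for tokens, licensed in sentences:
        for tok in set(tokens):  # count each word at most once per sentence
            if tok in exclude:  # skip the licensors themselves
                continue
            total[tok] += 1
            if licensed:
                licensed_count[tok] += 1
    scores = {w: licensed_count[w] / n for w, n in total.items() if n >= min_freq}
    # Highest dependence first; ties broken alphabetically for stable output.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Toy corpus (invented for illustration): three licensed sentences, one not.
corpus = [
    (["nobody", "lifted", "a", "finger"], True),
    (["no", "one", "lifted", "a", "finger"], True),
    (["he", "lifted", "a", "box"], False),
    (["nobody", "saw", "a", "box"], True),
]
ranked = rank_npi_candidates(corpus, exclude={"nobody", "no"})
print(ranked[0])  # ('finger', 1.0)
```

In this toy run the function word "a" ranks second, which illustrates why the resulting candidate lists still require manual inspection and interpretation before the actual NPIs can be identified.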
Antonymy is a relation of lexical opposition which is generally considered to involve (i) the presence of a scale along which a particular property may be graded, and hence both (ii) gradability of the corresponding lexical items and (iii) typical entailment relations. Like other types of lexical opposites, antonyms typically differ only minimally: while denoting opposing poles on the relevant dimension of difference, they are similar with respect to other components of meaning. This paper presents examples of antonymy from the domain of speech act verbs which either lack some of these typical attributes or pose problems in their application. It discusses several different proposals for the classification of these atypical examples.
The chapter on formats and models for lexicons deals with the different data formats available for lexical resources and elaborates on their structure and possible uses. Motivated by the restrictions on merging lexical resources based on widespread formalisms and international standards, a formal lexicon model for lexical resources is developed which is related to graph structures in annotations. For lexicons, this model is termed the Lexicon Graph. Within this model, the concepts of lexicon entry and lexical structure frequently described in the literature are formally defined and illustrated with examples. The article addresses the problem of ambiguity in these formal terms. An implementation of the defined structures based on XML and XML technologies such as XQuery is given, and the relation to international standards is covered as well.
On the basis of a single case analysis of the emergence of an ethnic joke, this paper explores issues related to laughter in international business meetings. More particularly, it deals with ways in which a person's name is correctly pronounced. Speakers and co-participants seem to orient towards ‘proper’ ways of vocalizing names and to consequent ‘variations’ or ‘deviations’ from them, making different ways of pronunciation available as a laughable. In making such pronunciation variations available, accountable and recognizable, participants reflexively establish as relevant the multilingual character of the activity, of the participants’ competences and of the setting; conversely, they exploit these multilingual features within specific social practices, leading to laughter.
Our analysis focuses on the contexts of action, the sequential environments and the interactional practices by which the uttering of a name becomes a ‘laughable’ and then a resource for an ethnic joke. Moreover, it explores the implications of transforming the pronunciation into a laughable in terms of the organization of the ongoing activity, changing participation frameworks and membership categorizations. In this sense, it highlights the flexible structure of groups and the way in which laughter reconfigures them through local affiliating and disaffiliating moves, and by making various national categories available and relevant.
This chapter presents results of a linguistic landscape (LL) project in the regional centre of Rēzekne in the region of Latgale in Eastern Latvia. Latvia was de facto part of the Soviet Union until 1991, which has left it with a highly multilingual society. In the essentially post-colonial situation since 1991, strict language policies have been in place which aim to reverse the language shift from Russian, the dominant language of Soviet times, back to Latvian. The main research interests were therefore how the complex pattern of multilingualism in Latvia is reflected in the LL, how people relate to current language legislation, and which motivations, attitudes and emotions inform their behaviour.
In spite of the obvious importance accorded to the notion of grammatical construction in any approach that sees itself as a construction grammar (CxG), there is as yet no generally accepted definition of the term across the different variants of the framework. In particular, there are different assumptions about which additional requirements a given structure has to meet, besides being a ‘form-meaning pair’, in order to be recognized as a construction. Since the choice of a particular definition will determine the range of both relevant phenomena and concrete observations to be considered in empirical research within the framework, the issue is not merely a terminological quibble but has important methodological repercussions, especially for quantitative research in areas such as corpus linguistics. The present study illustrates some problems in identifying and delimiting such patterns in naturally occurring text and presents arguments for a usage-based interpretation of the term grammatical construction.
Complex common names such as Indian elephant or green tea denote a certain type of entity, viz. kinds. Moreover, those kinds are always subkinds of the kind denoted by their head noun. Establishing such subkinds is essentially the task of classifying modifiers, which are a defining trait of endocentrically structured complex common names. An examination of complex common names of different lexico-syntactic types (NN compounds, N+N syntagmas, NP/PP syntagmas, A+N syntagmas) and from different languages (particularly English, German and French) shows that complex common names are subject to language-independent formal and semantic constraints. In particular, complex common names qualify as name-like expressions in that they tend to be deficient in terms of formal complexity and semantic compositionality.
We report on completed work in a project concerned with providing methods, tools, best-practice guidelines, and solutions for sustainable linguistic resources. The article discusses several general aspects of sustainability and introduces an approach to normalizing corpus data and metadata records. Moreover, the architecture of the sustainability platform implemented by the authors is described.
This article introduces the topic of “Multilingual language resources and interoperability”. We start with a taxonomy and parameters for classifying language resources. We then provide examples and issues of interoperability, as well as resource architectures to solve such issues. Finally, we discuss aspects of linguistic formalisms and interoperability.
This article shows that the TEI tag set for feature structures can be adopted to represent a heterogeneous set of linguistic corpora. Most of these corpora are annotated using markup languages based on the Annotation Graph framework or the upcoming Linguistic Annotation Format ISO standard, or according to tag sets defined by or based upon the TEI guidelines. A unified representation separates the conceptually different annotation layers contained in the original corpus data (e.g. syntax, phonology, and semantics) into multiple XML files. These annotation layers are linked to each other implicitly by the identical textual content of all files. A suitable data structure for representing these annotations is a multi-rooted tree, which in turn can be represented with the TEI and ISO tag set for feature structures. The mapping process and representational issues are discussed, as well as the advantages and drawbacks of using the TEI tag set for feature structures as a storage and exchange format for linguistically annotated data.
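How separate annotation layers can be linked only through their identical textual content may be clearer with a small sketch. Assuming each layer is a well-formed XML string over the same primary text, the code below computes character offsets for every element and merges all layers into one list of (layer, tag, start, end) spans, refusing to merge layers whose texts differ. This flat span list is only a stand-in from which a multi-rooted tree could be built; the function names and example layers are hypothetical, and the sketch does not reproduce the TEI feature-structure mapping described in the article.

```python
import xml.etree.ElementTree as ET


def element_spans(xml_string):
    """Return (text, spans) for one annotation layer.

    Each span is (tag, start, end): character offsets into the layer's text
    content, so layers over the same primary text can be aligned without IDs.
    """
    root = ET.fromstring(xml_string)
    spans = []

    def walk(elem, offset):
        start = offset
        offset += len(elem.text or "")
        for child in elem:
            offset = walk(child, offset)
            offset += len(child.tail or "")  # text after the child, inside elem
        spans.append((elem.tag, start, offset))
        return offset

    walk(root, 0)
    return "".join(root.itertext()), spans


def merge_layers(layers):
    """Merge named layers into one list of (layer, tag, start, end) spans.

    Layers are aligned only through their identical text content; a
    ValueError is raised if the texts differ.
    """
    texts = {}
    merged = []
    for name, xml_string in layers.items():
        text, spans = element_spans(xml_string)
        texts[name] = text
        merged.extend((name, tag, start, end) for tag, start, end in spans)
    if len(set(texts.values())) > 1:
        raise ValueError("annotation layers do not share identical text")
    # Sort by start offset, longer spans first, so enclosing spans precede contained ones.
    merged.sort(key=lambda s: (s[2], -s[3], s[0]))
    return next(iter(texts.values())), merged


layers = {
    "syntax": "<s><np>the dog</np> <vp>barks</vp></s>",
    "phonology": "<utt><w>the</w> <w>dog</w> <w>barks</w></utt>",
}
text, spans = merge_layers(layers)
print(text)  # the dog barks
```

Because the layers share no IDs, the only thing guaranteeing their alignment is the equality check on the text; this mirrors the implicit linking described above, and it is also why any divergence in the primary text across files breaks the merge.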
The paper presents the results of a joint effort by a group of multimodality researchers and tool developers to improve the interoperability between several tools used for the annotation and analysis of multimodality. Each of the tools has specific strengths, so using a variety of different tools on the same data can be desirable for project work. However, this usually requires tedious conversion between formats. We propose a common exchange format for multimodal annotation, based on the annotation graph (AG) formalism and supported by import and export routines in the respective tools. In the current version of this format, the information common to all the tools can be reliably exchanged between them, and additional information can be stored in a standardized way.
In this paper, we present an evaluation of rule-based morphological components for German for use in an interactive editing environment. The evaluation criteria are derived from the intended use of these components: availability, performance, programming interfaces, and analysis quality. We evaluated systems that have been developed and maintained for decades as well as new systems. On closer inspection of recent implementations, however, we note serious general shortcomings and conclude that the oldest system is the only one that satisfies our requirements.
Beyond the stars: exploiting free-text user reviews to improve the accuracy of movie recommendations
(2009)
In this paper, we show that the extraction of opinions from free-text reviews can improve the accuracy of movie recommendations. We present three approaches to extracting movie aspects as opinion targets and use them as features for collaborative filtering. Each of these approaches requires a different amount of manual interaction. We collected a data set of reviews with corresponding ordinal (star) ratings for several thousand movies to evaluate the different features for collaborative filtering. We employ a state-of-the-art collaborative filtering engine for the recommendations in our evaluation and compare its performance with and without the features representing user preferences mined from the free-text reviews provided by the users. The opinion-mining-based features perform significantly better than the baseline, which is based on star ratings and genre information only.
The multiple gradations of German strong verbs are but manifestations of a rather uncomplicated system. There is a small number of ways to make up ablaut forms; these types of formation are identifiable in formal terms and, what is more, they have definite functions as morphological markers. On the basis of a classification of stem forms according to the quality, complexity and quantity of their vowels, three types of operations involved in ablaut formation are identified. Ablaut always includes a change of quality type or a change of complexity type, and in addition it may include a change of quantity type. Ablaut forms are clearly distinguished from bases (and from each other): their vocalism meets a defined standard of dissimilarity. On this basis, gradations are collected into inflectional classes that are defined in strictly synchronic terms. These classes continue the historical seven classes known from reference grammars. For the majority of strong verbs, membership in these classes (and thus ablaut) is predictable.
The present study examines the dynamics of the kanji combinations that form common (or general) and proper nouns in Japanese. Three results were obtained. First, the degree distributions result from two similar processes based on a steady state of birth-and-death processes with different birth and death rates, yielding a positive negative binomial distribution for proper nouns and a positive Waring distribution for common nouns. Second, all rank-frequency distributions follow the negative hypergeometric distribution, which is used very frequently in ranking problems. Third, the building of kanji compounds follows a dissortative strategy: the higher the outdegree of a kanji, the more it prefers kanji with lower indegrees. A linear dependence can be observed for common nouns, whereas the relationship between compounded kanji is rather curvilinear for proper nouns. The exact analytical expression is not yet known.