Refine
Year of publication
Document Type
- Conference Proceeding (411)
- Part of a Book (332)
- Article (253)
- Book (27)
- Doctoral Thesis (19)
- Working Paper (14)
- Other (7)
- Contribution to a Periodical (6)
- Review (3)
- Part of Periodical (2)
Language
- English (1078) (remove)
Is part of the Bibliography
- no (1078) (remove)
Keywords
- Deutsch (225)
- Korpus <Linguistik> (213)
- Computerlinguistik (118)
- Englisch (78)
- Konversationsanalyse (69)
- Annotation (59)
- Automatische Sprachanalyse (50)
- Interaktion (42)
- Semantik (42)
- Gesprochene Sprache (41)
Publicationstate
- Veröffentlichungsversion (559)
- Postprint (148)
- Zweitveröffentlichung (112)
- Preprint (2)
- Ahead of Print (1)
- Erstveröffentlichung (1)
Reviewstate
- Peer-Review (397)
- (Verlags)-Lektorat (303)
- Qualifikationsarbeit (Dissertation, Habilitationsschrift) (16)
- Peer-review (14)
- Verlags-Lektorat (9)
- Review-Status-unbekannt (5)
- Peer-Revied (3)
- Abschlussarbeit (Bachelor, Master, Diplom, Magister) (Bachelor, Master, Diss.) (2)
- Zweitveröffentlichung (2)
- (Verlags-)Lektorat (1)
Publisher
- de Gruyter (60)
- Benjamins (50)
- IDS-Verlag (47)
- Springer (42)
- Association for Computational Linguistics (35)
- European Language Resources Association (ELRA) (32)
- Institut für Deutsche Sprache (24)
- European Language Resources Association (22)
- Elsevier (21)
- Niemeyer (20)
Ancient Chinese poetry is constituted by structured language that deviates from ordinary language usage; its poetic genres impose unique combinatory constraints on linguistic elements. How does the constrained poetic structure facilitate speech segmentation when common linguistic and statistical cues are unreliable to listeners in poems? We generated artificial Jueju, which arguably has the most constrained structure in ancient Chinese poetry, and presented each poem twice as an isochronous sequence of syllables to native Mandarin speakers while conducting magnetoencephalography (MEG) recording. We found that listeners deployed their prior knowledge of Jueju to build the line structure and to establish the conceptual flow of Jueju. Unprecedentedly, we found a phase precession phenomenon indicating predictive processes of speech segmentation—the neural phase advanced faster after listeners acquired knowledge of incoming speech. The statistical co-occurrence of monosyllabic words in Jueju negatively correlated with speech segmentation, which provides an alternative perspective on how statistical cues facilitate speech segmentation. Our findings suggest that constrained poetic structures serve as a temporal map for listeners to group speech contents and to predict incoming speech signals. Listeners can parse speech streams by using not only grammatical and statistical cues but also their prior knowledge of the form of language.
The project Referenzkorpus Altdeutsch (‘Old German Reference Corpus’) aims to es- tablish a deeply-annotated text corpus of all extant Old German texts. As the automated part-of-speech and morphological pre-annotation is amended by hand, a quality control system for the results seems a desirable objective. To this end, standardized inflectional forms, generated using the morphological information, are compared with the attested word forms. Their creation is described by way of example for the Old High German part of the corpus. As is shown, in a few cases, some features of the attested word forms are also required in order to determine as exactly as possible the shape of the inflected lemma form to be created.
The availability of electronic corpora of historical stages of languages has been wel- comed as possibly attenuating the inherent problem of diachronic linguistics, i.e. that we only have access to what has chanced to come down to us - the problem which was memorably named by Labov (1992) as one of “Bad Data”. However, such corpora can only give us access to an increased amount ot historical material and this can essentially still only be a partial and possibly distorted picture of the actual language at a particular period of history. Corpora can be improved by taking a more representative sample of extant texts if these are available (as they are in significant number for periods after the invention of printing). But, as examples from the recently compiled GerManC corpus of seventeenth and eighteenth century German show, the evidence from such corpora can still fail to yield definitive answers to our questions about earlier stages of a language. The data still require expert interpretation, and it is important to be realistic about what can legitimately be expected from an electronic historical corpus.
Multi-faceted alignment. Toward automatic detection of textual similarity in Gospel-derived texts
(2015)
Ancient Germanic Bible-derived texts stand in as test material for producing computational means for automatically determining where textual contamination and linguistic interference have influenced the translation process. This paper reports on the results of research efforts that produced a text corpus; a method for decomposing the texts involved into smaller, more directly comparable thematically-related chunks; a database of relationships between these chunks; and a user-interface allowing for searches based on various referential criteria. Finally, the state of the product at the end of the project is discussed, namely as it was handed over to another researcher who has extended it to automatically find semantic and syntactic similarities within comparable chunks.
In this paper we present some preliminary considerations concerning the possibility of automatic parsing an annotated corpus for N-N compounds. This should in prin- ciple be possible at least for relational and stereotype compounds, if the lemmatization of the corpus connects the lemmata with lexical entries as described in Höhle (1982). These lexical entries then supply the necessary information about the argument structure of a relational noun or about the stereotypical purpose associated with the noun’s referent which can be used to establish a relation between the first and the head constituent of the compound.
The relative order of dative and accusative objects in older German is less free than it is today. The reason for this could be that speakers of the direct predecessor of Old High German organized the referents according to the Thematic Hierarchy. If one applies a Case Hierarchy Nom>Acc>Dat to this, the order Nom - Dat - Acc falls out. It becomes apparent that the status of the Thematic Hierarchy is not a factor governing underlying word order, but a factor inducing scrambling. Arguments from binding theory, whose validity is discussed, indicate that the underlying order is ‘accusative before dative’
Studying the role of expertise in poetry reading, we hypothesized that poets’ expert knowledge comprises genre-appropriate reading- and comprehension strategies that are reflected in distinct patterns of reading behavior.
We recorded eye movements while two groups of native speakers (n=10 each) read selected Russian poetry: an expert group of professional poets who read poetry daily, and a control group of novices who read poetry less than once a month. We conducted mixed-effects regression analyses to test for effects of group on first-fixation durations, first-pass gaze durations, and total reading times per word while controlling for lexical- and text variables.
First-fixation durations exclusively reflected lexical features, and total reading times reflected both lexical- and text variables; only first-pass gaze durations were additionally modulated by readers’ level of expertise. Whereas gaze durations of novice readers became faster as they progressed through the poems, and differed between line-final words and non-final ones, poets retained a steady pace of first-pass reading throughout the poems and within verse lines. Additionally, poets’ gaze durations were less sensitive to word length.
We conclude that readers’ level of expertise modulates the way they read poetry. Our findings support theories of literary comprehension that assume distinct processing modes which emerge from prior experience with literary texts.
We report results from an exploratory study of college students’ conceptions of poetry in which we asked them to name three things they expect from a poem. Frequency- and list-based analyses of their responses revealed that they primarily expect poems to rhyme, but they also identified a number of form-, content-, and reception-related genre expectations, which we discuss in relation to relevant previous research. We propose that rhyme’s predominance in college students’ genre expectations reflects its perceptual and cognitive salience during incremental poetry comprehension rather than its frequency in contemporary poetic practice. Our results characterize the genre conceptions of the population that empirical studies of poetry comprehension typically investigate, and thus provide relevant background information for the interpretation of empirical
findings in this field.
We examined genre-specific reading strategies for literary texts and hypothesized that text categorization (literary prose vs. poetry) modulates both how readers gather information from a text (eye movements) and how they realize its phonetic surface form (speech production). We recorded eye movements and speech while college students (N = 32) orally read identical texts that we categorized and formatted as either literary prose or poetry. We further varied the text position of critical regions (text-initial vs. text-medial) to compare how identical information is read and articulated with and without context; this allowed us to assess whether genre-specific reading strategies make differential use of identical context information. We observed genre-dependent differences in reading and speaking tempo that reflected several aspects of reading and articulation. Analyses of regions of interests revealed that word-skipping increased particularly while readers progressed through the texts in the prose condition; speech rhythm was more pronounced in the poetry condition irrespective of the text position. Our results characterize strategic poetry and prose reading, indicate that adjustments of reading behavior partly reflect differences in phonetic surface form, and shed light onto the dynamics of genre-specific literary reading. They generally support a theory of literary comprehension that assumes distinct literary processing modes and incorporates text categorization as an initial processing step.
Latvia
(2019)
This chapter deals with current issues in bilingual education in the framework of language and educational policies in Latvia, and also outlines similarities or common tendencies in the two other Baltic states, Estonia and Lithuania. As commonly understood in the 21st century, the term ‘bilingual education’ includes ‘multilingual education, as the umbrella term to cover a wide spectrum of practice and policy’ (García, 2009: 9).
This special issue of the Journal on Ethnopolitics and Minority Issues in Europe (JEMIE) brings together some of the participants of the symposium Political and Economic Resources and Obstacles of Minority Language Maintenance organized by the Language Survival Network ‘POGA’ at Tallinn University, Estonia, in December 2010. More than 20 scholars representing linguistics, anthropology, social sciences and law participated in the symposium, to present papers and discuss questions related to minority language loss, maintenance and revitalization. The six case studies contained in this special issue look at different minorities and regions in the European Union, Russia and the US. The linguistic communities discussed are the Russian-, Võru/Seto- and Latgalian-speaking minorities of Estonia and Latvia; the Welsh- and Breton-speaking communities of the Celtic language; the Russian Finno-Ugrian people with regional autonomies; and the native American groups of the Delaware/Cherokee and the Oneida. The reader will find articles relating to interdisciplinary research approaches in and on minority languages and minority language communities.
When we first started the project of looking at minority languages through a linguistic landscape lens, we felt that the visibility of minority languages in public space had been insufficiently dealt with in traditional minority language research. A linguistic landscape approach, as it had developed over the last years, would constitute a valuable path to explore, by looking at the ‘same old issues’ of language contact and language conflict from a specific angle. We were convinced that fresh linguistic landscape data would be able to provide innovative and useful insights into ‘patterns of language […] use, official language policies, prevalent language attitudes, [and] power relations between different linguistic groups’ (Backhaus 2007, p. 11). The linguistic landscape approach, as presented by the different authors in this volume, has clearly proven to be a heuristic appropriate and relevant for a wide range of minority language situations. More specifically, the ideas and analyses in the different chapters do contribute to a further understanding of minority languages and their speakers. They deepen our comprehension of language policies, power relations and ideologies in minority language settings.
Our paper discusses family language policies among multilingual families in Latvia with Russian as home language. The presentation is based on three case studies, i.e. interviews conducted with Russophones who have chosen to send their children to Latvian-medium pre-schools and schools. The main aim is to understand practices and regards among such families “from below,” i.e. which family-internal and family-external factors influenced the choice of Latvian-medium education and what impact this choice has on linguistic practices.
The paper shows that there have been critical events which both encouraged and discouraged the choice of Latvian-medium education. The wish to integrate into mainstream society has been met by obstacles both from ethnic Russians and Latvians. Yet, the three families consider their choices to be the right ones for the future development of their children in a multiethnic Latvia in which Latvian serves as the unifying language of society.
The present paper examines the rise and fall of Modern High German loanwords in English from 1600 until 2000, principally making use of the record of borrowing documented by the Oxford English Dictionary (OED) in its Third Edition (online version, in revision 2000-). Groups of loanwords are analysed by century, with reference to the changing social and cultural landscape characterising relationships between the relevant nations over this period. This is not a simple picture: each language grows over the period in different ways, and the speakers of English look to German at different times for different types of borrowing, as the political and intellectual balance alters.
This study explores the interdependence of qualitative and quantitative analysis in articulating empirically plausible and theoretically coherent generalizations about grammatical structure. I will show that the use of large electronic corpora is indispensable to the grammarian's work, serving as a rich source of semantic and contextual information, which turns out to be crucial in categorizing and explaining grammatical forms. These general concerns are illustrated by the patterns of use of Czech relative clauses (RC) with the non-declinable relativizer co, by taking a set of existing claims about these RCs and testing their accuracy on corpus material. The relevant analytic categories revolve around the referential type of the relativized noun, the interaction between relativization and deixis, and the semantic relationship between the relativized noun and the proposition expressed by the RC. The analysis demonstrates that some of the existing claims are fully invalid in the face of regularly attested semantic distinctions, while others are more or less on the right track but often not comprehensive or precise enough to capture the full richness of the facts. 1
Conversation is usually considered to be grammatically simple, while academic writing is often claimed to be structurally complex, associated primarily with a greater use of dependent clauses. Our goal in the present paper is to challenge these stereotypes, based on the results of large-scale corpus investigations. We argue that both conversation and professional academic writing are grammatically complex but that their complexities are dramatically different. Surprisingly, the traditional view that complexity is realized through extensive clausal embedding leads to the conclusion that conversation is more complex than academic writing. In contrast, written academic discourse is actually much more ‘compressed’ than elaborated, and the complexities of academic writing are realized mostly as phrasal embedding rather than embedded clauses.
This paper first argues that the distinction between Propositions and States-of-Affairs is significant for understanding a number of linguistic contrasts, including contrasts between nominalizations, complement clauses, readings of modal infinitives, raising constructions, illocutions and moods, relative clauses, and nouns. Subsequently, the paper outlines a cognitive linguistic model of the distinction, according to which Propositions and States-of-Affairs differ in terms of construal. Both prompt Langackerian “processes”, but only Propositions prompt a construal of these processes as referential. The paper argues that this model has a number of advantages over a traditional, denotational understanding of the distinction.
The present article proposes a syntactic and semantic analysis of assertive clauses that comprises their truth-conditional aspects and their speech act potential in communication. What is commonly called “illocutionary force” is differentiated into three structurally and functionally distinct layers: a judgement phrase, representing subjective epistemic and evidential attitudes; a commitment phrase, representing the social commitment related to assertions; and an act phrase, representing the relation to the common ground of the conversation. The article provides several pieces of evidence for this structure: from the interpretation and syntactic position of various classes of epistemic, evidential, affirmative and speech act-related operators, from clausal complements embedded by different types of predicates, from embedded root clauses, and from anaphora referring to different clausal projections. The syntactic assumptions are phrased within X-bar theory, and the semantic interpretation makes use of dynamic update of common ground, differentiating between informative and performative updates. The object language is German, with particular reference to verb final and verb second structure.
This article describes an English Zulu learners’ dictionary that is part of a larger set of information tools, namely an online Zulu course, an e-dictionary of possessives (which was implemented earlier) accompanied by training software offering translation tasks on several levels, and an ontology of morphemic items categorizing and describing all parts of speech of Zulu. The underlying lexicographic database contains the usual type of lexicographic data, such as translation equivalents and their respective morphosyntactic data, but its entries have been extended with data related to the lessons of the online course in order to enable the learner to link both tools autonomously. The ‘outer matter’ is integrated into the website in the form of several texts on additional web pages (how-to-use, typical outputs, grammar tables, information on morphosyntactic rules, etc.). The dictionary comprises a modular system, where each module fulfils one of the necessary functions.
This paper describes general requirements for evaluating and documenting NLP tools with a focus on morphological analysers and the design of a Gold Standard. It is argued that any evaluation must be measurable and documentation thereof must be made accessible for any user of the tool. The documentation must be of a kind that it enables the user to compare different tools offering the same service, hence the descriptions must contain measurable values. A Gold Standard presents a vital part of any measurable evaluation process, therefore, the corpus-based design of a Gold Standard, its creation and problems that occur are reported upon here. Our project concentrates on SMOR, a morphological analyser for German that is to be offered as a web-service. We not only utilize this analyser for designing the Gold Standard, but also evaluate the tool itself at the same time. Note that the project is ongoing, therefore, we cannot present final results.
Corpus-based identification and disambiguation of reading indicators for German nominalizations
(2010)
Corpus data is often structurally and lexically ambiguous; corpus extraction methodologies thus must be made aware of ambiguities. Therefore, given an extraction task, all relevant ambiguities must be identified. To resolve these ambiguities, contextual data responsible for one or another reading is to be considered. In the context of our present work, German -ung-nominalizations and their sortal readings are under examination. A number of these nominalizations may be read as an event or a result, depending on the semantic group they belong to. Here, we concentrate on nominalizations of verbs of saying (henceforth: "verba dicendi"), identify their context partners and their influence on the sortal reading of the nominalizations in question. We present a tool which calculates the sortal reading of such nominalizations and thus may improve not only corpus extraction, but also e.g. machine translation. Lastly, we describe successful attempts to identify the correct sortal reading, conclusions and future work.
In the context of a Nordic Conference on Bilingualism, it can be a rewarding task to look at issues such as language planning, policy and legislation from a perspective of the southern neighbours of the Nordic world. This paper therefore intends to point attention towards a case of societal multilingualism at the periphery of the Nordic world by dealing with recent developments in language policy and legislation with regard to the North Frisian speech community in the German Land of Schleswig-Holstein. As I will show, it is striking to what degree there are considerable differences in the discourse on minority protection and language legislation between the Nordic countries and a cultural area which may arguably be considered to be part of the Nordic fringe - and which itself occasionally takes Scandinavia as a reference point, e.g. in the recent adoption of a pan-Frisian flag modelled on the Nordic cross (Falkena 2006).
The main focus of the paper will be on the Frisian Act which was passed in the Parliament of Schleswig-Holstein in late 2004. It provides a certain legal basis for some political activities with regard to Frisian, but falls short of creating a true spirit of minority language protection and/or revitalisation. In contrast to the traditions of the German and Danish minorities along the German-Danish border and to minority protection in Northern Scandinavia (in particular to Sámi language rights), the approach chosen in the Frisian Act is extremely weak and has no connotation of long-term oriented language-planning, let alone a rights-based perspective.
The paper will then look at policy developments in the time since the Act was passed, e.g. in the Schleswig-Holstein election campaign in 2005, and on latest perceptions of the Frisian language situation in the discourse on North Frisian Policy in Schleswig-Holstein majority society. In the final part of the paper, I will discuss reasons for the differences in minority language policy discourse between Germany and the Nordic countries, and try to provide an outlook on how Frisian could benefit from its geographic proximity to the Nordic world.
„Unserdeutsch”, a creole spoken in a former German South Pacific colony, and what is now Papua New Guinea, is being extensively documented and studied by linguists for the first time. There is no time to lose, because after a chequered history the world's only German-based creole – long ignored – is facing extinction.
In this paper, we discuss to what extent the German-based contact language Unserdeutsch (Rabaul Creole German, cf. Volker 1982) matches the category‘creole language’ from both a socio-historical and structural perspective. As a point of reference, we will use typological criteria that are widely supposed to be typical for creole languages. It is shown that Unserdeutsch fits fairly well into the pattern of an ‘average creole’, as has been suggested by data in the Atlas of Pidgin and Creole Language Structures (Michaelis et al. 2013). This is despite a series of atypical conditions in its development that might lead us to expect a close structural proximity to the lexifier language, i.e. a relatively acrolectal creole. A possible explanation for this striking discrepancy can be found in the primary function of Unserdeutsch as a marker of identity as well as in the linguistic structure of its substrate language Tok Pisin.
The 2014 issue of KONVENS is even more a forum for exchange: its main topic is the interaction between Computational Linguistics and Information Science, and the synergies such interaction, cooperation and integrated views can produce. This topic at the crossroads of different research traditions which deal with natural language as a container of knowledge, and with methods to extract and manage knowledge that is linguistically represented is close to the heart of many researchers at the Institut für Informationswissenschaft und Sprachtechnologie of Universität Hildesheim: it has long been one of the institute’s research topics, and it has received even more attention over the last few years. The main conference papers deal with this topic from different points of view, involving flat as well as deep representations, automatic methods targeting annotation and hybrid symbolic and statistical processing, as well as new Machine Learning-based approaches, but also the creation of language resources for both machines and humans, and methods for testing the latter to optimize their human-machine interaction properties. In line with the general topic, KONVENS-2014 focuses on areas of research which involve this cooperation of information science and computational linguistics: for example learning-based approaches, (cross-lingual) Information Retrieval, Sentiment Analysis, paraphrasing or dictionary and corpus creation, management and usability.
Communicative deviations of respondents in political video interviews in Ukrainian and German
(2021)
The research has the objective to establish the peculiarities of communicative deviations as a cognitive and at the same time discursive phenomenon in Ukrainian- and German-language video interviews from the viewpoint of respondents. The procedure of the research involves the integrated application of methods and techniques of pragmatics, deviatology and communicative linguistics. A new methodological basis has been developed for the reconstruction of communicative deviations using discourse analysis, namely for the reconstruction of a single event in two discursive environments, determining the communicative context and communication of interview in compared languages. The results of the research allow us to identify the features of communicative deviations in political interviews at the external, internal structural levels and at the situational level. The conclusions of the research indicate that the types of communicative deviations in political video interviews are universal in Ukrainian and German, but reflect national and cultural specifics given the peculiarities of both languages and each linguoculture, as well as existing realias, norms, conventions, maxims and rules of communication.
The article analyzes communicative deviations that occur during the communication between German native speakers and non-native speakers, particularly Ukrainians. Despite existing intercultural and sociolinguistic studies, the analysis of language specificity that causes communicative deviations, failures and misunderstandings remains relevant and understudied. The purpose of this article is to identify and explore the German language peculiarities that cause misunderstandings in communication for non-native speakers, in particular Ukrainian speakers, and offer the algorithm for the representatives of different ethnic communities to help them avoid and resolve possible conflicts given the study of German as a foreign language. The status of the concept of communicative deviation in intercultural communication under conditions of insufficient communicative competence is determined in this article. The study uses the term communicative deviation in favor of a generalized term, a broad concept of linguistic, speech and communicative deviations in dialogic speech, in particular between native German speakers and non-native speakers. The empirical research was based on the speech activity of Ukrainian students during classes at the Department of German Studies and Translation (levels B2–C1) of Ivan Franko National University of Lviv in 2019–2021 academic years and definitions from the Universal Dictionary of German Duden, in addition to the materials reflected in textbooks and teaching manuals as well as from authentic German-language sources. Communicative deviations are identified and analyzed in phonological, lexical, syntactic and pragmatic aspects.
The 2014 issue of KONVENS is even more a forum for exchange: its main topic is the interaction between Computational Linguistics and Information Science, and the synergies such interaction, cooperation and integrated views can produce. This topic at the crossroads of different research traditions which deal with natural language as a container of knowledge, and with methods to extract and manage knowledge that is linguistically represented is close to the heart of many researchers at the Institut für Informationswissenschaft und Sprachtechnologie of Universität Hildesheim: it has long been one of the institute’s research topics, and it has received even more attention over the last few years.
This chapter explores the Linguistic Landscape of six medium-size towns in the Baltic States with regard to languages of tourism and to the role of English and Russian as linguae francae. A quantitative analysis of signs and of tourism web sites shows that, next to the state languages, English is the most dominant language. Yet, interviews reveal that underneath the surface, Russian still stands strong. Therefore, possible claims that English might take over the role of the main lingua franca in the Baltic States cannot be maintained. English has a strong position for attracting international tourists, but only alongside Russian which remains important both as a language of international communication and for local needs.
This article looks at Latgalian from a perspective of a classification of languages. It starts by discussing relevant terms relating to sociolinguistic language types. It argues that Latgalian and its speakers show considerable similarities with many languages in Europe which are considered to be regional languages – hence, also Latgalian should be classified as such. In a second part, the article uses sociolinguistic data to indicate that the perceptions of speakers confirm this classification. Therefore, Latgalian should also officially be treated with the respect that other regional languages in Europe enjoy.
This paper provides insights into the ongoing international research project Unserdeutsch (Rabaul Creole German): Documentation of a highly endangered creole language in Papua New Guinea, based at the University of Augsburg, Germany. It elaborates on the different stages of the project, ranging from fieldwork to corpus development, thereby outlining the methods and software background used for the intended purposes. In doing so, we also give some approaches to solving specific problems, which have arisen in the course of practical work until now.
This study examines asymmetries between so-called inherent and contextual categories in relation to the morphological complexity of the nominal and verbal inflectional domain of languages. The observations are traced back to the influence of adult L2 learning in scenarios of intense language contact. A method for a simple comparison of the amount of inherent versus contextual categories is proposed and applied to the German-based creole language Unserdeutsch (Rabaul Creole German) in comparison to its lexifier language. The same procedure will be applied to two further language pairs. The grammatical systems of Unserdeutsch and other contact languages display a noticeable asymmetry regarding their structural complexity. Analysing different kinds of evidence, the explanatory key factor seems to be the role of (adult) L2 acquisition in the history of a language, whereby languages with periods of widespread L2 acquisition tend to lose contextual features. This impression is reinforced by general tendencies in pidgin and creole languages. Beyond that, there seems to be a tendency for inherent categories to be more strongly associated with the verb, while contextual categories seem to be more strongly associated with the noun. This leads to an asymmetry in categorical complexity between the noun phrase and the verb phrase in languages that experienced periods of intense L2 learning.
Basic grammatical categories may carry social meanings irrespective of their semantic content. In a set of four studies, we demonstrate that verbs—a basic linguistic category present and distinguishable in most languages—are related to the perception of agency, a fundamental dimension of social perception. In an archival analysis of actual language use in Polish and German, we found that targets stereotypically associated with high agency (men and young people) are presented in the immediate neighborhood of a verb more often than non-agentic social targets (women and older people). Moreover, in three experiments using a pseudo-word paradigm, verbs (but not adjectives and nouns) were consistently associated with agency (but not with communion). These results provide consistent evidence that verbs, as grammatical vehicles of action, are linguistic markers of agency. In demonstrating meta-semantic effects of language, these studies corroborate the view of language as a social tool and an integral part of social perception.
Nonnative accents are prevalent in our globalized world and constitute highly salient cues in social perception. Whereas previous literature has commonly assumed that they cue specific social group stereotypes, we propose that nonnative accents generally trigger spontaneous negatively biased associations (due to a general nonnative accent category and perceptual influences). Accordingly, Study 1 demonstrates negative biases with conceptual IATs, targeting the general concepts of accent versus native speech, on the dimensions affect, trust, and competence, but not on sociability. Study 2 attests to negative, largely enhanced biases on all dimensions with auditory IATs comprising matched native–nonnative speaker pairs for four accent types. Biases emerged irrespective of the accent types that differed in attractiveness, recognizability of origin, and origin-linked national associations. Study 3 replicates general IAT biases with an affect IAT and a conventional evaluative IAT. These findings corroborate our hypotheses and assist in understanding general negativity toward nonnative accents.
In a previous article (Faaß et al., 2012), a first attempt was made at documenting and encoding morphemic units of two South African Bantu languages, i.e. Northern Sotho and Zulu, with the aim of describing and storing the morphemic units of these two languages in a single relational database, structured as a hierarchical ontology. As a follow-up, the current article describes the implementation of our part-of-speech ontology. We give a detailed description of the morphemes and categories contained in the database, highlighting the need and reasons for a flexible ontology which will provide for both language specific and general linguistic information. By giving a detailed account of the methodology for the population of the database, we provide linguists from other Bantu languages with a road map for extending the database to also include their languages of specialization.
Towards a part-of-speech ontology: encoding morphemic units of two South African Bantu languages
(2012)
This article describes the design of an electronic knowledge base, namely a morpho-syntactic database structured as an ontology of linguistic categories, containing linguistic units of two related languages of the South African Bantu group: Northern Sotho and Zulu. These languages differ significantly in their surface orthographies, but are very similar on the lexical and sub-lexical levels. It is therefore our goal to describe the morphemes of these languages in a single common database in order to outline and interpret commonalities and differences in more detail. Moreover, the relational database which is developed defines the underlying morphemic units (morphs) for both languages. It will be shown that the electronic part-of-speech ontology goes hand in hand with part-of-speech tagsets that label morphemic units. This database is designed as part of a forthcoming system providing lexicographic and linguistic knowledge on the official South African Bantu languages.
Electronic dictionaries should support dictionary users by giving them guidance in text production and text reception, alongside a user-definable offer of lexicographic data for cognitive purposes. In this article, we sketch the principles of an interactive and dynamic electronic dictionary aimed at text production and text reception guiding users in innovative ways, especially with respect to difficult, complicated or confusing issues. The lexicographer has to do a very careful analysis of the nature of the possible problems to suggest an optimal solution for a specific problem. We are of the opinion that there are numerous complex situations where users need more detailed support than currently available in e-dictionaries, enabling them to make valid and correct choices. For highly complex situations, we suggest guidance through a decision tree-like device. We assume that the solutions proposed here are not specific to one language only but can, after careful analysis, be applied to e-dictionaries in different languages across the world.
So far, comprehensive grammar descriptions of Northern Sotho have only been available in the form of prescriptive books aiming at teaching the language. This paper describes parts of the first morpho-syntactic description of Northern Sotho from a computational perspective (Faaß, 2010a). Such a description is necessary for implementing rule based, operational grammars. It is also essential for the annotation of training data to be utilised by statistical parsers. The work that we partially present here may hence provide a resource for computational processing of the language in order to proceed with producing linguistic representations beyond tagging, may it be chunking or parsing. The paper begins with describing significant Northern Sotho verbal morpho-syntactics (section 2). It is shown that the topology of the verb can be depicted as a slot system which may form the basis for computational processing (section 3). Note that the implementation of the described rules (section 4) and also coverage tests are ongoing processes upon that we will report in more detail at a later stage.
Communication of stereotypes in the classroom: biased language use of German and Turkish adolescents
(2014)
Little is known about the linguistic transmission and maintenance of mutual stereotypes in interethnic contexts. This field study, therefore, investigated the linguistic expectancy bias (LEB) and the linguistic intergroup bias (LIB) among German and Turkish adolescents (13 to 20 years) in the school context. The LEB refers to the general phenomenon of describing stereotypes more abstractly. The LIB is the tendency to use language abstraction for in-group protective reasons. Results revealed an unmoderated LEB, whereas the LIB only occurred when foreigners were in the numerical majority, the classroom composition was perceived as a learning disadvantage, or the interethnic conflict frequency was high. These findings provide first evidence for the use of both LEB and LIB in an interethnic classroom setting.
This chapter will present results of a linguistic landscape (LL) project in the regional centre of Rēzekne in the region of Latgale in Eastern Latvia. Latvia was de facto a part of the Soviet Union until 1991, and this has given it a highly multilingual society. In the essentially post-colonial situation since 1991, strict language policies have been in place, which aim to reverse the language shift from Russian, the dominant language of Soviet times, back to Latvian. Thus, the main interests of the research were how the complex pattern of multilingualism in Latvia is reflected in the LL; how people relate to current language legislation; and what motivations, attitudes and emotions inform their behaviour.
Sexual harassment severely impacts the educational system in the West African country Benin and the progress of women in this society that is characterized by great gender inequality. Knowledge of the belief systems rooting in the sociocultural context is crucial to the understanding of sexual harassment. However, no study has yet investigated how sexual harassment is related to fundamental beliefs in Benin or West African countries. We conducted a field study on 265 female and male students from several high schools in Benin to investigate the link between sexual harassment and measures of ambivalent sexism, gender identity, and rape myth acceptance. Almost half of the sample reported having experienced sexual harassment personally or among peers. Levels of sexism and rape myth acceptance were very high compared to other studies. These attitudes appeared to converge in a sexist belief system that was linked to personal experiences, the perceived probability of experiencing and fear of sexual harassment. Results suggest that sexual harassment is a societal problem and that interventions need to address fundamental attitudes held in societies low in gender equality.
So far, there have been few descriptions on creating structures capable of storing lexicographic data, ISO 24613:2008 being one of the latest. Another one is by Spohr (2012), who designs a multifunctional lexical resource which is able to store data of different types of dictionaries in a user-oriented way. Technically, his design is based on the principle of a hierarchical XML/OWL (eXtensible Markup Language/Web Ontology Language) representation model. This article follows another route in describing a model based on entities and relations between them; MySQL (usually referred to as: Structured Query Language) describes a database system of tables containing data and definitions of relations between them. The model was developed in the context of the project "Scientific eLexicography for Africa" and the lexicographic database to be built thereof will be implemented with MySQL. The principles of the ISO model and of Spohr's model are adhered to with one major difference in the implementation strategy: we do not place the lemma in the centre of attention, but the sense description — all other elements, including the lemma, depend on the sense description. This article also describes the contained lexicographic data sets and how they have been collected from different sources. As our aim is to compile several prototypical internet dictionaries (a monolingual Northern Sotho dictionary, a bilingual learners' Xhosa–English dictionary and a bilingual Zulu–English dictionary), we describe the necessary microstructural elements for each of them and which principles we adhere to when designing different ways of accessing them. We plan to make the model and the (empty) database with all graphical user interfaces that have been developed, freely available by mid-2015.
The prohibitive is typically defined as the negative imperative, i.e. it “implies making someone not do something, having the effect of forbidding, preventing, or restricting” (Aikhenvald, 2017: 3). This chapter focuses on the formation of the prohibitive in the languages of Daghestan and neighboring regions, analyzing two different aspects of the morphological coding: first, the verb form (especially whether it is an imperative form or not), and second, the type of negation marker/affix used. Based on this, the general encoding types are deduced. Additionally, the phonological form of the markers is shortly analyzed.
We report on a new project building a Natural Language Processing resource for Zulu by making use of resources already available. Combining tagging results with the results of morphological analysis semi-automatically, we expect to reduce the amount of manual work when generating a finely-grained gold standard corpus usable for training a tagger. From the tagged corpus, we plan to extract verb-argument pairs with the aim of compiling a verb valency lexicon for Zulu.
So far, Sepedi negations have been considered more from the point of view of lexicographical treatment. Theoretical works on Sepedi have been used for this purpose, setting as an objective a neat description of these negations in a (paper) dictionary. This paper is from a different perspective: instead of theoretical works, corpus linguistic methods are used: (1) a Sepedi corpus is examined on the basis of existing descriptions of the occurrences of a relevant verb, looking at its negated forms from a purely prescriptive point of view; (2) a "corpus-driven" strategy is employed, looking only for sequences of negation particles (or morphemes) in order to list occurring constructions, without taking into account the verbs occurring in them, apart from their endings. The approach in (2) is only intended to show a possible methodology to extend existing theories on occurring negations. We would also like to try to help lexicographers to establish a frequency-based order of entries of possible negation forms in their dictionaries by showing them the number of respective occurrences. As with all corpus linguistic work, however, we must regard corpus evidence not as representative, but as tendencies of language use that can be detected and described. This is especially true for Sepedi, for which only few and small corpora exist. This paper also describes the resources and tools used to create the necessary corpus and also how it was annotated with part of speech and lemmas. Exploring the quality of available Sepedi part-of-speech taggers concerning verbs, negation morphemes and subject concords may be a positive side result.
This paper describes the application of probabilistic part of speech taggers to the Dzongkha language. A tag set containing 66 tags is designed, which is based on the Penn Treebank. A training corpus of 40,247 tokens is utilized to train the model. Using the lexicon extracted from the training corpus and lexicon from the available word list, we used two statistical taggers for comparison reasons. The best result achieved was 93.1% accuracy in a 10-fold cross validation on the training set. The winning tagger was thereafter applied to annotate a 570,247 token corpus.
An interactive, dynamic electronic dictionary aimed at text production should guide the user in innovative ways, especially in respect of difficult, complicated or confusing issues. This paper proposes a design for bilingual dictionaries intended to guide users in text production; we focus on complex phenomena of the interaction between lexis and grammar. It will be argued that a dictionary aimed at guiding the user in lexical selection should implement a type of “decision algorithm”. In addition, it should flag incorrect solutions and should warn against possible wrong generalisations of (foreign) language learners. Our proposals will be illustrated with examples from several languages, as the design principles are generally applicable. The copulative construction which is regarded as the most complicated grammatical structure in Northern Sotho will be analyzed in more detail and presented as a case in point.
This paper reports about current practice in a staged approach to the introduction of NLP principles and techniques for students of information science (IIM) and of international communication and translation (ICT) as part of their curricula. As most of these students are rather not familiar with computer science or, in the case of IIM students, linguistics, we see them as comparable with students of the humanities. We follow a blended learning strategy with lectures, online materials, tutorials, and screencasts. In the first two terms, we focus on linguistics and its formalisation, NLP tools and applications are then introduced from the third term on. The lectures are combined with tutorials and - since the summer term 2017 - with a set of screencasts.