Refine
Year of publication
- 2014 (56) (remove)
Document Type
- Conference Proceeding (27)
- Article (13)
- Part of a Book (13)
- Book (3)
Language
- English (56) (remove)
Is part of the Bibliography
- no (56) (remove)
Keywords
- Korpus <Linguistik> (16)
- Computerlinguistik (9)
- Deutsch (9)
- Natürliche Sprache (6)
- Gesprochene Sprache (5)
- Information Extraction (5)
- Annotation (4)
- Automatische Sprachanalyse (4)
- Englisch (4)
- German (4)
Publicationstate
- Veröffentlichungsversion (29)
- Postprint (9)
- Zweitveröffentlichung (6)
Reviewstate
- Peer-Review (24)
- (Verlags)-Lektorat (13)
- Verlags-Lektorat (2)
Publisher
The 2014 issue of KONVENS is even more a forum for exchange: its main topic is the interaction between Computational Linguistics and Information Science, and the synergies such interaction, cooperation and integrated views can produce. This topic at the crossroads of different research traditions which deal with natural language as a container of knowledge, and with methods to extract and manage knowledge that is linguistically represented is close to the heart of many researchers at the Institut für Informationswissenschaft und Sprachtechnologie of Universität Hildesheim: it has long been one of the institute’s research topics, and it has received even more attention over the last few years. The main conference papers deal with this topic from different points of view, involving flat as well as deep representations, automatic methods targeting annotation and hybrid symbolic and statistical processing, as well as new Machine Learning-based approaches, but also the creation of language resources for both machines and humans, and methods for testing the latter to optimize their human-machine interaction properties. In line with the general topic, KONVENS-2014 focuses on areas of research which involve this cooperation of information science and computational linguistics: for example learning-based approaches, (cross-lingual) Information Retrieval, Sentiment Analysis, paraphrasing or dictionary and corpus creation, management and usability.
The 2014 issue of KONVENS is even more a forum for exchange: its main topic is the interaction between Computational Linguistics and Information Science, and the synergies such interaction, cooperation and integrated views can produce. This topic at the crossroads of different research traditions which deal with natural language as a container of knowledge, and with methods to extract and manage knowledge that is linguistically represented is close to the heart of many researchers at the Institut für Informationswissenschaft und Sprachtechnologie of Universität Hildesheim: it has long been one of the institute’s research topics, and it has received even more attention over the last few years.
Communication of stereotypes in the classroom: biased language use of German and Turkish adolescents
(2014)
Little is known about the linguistic transmission and maintenance of mutual stereotypes in interethnic contexts. This field study, therefore, investigated the linguistic expectancy bias (LEB) and the linguistic intergroup bias (LIB) among German and Turkish adolescents (13 to 20 years) in the school context. The LEB refers to the general phenomenon of describing stereotypes more abstractly. The LIB is the tendency to use language abstraction for in-group protective reasons. Results revealed an unmoderated LEB, whereas the LIB only occurred when foreigners were in the numerical majority, the classroom composition was perceived as a learning disadvantage, or the interethnic conflict frequency was high. These findings provide first evidence for the use of both LEB and LIB in an interethnic classroom setting.
So far, there have been few descriptions on creating structures capable of storing lexicographic data, ISO 24613:2008 being one of the latest. Another one is by Spohr (2012), who designs a multifunctional lexical resource which is able to store data of different types of dictionaries in a user-oriented way. Technically, his design is based on the principle of a hierarchical XML/OWL (eXtensible Markup Language/Web Ontology Language) representation model. This article follows another route in describing a model based on entities and relations between them; MySQL (usually referred to as: Structured Query Language) describes a database system of tables containing data and definitions of relations between them. The model was developed in the context of the project "Scientific eLexicography for Africa" and the lexicographic database to be built thereof will be implemented with MySQL. The principles of the ISO model and of Spohr's model are adhered to with one major difference in the implementation strategy: we do not place the lemma in the centre of attention, but the sense description — all other elements, including the lemma, depend on the sense description. This article also describes the contained lexicographic data sets and how they have been collected from different sources. As our aim is to compile several prototypical internet dictionaries (a monolingual Northern Sotho dictionary, a bilingual learners' Xhosa–English dictionary and a bilingual Zulu–English dictionary), we describe the necessary microstructural elements for each of them and which principles we adhere to when designing different ways of accessing them. We plan to make the model and the (empty) database with all graphical user interfaces that have been developed, freely available by mid-2015.
This paper describes a first version of an integrated e-dictionary translating possessive constructions from English to Zulu. Zulu possessive constructions are difficult to learn for non-mother tongue speakers. When translating from English into Zulu, a speaker needs to be acquainted with the nominal classification of nouns indicating possession and possessor. Furthermore, (s)he needs to be informed about the morpho-syntactic rules associated with certain combinations of noun classes. Lastly, knowledge of morpho-phonetic changes is also required, because these influence the orthography of the output word forms. Our approach is a novel one in that we combine e-lexicography and natural language processing by developing a (web) interface supporting learners, as well as other users of the dictionary to produce Zulu possessive constructions. The final dictionary that we intend to develop will contain several thousand nouns which users can combine as they wish. It will also translate single words and frequently used multiword expressions, and allow users to test their own translations. On request, information about the morpho-syntactic and morpho-phonetic rules applied by the system are displayed together with the translation. Our approach follows the function theory: the dictionary supports users in text production, at the same time fulfilling a cognitive function.
This paper seeks to apply the principles of the famous 3-Circle-Model devised for the description of the ecolinguistic position of English world-wide to the position of German around the world.
On the one hand, the 3-Circle-Model for English with its "Inner", "Outer" and "Extended/Expanding" Circles was invented by Kachru in the 1980s and has since then been adopted, refined and criticised by numerous authors. The situation of German world-wide, on the other hand, has only been scarcely discussed in the past 20 years. While the global extension of German is obviously by far weaker than that of English, there are also a number of noteworthy similarities in terms of historical spread and the current position of these two languages.
This paper therefore discusses the analogies of global English and German by establishing three circles for German: the Inner Circle for the core German-speaking area, i.e. Germany, Austria and Switzerland; the Outer Circle including a number of German minority areas (mostly in Europe), and finally the Extended Circle which may be denoted as "Crumbling" rather than "Expanding". The latter comprises traditional German diaspora communities in different parts of the world which either result from migration, but also reflect the previous functions of German as a language of culture and as a lingua franca in regions like Eastern Europe. The paper argues that there are some striking structural similarities, but also shows the limits of this comparison.
This chapter focuses on the way in which co-present parties in meetings manage language choice and treat it as raising problems of participation - in the sense that participants can orient to the fact that a given language choice may increase or diminish participation for some or all co-present group members. Choosing one language rather than another is approached here as a members' problem (in an ethnomethodological sense), and as a decision the participants make themselves, in situ and within their courses of action, displaying the way in which they orient to its local consequences, and how they justify and legitimize it. In order to explore this link between language choice and participation systematically, in this chapter we focus on a particular and recurrent phenomenon, the announcement of a language change. Within the conversation analysis framework, we analyse these announcements by taking into account the sequential position in which they occur, their format, the way in which they are addressed to a sub-group or to the group as a whole, and the specific action they accomplish. We will also look at how the group receives the announcement, its effects on the participation framework, as well as the categorizations that ensue from it. This chapter therefore highlights the mutual configuration between language choice and participation framework. Our analyses are based on several video- and audio-recorded corpora of international work meetings. These video data call for reflection not only on the linguistic dimension of participation frameworks and language switches, but more broadly on their multimodal organization. This chapter shows that multimodal details are crucial if we aim to understand the relation between multilingualism and participation as occasioned, contingent and emergent dynamics.
Measuring the quality of metadata is only possible by assessing the quality of the underlying schema and the metadata instance. We propose some factors that are measurable automatically for metadata according to the CMD framework, taking into account the variability of schemas that can be defined in this framework. The factors include among others the number of elements, the (re-)use of reusable components, the number of filled in elements. The resulting score can serve as an indicator of the overall quality of the CMD instance, used for feedback to metadata providers or to provide an overview of the overall quality of metadata within a repository. The score is independent of specific schemas and generalizable. An overall assessment of harvested metadata is provided in form of statistical summaries and the distribution, based on a corpus of harvested metadata. The score is implemented in XQuery and can be used in tools, editors and repositories.
Annotating Spoken Language
(2014)
Growing globalisation of the world draws attention to cultural differences between people from different countries or from different cultures within the countries. Notwithstanding the diversity of people’s worldviews, current cross-cultural research still faces the challenge of how to avoid ethnocentrism; comparing Western-driven phenomena with like variables across countries without checking their conceptual equivalence clearly is highly problematic. In the present article we argue that simple comparison of measurements (in the quantitative domain) or of semantic interpretations (in the qualitative domain) across cultures easily leads to inadequate results. Questionnaire items or text produced in interviews or via open-ended questions have culturally laden meanings and cannot be mapped onto the same semantic metric. We call the culture-specific space and relationship between variables or meanings a ’cultural metric’, that is a set of notions that are inter-related and that mutually specify each other’s meaning. We illustrate the problems and their possible solutions with examples from quantitative and qualitative research. The suggested methods allow to respect the semantic space of notions in cultures and language groups and the resulting similarities or differences between cultures can be better understood and interpreted.
Prejudice against a social group may lead to discrimination of members of this group. One very strong cue of group membership is a (non)standard accent in speech. Surprisingly, hardly any interventions against accent-based discrimination have been tested. In the current article, we introduce an intervention in which what participants experience themselves unobtrusively changes their evaluations of others. In the present experiment, participants in the experimental condition talked to a confederate in a foreign language before the experiment, whereas those in the control condition received no treatment. Replicating previous research, participants in the control condition discriminated against Turkish-accented job candidates. In contrast, those in the experimental condition evaluated Turkish- and standard-accented candidates as similarly competent. We discuss potential mediating and moderating factors of this effect.
Studies on social perception reveal that on many dimensions, smiling individuals are perceived more positively in comparison with non-smiling individuals. The experiment carried out in seven countries (China, Germany, Iran, Norway, Poland, USA, and the Republic of South Africa) showed that in some cultures, smiling individuals may be perceived less favorably than non-smiling individuals. We compared ratings of intelligence made by participants viewing photos of smiling and non-smiling people. The results showed that smiling individuals were perceived as more intelligent in Germany and in China; smiling individuals were perceived as less intelligent than the (same) non-smiling individuals in Iran. We suggest that the obtained effects can be explained by the cultural diversity within the dimension of uncertainty avoidance described in the GLOBE (Global Leadership and Organizational Behavior Effectiveness) project by House, Hanges, Javidan, Dorfman, and Gupta.
Feminine forms of job titles raise great interest in many countries. However, it is still unknown how they shape stereotypical impressions on warmth and competence dimensions among female and male listeners. In an experiment with fictitious job titles men perceived women described with feminine job titles as significantly less warm and marginally less competent than women with masculine job titles, which led to lower willingness to employ them. No such effects were observed among women.
We examine the task of relation extraction in the food domain by employing distant supervision. We focus on the extraction of two relations that are not only relevant to product recommendation in the food domain, but that also have significance in other domains, such as the fashion or electronics domain. In order to select suitable training data, we investigate various degrees of freedom. We consider three processing levels being argument level, sentence level and feature level. As external resources, we employ manually created surface patterns and semantic types on all these levels. We also explore in how far rule-based methods employing the same information are competitive.
Power, in this article, is to be understood as an instrument of force that is imposed purposely in order to influence, affect or persuade others. The question here is whether such power is due to aggressive expressions (lexical level) or to context-dependent aspects (discourse level) that become relevant when insulting persons via new media. I will distinguish between “cyberbullying” as an attempt to hurt a persons feelings directly via personal SMS or email and “virtual character assassination attempts” that include third parties as an audience. Potential readers not directly involved are considered a constitutive eliciting element of power. It is assumed that their existence is even more important and effective (in terms of strengthening the perpetrators power) than aggressive language.
We present a technique called event mapping that allows to project text representations into event lists, produce an event table, and derive quantitative conclusions to compare the text representations. The main application of the technique is the case where two classes of text representations have been collected in two different settings (e.g., as annotations in two different formal frameworks) and we can compare the two classes with respect to their systematic differences in the event table. We illustrate how the technique works by applying it to data collected in two experiments (one using annotations in Vladimir Propp’s framework, the other using natural language summaries).
We examine the task of separating types from brands in the food domain. Framing the problem as a ranking task, we convert simple textual features extracted from a domain-specific corpus into a ranker without the need of labeled training data. Such method should rank brands (e.g. sprite) higher than types (e.g. lemonade). Apart from that, we also exploit knowledge induced by semi-supervised graph-based clustering for two different purposes. On the one hand, we produce an auxiliary categorization of food items according to the Food Guide Pyramid, and assume that a food item is a type when it belongs to a category unlikely to contain brands. On the other hand, we directly model the task of brand detection using seeds provided by the output of the textual ranking features. We also harness Wikipedia articles as an additional knowledge source.
Automatic Food Categorization from Large Unlabeled Corpora and Its Impact on Relation Extraction
(2014)
We present a weakly-supervised induction method to assign semantic information to food items. We consider two tasks of categorizations being food-type classification and the distinction of whether a food item is composite or not. The categorizations are induced by a graph-based algorithm applied on a large unlabeled domain-specific corpus. We show that the usage of a domain-specific corpus is vital. We do not only outperform a manually designed open-domain ontology but also prove the usefulness of these categorizations in relation extraction, outperforming state-of-the-art features that include syntactic information and Brown clustering.
We report on the two systems we built for Task 1 of the German Sentiment Analysis Shared Task, the task on Source, Subjective Expression and Target Extraction from Political Speeches (STEPS). The first system is a rule-based system relying on a predicate lexicon specifying extraction rules for verbs, nouns and adjectives, while the second is a translation-based system that has been obtained with the help of the (English) MPQA corpus.
We present the German Sentiment Analysis Shared Task (GESTALT) which consists of two main tasks: Source, Subjective Expression and Target Extraction from Political Speeches (STEPS) and Subjective Phrase and Aspect Extraction from Product Reviews (StAR). Both tasks focused on fine-grained sentiment analysis, extracting aspects and targets with their associated subjective expressions in the German language. STEPS focused on political discussions from a corpus of speeches in the Swiss parliament. StAR fostered the analysis of product reviews as they are available from the website Amazon.de. Each shared task led to one participating submission, providing baselines for future editions of this task and highlighting specific challenges. The shared task homepage can be found at https://sites.google.com/site/iggsasharedtask/.