Refine
Year of publication
Document Type
- Conference Proceeding (57) (remove)
Has Fulltext
- yes (57)
Is part of the Bibliography
- no (57) (remove)
Keywords
- Computerlinguistik (20)
- Natürliche Sprache (11)
- Maschinelles Lernen (9)
- Deutsch (8)
- Information Extraction (8)
- Korpus <Linguistik> (5)
- Sentimentanalyse (4)
- Automatische Sprachanalyse (3)
- Computerunterstützte Lexikografie (3)
- Digital Humanities (3)
Publicationstate
- Zweitveröffentlichung (57) (remove)
Reviewstate
- Peer-Review (30)
- (Verlags)-Lektorat (18)
- Review-Status-unbekannt (1)
Publisher
In the context of a Nordic Conference on Bilingualism, it can be a rewarding task to look at issues such as language planning, policy and legislation from a perspective of the southern neighbours of the Nordic world. This paper therefore intends to point attention towards a case of societal multilingualism at the periphery of the Nordic world by dealing with recent developments in language policy and legislation with regard to the North Frisian speech community in the German Land of Schleswig-Holstein. As I will show, it is striking to what degree there are considerable differences in the discourse on minority protection and language legislation between the Nordic countries and a cultural area which may arguably be considered to be part of the Nordic fringe - and which itself occasionally takes Scandinavia as a reference point, e.g. in the recent adoption of a pan-Frisian flag modelled on the Nordic cross (Falkena 2006).
The main focus of the paper will be on the Frisian Act which was passed in the Parliament of Schleswig-Holstein in late 2004. It provides a certain legal basis for some political activities with regard to Frisian, but falls short of creating a true spirit of minority language protection and/or revitalisation. In contrast to the traditions of the German and Danish minorities along the German-Danish border and to minority protection in Northern Scandinavia (in particular to Sámi language rights), the approach chosen in the Frisian Act is extremely weak and has no connotation of long-term oriented language-planning, let alone a rights-based perspective.
The paper will then look at policy developments in the time since the Act was passed, e.g. in the Schleswig-Holstein election campaign in 2005, and on latest perceptions of the Frisian language situation in the discourse on North Frisian Policy in Schleswig-Holstein majority society. In the final part of the paper, I will discuss reasons for the differences in minority language policy discourse between Germany and the Nordic countries, and try to provide an outlook on how Frisian could benefit from its geographic proximity to the Nordic world.
Im vorliegenden Beitrag soll gezeigt werden, wie Konnektoren als sprachliche Mittel zur Aktualisierung von zwei Arten konversationeller Aktivitäten eingesetzt werden können, nämlich von intersubjektiven bzw. gesprächsorganisatorischen Verfahren. Auf intersubjektive Verfahren greift ein Sprecher zurück, um in Kooperation mit seinem Gesprächspartner einen gemeinsamen Wissenshintergrund (common ground) zu schaffen. Durch gesprächsorganisatorische Verfahren greift der Sprecher in die gesprächsthematische Struktur des Interaktionsgeschehens ein. In diesem Beitrag wird die Aktualisierung dieser beiden konversationellen Verfahren am Beispiel der kommunikativen Gattung autobiographisches Interview betrachtet. Diese Gattung ist für eine solche Analyse m. E. besonders geeignet, denn sie zeichnet sich durch eine relativ scharfe Trennung der Gesprächsrollen aus, die das Nachvollziehen des Interaktionsgeschehens erleichtert. An einem autobiographischen Interview sind zwei Subjekte beteiligt: der Interviewte, der als Wissensträger gilt, und der Interviewer, der durch seine Rolle als Gesprächsleiter die Wissensvermittlung begünstigen soll. Der Interviewer ist also mit einer zweifachen Aufgabe konfrontiert, denn er muss die anfängliche Wissensasymmetrie ausgleichen und ist zugleich für die Gesprächsorganisation zuständig. Im Folgenden soll am Beispiel des Konjunktors und veranschaulicht werden, wie der Gebrauch von Konnektoren zur Bewältigung dieser beiden kommunikativen Aufgaben beitragen kann.
Automatic summarization systems usually are trained and evaluated in a particular domain with fixed data sets. When such a system is to be applied to slightly different input, labor- and cost-intensive annotations have to be created to retrain the system. We deal with this problem by providing users with a GUI which allows them to correct automatically produced imperfect summaries. The corrected summary in turn is added to the pool of training data. The performance of the system is expected to improve as it adapts to the new domain.
In this paper we present work in developing a computerized grammar for the Latin language. It demonstrates the principles and challenges in developing a grammar for a natural language in a modern grammar formalism. The grammar presented here provides a useful resource for natural language processing applications in different fields. It can be easily adopted for language learning and use in language technology for Cultural Heritage like translation applications or to support post-correction of document digitization.
Beyond the stars: exploiting free-text user reviews to improve the accuracy of movie recommendations
(2009)
In this paper we show that the extraction of opinions from free-text reviews can improve the accuracy of movie recommendations. We present three approaches to extract movie aspects as opinion targets and use them as features for the collaborative filtering. Each of these approaches requires different amounts of manual interaction. We collected a data set of reviews with corresponding ordinal (star) ratings of several thousand movies to evaluate the different features for the collaborative filtering. We employ a state-of-the-art collaborative filtering engine for the recommendations during our evaluation and compare the performance with and without using the features representing user preferences mined from the free-text reviews provided by the users. The opinion mining based features perform significantly better than the baseline, which is based on star ratings and genre information only.
We present a supervised machine learning AND system which tackles semantic similarity between publication titles by means of word embeddings. Word embeddings are integrated as external components, which keeps the model small and efficient, while allowing for easy extensibility and domain adaptation. Initial experiments show that word embeddings can improve the Recall and F score of the binary classification sub-task of AND. Results for the clustering sub-task are less clear, but also promising and overall show the feasibility of the approach.
The demo presents a minimalist, off-the-shelf AND tool which provides a fundamental AND operation, the comparison of two publications with ambiguous authors, as an easily accessible HTTP interface. The tool implements this operation using standard AND functionality, but puts particular emphasis on advanced methods from natural language processing (NLP) for comparing publication title semantics.
Lexical resources are often represented in table form, e. g., in relational databases, or represented in specially marked up texts, for example, in document based XML models. This paper describes how it is possible to model lexical structures as graphs and how this model can be used to exploit existing lexical resources and even how different types of lexical resources can be combined.
Cette contribution discute différents enjeux dégagés lors d’une étude des pratiques professionnelles plurilingues : ces enjeux ont émergé d’une analyse menée collaborativement par deux équipes de chercheurs, à Lyon et à Paris, participant au projet européen DYLAN (6e programme cadre) et élaborant ensemble l’analyse empirique d’un extrait d’une réunion de travail, enregistrée dans le cadre d’une collaboration sur un même terrain. Cette analyse est l’occasion de thématiser de manière exemplaire un certain nombre de questions surgissant de l’étude des contacts des langues dans les contextes professionnels, concernant aussi bien les enjeux épistémologiques que l'engagement du chercheur sur le terrain.
The ISOcat registry reloaded
(2012)
The linguistics community is building a metadata-based infrastructure for the description of its research data and tools. At its core is the ISOcat registry, a collaborative platform to hold a (to be standardized) set of data categories (i.e., field descriptors). Descriptors have definitions in natural language and little explicit interrelations. With the registry growing to many hundred entries, authored by many, it is becoming increasingly apparent that the rather informal definitions and their glossary-like design make it hard for users to grasp, exploit and manage the registry’s content. In this paper, we take a large subset of the ISOcat term set and reconstruct from it a tree structure following the footsteps of schema.org. Our ontological re-engineering yields a representation that gives users a hierarchical view of linguistic, metadata-related terminology. The new representation adds to the precision of all definitions by making explicit information which is only implicitly given in the ISOcat registry. It also helps uncovering and addressing potential inconsistencies in term definitions as well as gaps and redundancies in the overall ISOcat term set. The new representation can serve as a complement to the existing ISOcat model, providing additional support for authors and users in browsing, (re-)using, maintaining, and further extending the community’s terminological metadata repertoire.
This paper describes a new approach to improve the analysis and categorization of web documents using statistical methods for template based clustering as well as semantical analysis based on terminological ontologies. A domain-specific environment serves for prove of concept. In order to demonstrate the widespread practical benefit of our approach, we outline a combined mathematical and semantical framework for information retrieval on internet resources.
To reach even language users not acquainted to the use of grammars the Institut für Deutsche Sprache in Mannheim (Germany) looked for new ways to handle grammatical problems. Instead of confronting users with abstractions frequent difficulties of German grammar are introduced in form of exemplary questions like „Which form should be used or preferred: Anfang dieses Jahre or Anfang diesen Jahres? Looking through the long list of such questions even laymen may find solutions of grammatical problems they might not be able to formulate as such.
On valence-binding grammars
(1978)
The valence of a verb determines the number, and the syntactic class, of those expressions that must co-occur with it in a sentence. Definitions of "valence-term" and "valence-boundness" are provided whereby the precise conditions are formulated that a valence-binding grammar must satisfy. These conditions are exemplified in the framework of a simple categorial grammar, in which various reductions of the general notions can be carried out.
The paper discusses particular logical consistency conditions satisfied by German proposition-embedding predicates which determine the question type (external and internal whether-form as well as exhaustive and non-exhaustive wh-form), the correlate type (es- or da-correlate) as well as the impact of the correlate on the respective consistency condition. It will turn out that some consistency conditions also determine the embedding of verb second and subject-control.
The planning of a dictionary should consider both theoretical and empiric aspects, either for its macro- and microstructure: this is true also for Online Specialized Dictionaries of Linguistics. In particular the microstructure should be standardized and structured so as to fit with the primary and secondary functions of a dictionary. Unfortunately, empirical studies that investigate Online Specialized Dictionaries of Linguistics are rare, making it unclear which microstructural elements are obligatory and which are facultative. This article will present and comment upon the results of an investigation into a corpus of Online Specialized Dictionaries of Linguistics, focusing attention on these aspects and also the most important theoretical issues. An example taken from DIL, a German-Italian Online Dictionary of Linguistics, will end the article.
Die vorliegende empirische Untersuchung befasst sich mit einer Umfrage zur Wörterbuchbenutzung bei 41 Studentinnen und Studenten des Dipartimento di Filologia, Letteratura e Linguistica der Universität Pisa, dasselbe Department, an dem auch das deutsch-italienische sprachwissenschaftliche Online-Wörterbuch DIL erarbeitet worden ist (vgl. Flinz: 2011). Die schriftliche Umfrage wurde in Anlehnung an Hartmanns 5. Hypothese „An analysis of users´ needs should precede dictionary design“ (1989) durchgeführt. Die wichtigsten Ergebnisse waren von großer Bedeutung für die Gestaltung der makro- und mikrostrukturellen Eigenschaften des Fachwörterbuches. Die Ergebnisse der Untersuchung und die daraus folgenden Reflektionen werden in thematischen Kernblöcken vorgestellt.
DIL is a bilingual (German-Italian) online dictionary of linguistics. It is still under construction and contains 240 lemmas belonging to the subfield of “German as a Foreign Language”, but other subfields are in preparation. DIL is an open dictionary; participation of experts from various subfields is welcome. The dictionary is intended for a user group with different levels of knowledge, therefore it is a multifunctional dictionary. An analysis of existing dictionaries, either in their online or written form, was essential in order to make important decisions for the macro- or microstructure of DIL; the results are discussed. Criteria for the selection of entries and an example of an entry conclude the article.