Refine
Year of publication
- 2008 (29) (remove)
Document Type
- Conference Proceeding (29) (remove)
Language
- English (29) (remove)
Is part of the Bibliography
- no (29)
Keywords
- Deutsch (6)
- Korpus <Linguistik> (5)
- Automatische Sprachanalyse (4)
- Annotation (3)
- Computerlinguistik (3)
- Information Extraction (3)
- Computerunterstützte Lexikographie (2)
- Datensatz (2)
- Digital Humanities (2)
- Gesprochene Sprache (2)
Publicationstate
- Veröffentlichungsversion (14)
- Postprint (2)
- Zweitveröffentlichung (2)
Reviewstate
- (Verlags)-Lektorat (7)
- Peer-Review (7)
- Verlags-Lektorat (1)
Publisher
- European Language Resources Association (ELRA) (6)
- ELRA (3)
- University of Oulu (3)
- European Language Resources Association (2)
- CSLI (1)
- EURALEX (1)
- INRIA (1)
- Institut Universitari de Linguistica Aplicada, Universitat Pompeu Fabra (1)
- Institut Universitari de Linguistica Aplicada, Universitat Pompeu Fabra: (1)
- International Speech Communication Association (1)
In the context of a Nordic Conference on Bilingualism, it can be a rewarding task to look at issues such as language planning, policy and legislation from a perspective of the southern neighbours of the Nordic world. This paper therefore intends to point attention towards a case of societal multilingualism at the periphery of the Nordic world by dealing with recent developments in language policy and legislation with regard to the North Frisian speech community in the German Land of Schleswig-Holstein. As I will show, it is striking to what degree there are considerable differences in the discourse on minority protection and language legislation between the Nordic countries and a cultural area which may arguably be considered to be part of the Nordic fringe - and which itself occasionally takes Scandinavia as a reference point, e.g. in the recent adoption of a pan-Frisian flag modelled on the Nordic cross (Falkena 2006).
The main focus of the paper will be on the Frisian Act which was passed in the Parliament of Schleswig-Holstein in late 2004. It provides a certain legal basis for some political activities with regard to Frisian, but falls short of creating a true spirit of minority language protection and/or revitalisation. In contrast to the traditions of the German and Danish minorities along the German-Danish border and to minority protection in Northern Scandinavia (in particular to Sámi language rights), the approach chosen in the Frisian Act is extremely weak and has no connotation of long-term oriented language-planning, let alone a rights-based perspective.
The paper will then look at policy developments in the time since the Act was passed, e.g. in the Schleswig-Holstein election campaign in 2005, and on latest perceptions of the Frisian language situation in the discourse on North Frisian Policy in Schleswig-Holstein majority society. In the final part of the paper, I will discuss reasons for the differences in minority language policy discourse between Germany and the Nordic countries, and try to provide an outlook on how Frisian could benefit from its geographic proximity to the Nordic world.
Current Natural Language Processing (NLP) systems feature high-complexity processing pipelines that require the use of components at different levels of linguistic and application specific processing. These components often have to interface with external e.g. machine learning and information retrieval libraries as well as tools for human annotation and visualization. At the UKP Lab, we are working on the Darmstadt Knowledge Processing Software Repository (DKPro) (Gurevych et al., 2007a; Müller et al., 2008) to create a highly flexible, scalable and easy-to-use toolkit that allows rapid creation of complex NLP pipelines for semantic information processing on demand. The DKPro repository consists of several main parts created to serve the purposes of different NLP application areas
In this paper we investigate the coverage of the two knowledge sources WordNet and Wikipedia for the task of bridging resolution. We report on an annotation experiment which yielded pairs of bridging anaphors and their antecedents in spoken multi-party dialog. Manual inspection of the two knowledge sources showed that, with some interesting exceptions, Wikipedia is superior to WordNet when it comes to the coverage of information necessary to resolve the bridging anaphors in our data set. We further describe a simple procedure for the automatic extraction of the required knowledge from Wikipedia by means of an API, and discuss some of the implications of the procedure’s performance.
In this paper, we present a suite of flexible UIMA-based components for information retrieval research which have been successfully used (and re-used) in several projects in different application domains. Implementing the whole system as UIMA components is beneficial for configuration management, component reuse, implementation costs, analysis and visualization.
Lexicon schemas and their use are discussed in this paper from the perspective of lexicographers and field linguists. A variety of lexicon schemas have been developed, with goals ranging from computational lexicography (DATR) through archiving (LIFT, TEI) to standardization (LMF, FSR). A number of requirements for lexicon schemas are given. The lexicon schemas are introduced and compared to each other in terms of conversion and usability for this particular user group, using a common lexicon entry and providing examples for each schema under consideration. The formats are assessed and the final recommendation is given for the potential users, namely to request standard compliance from the developers of the tools used. This paper should foster a discussion between authors of standards, lexicographers and field linguists.
One problem of data-driven answer extraction in open-domain factoid question answering is that the class distribution of labeled training data is fairly imbalanced. In an ordinary training set, there are far more incorrect answers than correct answers. The class-imbalance is, thus, inherent to the classification task. It has a deteriorating effect on the performance of classifiers trained by standard machine learning algorithms. They usually have a heavy bias towards the majority class, i.e. the class which occurs most often in the training set. In this paper, we propose a method to tackle class imbalance by applying some form of cost-sensitive learning which is preferable to sampling. We present a simple but effective way of estimating the misclassification costs on the basis of class distribution. This approach offers three benefits. Firstly, it maintains the distribution of the classes of the labeled training data. Secondly, this form of meta-learning can be applied to a wide range of common learning algorithms. Thirdly, this approach can be easily implemented with the help of state-of-the-art machine learning software.
The metadata management system for speech corpora “memasysco” has been developed at the Institut für Deutsche Sprache (IDS) and is applied for the first time to document the speech corpus “German Today”. memasysco is based on a data model for the documentation of speech corpora and contains two generic XML schemas that drive data capture, XML native database storage, dynamic publishing, and information retrieval. The development of memasysco’s information architecture was mainly based on the ISLE MetaData Initiative (IMDI) guidelines for publishing metadata of linguistic resources. However, since we also have to support the corpus management process in research projects at the IDS, we need a finer atomic granularity for some documentation components as well as more restrictive categories to ensure data integrity. The XML metadata of different speech corpus projects are centrally validated and natively stored in an Oracle XML database. The extension of the system to the management of annotations of audio and video signals (e.g. orthographic and phonetic transcriptions) is planned for the near future.
This work proposes opinion frames as a representation of discourse-level associations that arise from related opinion targets and which are common in task-oriented meeting dialogs. We define the opinion frames and explain their interpretation. Additionally we present an annotation scheme that realizes the opinion frames and via human annotation studies, we show that these can be reliably identified.
This work proposes opinion frames as a representation of discourse-level associations which arise from related opinion topics. We illustrate how opinion frames help gather more information and also assist disambiguation. Finally we present the results of our experiments to detect these associations.
In our study we use the experimental framework of priming to manipulate our subjects’ expectations of syllable prominence in sentences with a well-defined syntactic and phonological structure. It shows that it is possible to prime prominence patterns and that priming leads to significant differences in the judgment of syllable prominence.