Refine
Year of publication
- 2012 (17) (remove)
Document Type
- Part of a Book (7)
- Conference Proceeding (7)
- Article (2)
- Book (1)
Language
- English (17) (remove)
Is part of the Bibliography
- no (17)
Keywords
- Korpus <Linguistik> (5)
- Annotation (3)
- Sprachpolitik (3)
- Englisch (2)
- Europa (2)
- Gesprochene Sprache (2)
- Linguistic Landscape (2)
- Minderheitensprache (2)
- Sprachkonflikt (2)
- Sprachkontakt (2)
Publicationstate
- Veröffentlichungsversion (17) (remove)
Reviewstate
- (Verlags)-Lektorat (17) (remove)
Publisher
- European Language Resources Association (ELRA) (4)
- Lang (3)
- de Gruyter (3)
- Palgrave Macmillan (2)
- Eigenverlag ÖGAI (1)
- Linguistic Society of America (1)
- Springer (1)
When we first started the project of looking at minority languages through a linguistic landscape lens, we felt that the visibility of minority languages in public space had been insufficiently dealt with in traditional minority language research. A linguistic landscape approach, as it had developed over the last years, would constitute a valuable path to explore, by looking at the ‘same old issues’ of language contact and language conflict from a specific angle. We were convinced that fresh linguistic landscape data would be able to provide innovative and useful insights into ‘patterns of language […] use, official language policies, prevalent language attitudes, [and] power relations between different linguistic groups’ (Backhaus 2007, p. 11). The linguistic landscape approach, as presented by the different authors in this volume, has clearly proven to be a heuristic appropriate and relevant for a wide range of minority language situations. More specifically, the ideas and analyses in the different chapters do contribute to a further understanding of minority languages and their speakers. They deepen our comprehension of language policies, power relations and ideologies in minority language settings.
Providing an innovative approach to the written displays of minority languages in public space this volume explores minority language situations through the lens of linguistic landscape research. Based on very tangible data it explores the 'same old issues' of language contact and language conflict in new ways.
Corpora with high-quality linguistic annotations are an essential component in many NLP applications and a valuable resource for linguistic research. For obtaining these annotations, a large amount of manual effort is needed, making the creation of these resources time-consuming and costly. One attempt to speed up the annotation process is to use supervised machine-learning systems to automatically assign (possibly erroneous) labels to the data and ask human annotators to correct them where necessary. However, it is not clear to what extent these automatic pre-annotations are successful in reducing human annotation effort, and what impact they have on the quality of the resulting resource. In this article, we present the results of an experiment in which we assess the usefulness of partial semi-automatic annotation for frame labeling. We investigate the impact of automatic pre-annotation of differing quality on annotation time, consistency and accuracy. While we found no conclusive evidence that it can speed up human annotation, we found that automatic pre-annotation does increase its overall quality.
This paper presents an extension to the Stuttgart-Tübingen TagSet, the standard part-of-speech tag set for German, for the annotation of spoken language. The additional tags deal with hesitations, backchannel signals, interruptions, onomatopoeia and uninterpretable material. They allow one to capture phenomena specific to spoken language while, at the same time, preserving inter-operability with already existing corpora of written language.
Opening/Eröffnung/Aperture
(2012)
I nationale og curopa’iskc sprogpolitiske undersogelser savner man orte et tilt'redsstiIlende cmpirisk grundlag. De tilgsngelige data om den aktuelle Situation for sprogene i de forskelligc lande er heterogene. ufuldstEndige og delvist foraddede og derfor vanskelige at sammenligne over tid. EKNIL’s curoptciskc sprogbarometer. KLM, er et forsog pä al afhjxlpe denne Situation. KLM er baseret pä et omfattende spor- geskema om en bred vifte al’sproglige forhold som er egnet til at danne et billede at'sprogenes Status og sprogpolitiske praksisser i hvert enkelt land. fx sprogencs juridiske Status, sprogenes Status i undervis- ning og forskning, Situationen for minoritetssprog, sprogene i kulturen og i erhvervslivet. KLM gennem- tores med fä ärs mellemrum. Naervjerende artikel beskriver baggrunden og resultateme af KLM 2 (2007- 2011) som omfatler 23 europa’iske lande
Introduction
(2012)
In this paper, we provide an analysis of temporality in Hausa (Chadic, Afro-Asiatic). By testing the hypothesis of covert tense (Matthewson 2006) against empirical data, we show that Hausa is genuinely tenseless in the sense that the grammar does not restrict the relation between reference time and utterance time. Rather, temporal reference is pragmatically inferred from aspectual and contextual information. We also argue that future time reference in Hausa is realized as a combination of a modal operator and a prospective aspect, thus involving the modal meaning components of intention and prediction as well as event time shifting.
Linguistic query systems are special purpose IR applications. As text sizes, annotation layers, and metadata schemes of language corpora grow rapidly, performing complex searches becomes a highly computational expensive task. We evaluate several storage models and indexing variants in two multi-processor/multi-core environments, focusing on prototypical linguistic querying scenarios. Our aim is to reveal modeling and querying tendencies – rather than absolute benchmark results – when using a relational database management system (RDBMS) and MapReduce for natural language corpus retrieval. Based on these findings, we are going to improve our approach for the efficient exploitation of very large corpora, combining advantages of state-of-the-art database systems with decomposition/parallelization strategies. Our reference implementation uses the German DeReKo reference corpus with currently more than 4 billion word forms, various multi-layer linguistic annotations, and several types of text-specific metadata. The proposed strategy is language-independent and adaptable to large-scale multilingual corpora.
The present article describes the first stage of the KorAP project, launched recently at the Institut für Deutsche Sprache (IDS) in Mannheim, Germany. The aim of this project is to develop an innovative corpus analysis platform to tackle the increasing demands of modern linguistic research. The platform will facilitate new linguistic findings by making it possible to manage and analyse primary data and annotations in the petabyte range, while at the same time allowing an undistorted view of the primary linguistic data, and thus fully satisfying the demands of a scientific tool. An additional important aim of the project is to make corpus data as openly accessible as possible in light of unavoidable legal restrictions, for instance through support for distributed virtual corpora, user-defined annotations and adaptable user interfaces, as well as interfaces and sandboxes for user-supplied analysis applications. We discuss our motivation for undertaking this endeavour and the challenges that face it. Next, we outline our software implementation plan and describe development to-date.
This document presents ongoing work related to spoken language data within a project that aims to establish a common and unified infrastructure for the sustainable provision of linguistic primary research data at the Institut für Deutsche Sprache (IDS). In furtherance of its mission to “document the German language as it is currently used”, the project expects to enable the research community to access a broad empirical base of working material via a single platform. While the goal is to eventually cover all linguistically relevant digital resources of the IDS, including lexicographic information systems such as the IDS German Vocabulary Portal, OWID, written language corpora such as the IDS German Reference Corpus, DeReKo, and spoken language corpora such as the IDS German Speech Corpus for Research and Teaching, FOLK, the work presented here predominantly focuses on the latter type of data, i.e. speech corpora. Within this context, the present document pictures the project’s contributions to the development of standards and best practice guidelines concerning data storage, process documentation and legal issues for the sustainable preservation and long-term accessibility of primary linguistic research data.
Research today is often performed in collaborated projects composed of project partners with different backgrounds and from different institutions and countries. Standards can be a crucial tool to help harmonizing these differences and to create sustainable resources. However, choosing a standard depends on having enough information to evaluate and compare different annotation and metadata formats. In this paper we present ongoing work on an interactive, collaborative website that collects information on standards in the field of linguistics as a means to guide interested researchers.
In order to explore the influence of context on the phonetic design of talk-in-interaction, we investigated the pitch characteristics of short turns (insertions) that are produced by one speaker between turns from another speaker. We investigated the hypothesis that the speaker of the insertion designs her turn as a pitch match to the prior turn in order to align with the previous speaker’s agenda, whereas non-matching displays that the speaker of the insertion is non-aligning, for example to initiate a new action. Data were taken from the AMI meeting corpus, focusing on the spontaneous talk of first-language English participants. Using sequential analysis, 177 insertions were classified as either aligning or non-aligning in accordance with definitions of these terms in the Conversation Analysis literature. The degree of similarity between the pitch contour of the insertion and that of the prior speaker’s turn was measured, using a new technique that integrates normalized F0 and intensity information. The results showed that aligning insertions were significantly more similar to the immediately preceding turn, in terms of pitch contour, than were non-aligning insertions. This supports the view that choice of pitch contour is managed locally, rather than by reference to an intonational lexicon.