Refine
Year of publication
- 2009 (49) (remove)
Document Type
- Conference Proceeding (17)
- Part of a Book (16)
- Article (11)
- Book (2)
- Contribution to a Periodical (1)
- Doctoral Thesis (1)
- Working Paper (1)
Language
- English (49) (remove)
Keywords
- Korpus <Linguistik> (11)
- Deutsch (9)
- Computerlinguistik (8)
- Annotation (5)
- Automatische Sprachanalyse (4)
- Natürliche Sprache (3)
- Syntaktische Analyse (3)
- Verb (3)
- Algorithmus (2)
- Bildung (2)
Publicationstate
- Veröffentlichungsversion (21)
- Postprint (9)
- Zweitveröffentlichung (4)
Reviewstate
Publisher
- Elsevier (3)
- Acta Universitatis Upsaliensis (2)
- Benjamins (2)
- Narr (2)
- Niemeyer (2)
- Oxford University Press (2)
- Palgrave Macmillan (2)
- Springer (2)
- AAAI Press (1)
- Association for Computational Linguistics (1)
While written corpora can be exploited without any linguistic annotations, speech corpora need at least a basic transcription to be of any use for linguistic research. The basic annotation of speech data usually consists of time-aligned orthographic transcriptions. To answer phonetic or phonological research questions, phonetic transcriptions are needed as well. However, manual annotation is very time-consuming and requires considerable skill and near-native competence. Therefore it can take years of speech corpus compilation and annotation before any analyses can be carried out. In this paper, approaches that address the transcription bottleneck of speech corpus exploitation are presented and discussed, including crowdsourcing the orthographic transcription, automatic phonetic alignment, and query-driven annotation. Currently, query-driven annotation and automatic phonetic alignment are being combined and applied in two speech research projects at the Institut für Deutsche Sprache (IDS), whereas crowdsourcing the orthographic transcription still awaits implementation.
The purpose of this paper is to describe the functions of ‘where’-based relative elements' in six Balkan languages, paying particular attention to non-standard varieties.2 Relative elements based on an originally interrogative pronoun meaning ‘where’ are attested in all Balkan languages and, more generally, in all European languages. In accordance with the locative meaning of the original pronoun, ‘where’-based relative elements are primarily used to relativize locatives. However, it will be shown that in some Balkan languages, and especially in non-standard varieties, these elements have extended their functional domain. This process does not appear to be random, but rather to pattern with the following hierarchy: locative > unspecific connector > other syntactic positions (indirect/direct object, subject).3 Additionally, ‘where’-based relative elements will be compared with ‘what’-based ones in order to highlight common patterns of development.
In spite of the obvious importance that is accorded to the notion grammatical construction in any approach that sees itself as a construction grammar (CxG), there is as yet no generally accepted definition of the term across different variants of the framework. In particular, there are different assumptions about which additional requirements a given structure has to meet in order to be recognized as a construction besides being a ‘form-meaning pair’. Since the choice of a particular definition will determine the range of both relevant phenomena and concrete observations to be considered in empirical research within the framework, the issue is not just a mere terminological quibble but has important methodological repercussions especially for quantitative research in areas such as corpus linguistics. The present study illustrates some problems in identifying and delimiting such patterns in naturally occurring text and presents arguments for a usage-based interpretation of the term grammatical construction.
In this paper we present an approach to faceted search in large language resource repositories. This kind of search which enables users to browse through the repository by choosing their personal sequence of facets heavily relies on the availability of descriptive metadata for the objects in the repository. This approach therefore informs the collection of a minimal set of metatdata for language resources. The work described in this paper has been funded by the EC within the ESFRI infrastructure project CLARIN.
We compare the use of überhaupt and sowieso in Dutch and German. We use the world-wide web as the main resource and pursue a zigzag strategy, trying to find usages going back and forth between dictionaries, intuitions and real data obtained through web search. To our surprise, the results more or less confirm the decision of Dutch dictionaries to consider überhaupt and sowieso synonymous. In German, we find no synonymy, but only a great overlap of usage conditions in declarative sentences.
Concurrent standardization as a necessity: The genesis of the new official orthographic guidelines
(2009)
The new official orthographic guidelines were brought into force by the official state authorities on August 1st, 1998 and its principle goals were a standardized representation of the guidelines and a «gentle simplification in respect of content». This regulation was not supported by the public and in fact it was the starting point for a struggle for conceptual solutions and a quest for the achievement of' a consensus between different possible norms. Since orthography is an officially codified standard taking up a prominent position among linguistic standards, it is of particular socio-political importance. It was the foremost task of the Council for German Orthography (Rat für deutsche Rechtschreibung), instituted in December 2004, to elaborate a compromise in order to bring the «Orthographical war» (Die Zeit) to an end, which was led enthusiastically for more than a decade. - The concern of this article is to classify historically the agreement reached in 2006. Against this background, it can be stated that official guidelines will only be accepted, if they are based upon the usage in writing and if they take into account the interests of the reader. Both principles are characterizing the proposal made by the Council for German Orthography. An outlook on the Council's activities concerning orthographic standardization expected in the future will conclude this article.
Manual development of deep linguistic resources is time-consuming and costly and therefore often described as a bottleneck for traditional rule-based NLP. In my PhD thesis I present a treebank-based method for the automatic acquisition of LFG resources for German. The method automatically creates deep and rich linguistic presentations from labelled data (treebanks) and can be applied to large data sets. My research is based on and substantially extends previous work on automatically acquiring wide-coverage, deep, constraint-based grammatical resources from the English Penn-II treebank (Cahill et al.,2002; Burke et al., 2004; Cahill, 2004). Best results for English show a dependency f-score of 82.73% (Cahill et al., 2008) against the PARC 700 dependency bank, outperforming the best hand-crafted grammar of Kaplan et al. (2004). Preliminary work has been carried out to test the approach on languages other than English, providing proof of concept for the applicability of the method (Cahill et al., 2003; Cahill, 2004; Cahill et al., 2005). While first results have been promising, a number of important research questions have been raised. The original approach presented first in Cahill et al. (2002) is strongly tailored to English and the datastructures provided by the Penn-II treebank (Marcus et al., 1993). English is configurational and rather poor in inflectional forms. German, by contrast, features semi-free word order and a much richer morphology. Furthermore, treebanks for German differ considerably from the Penn-II treebank as regards data structures and encoding schemes underlying the grammar acquisition task. In my thesis I examine the impact of language-specific properties of German as well as linguistically motivated treebank design decisions on PCFG parsing and LFG grammar acquisition. I present experiments investigating the influence of treebank design on PCFG parsing and show which type of representations are useful for the PCFG and LFG grammar acquisition tasks. Furthermore, I present a novel approach to cross-treebank comparison, measuring the effect of controlled error insertion on treebank trees and parser output from different treebanks. I complement the cross-treebank comparison by providing a human evaluation using TePaCoC, a new testsuite for testing parser performance on complex grammatical constructions. Manual evaluation on TePaCoC data provides new insights on the impact of flat vs. hierarchical annotation schemes on data-driven parsing. I present treebank-based LFG acquisition methodologies for two German treebanks. An extensive evaluation along different dimensions complements the investigation and provides valuable insights for the future development of treebanks.
In this paper we address the question of what is needed, in terms of morphosyntactic encoding, to relate a so-called verb-specific modifier to a nominal head. For the purposes of this paper we shall assume that the notion of a verb-specific modifier includes adverbs and their phrasal or clausal projections, adpositional phrases, and noun phrases featuring a particular semantic case such as locative or instrumental. Noun-specific modifiers, in turn, are considered to be first and foremost adjectives and adjective phrases, next participles and their phrasal projections and, finally, relative clauses.1 The basic motivation underlying this distinction relates to markedness.