Refine
Year of publication
Document Type
- Conference Proceeding (112) (remove)
Has Fulltext
- yes (112)
Keywords
- Korpus <Linguistik> (46)
- Deutsch (32)
- Annotation (15)
- Auszeichnungssprache (10)
- Computerlinguistik (8)
- Computerunterstützte Lexikographie (8)
- Head-driven phrase structure grammar (8)
- Gesprochene Sprache (6)
- HPSG (6)
- Polnisch (6)
Publicationstate
- Veröffentlichungsversion (112) (remove)
Reviewstate
- (Verlags)-Lektorat (112) (remove)
Publisher
- European Language Resources Association (ELRA) (12)
- Association for Computational Linguistics (5)
- CSLI Publications (5)
- Nisaba (5)
- Extreme Markup Languages Conference (4)
- European Language Resources Association (3)
- Ivane Javakhishvili Tbilisi State University (3)
- University of Birmingham (3)
- University of Illinois (3)
- University of Oulu (3)
The present paper reports the first results of the compilation and annotation of a blog corpus for German. The main aim of the project is the representation of the blog discourse structure and relations between its elements (blog posts, comments) and participants (bloggers, commentators). The data included in the corpus were manually collected from the scientific blog portal SciLogs. The feature catalogue for the corpus annotation includes three types of information which is directly or indirectly provided in the blog or can be construed by means of statistical analysis or computational tools. At this point, only directly available information (e.g. title of the blog post, name of the blogger etc.) has been annotated. We believe, our blog corpus can be of interest for the general study of blog structure or related research questions as well as for the development of NLP methods and techniques (e.g. for authorship detection).
Linguistic query systems are special purpose IR applications. We present a novel state-of-the-art approach for the efficient exploitation of very large linguistic corpora, combining the advantages of relational database management systems (RDBMS) with the functional MapReduce programming model. Our implementation uses the German DEREKO reference corpus with multi-layer linguistic annotations and several types of text-specific metadata, but the proposed strategy is language-independent and adaptable to large-scale multilingual corpora.
In this paper we present a new approach to lexicographical design for the description of German speech act verbs. This approach is based on an action-theoretical semantic conception. The several conditions for linguistic action provide the basis for the elaboration of the central semantic features. The systematic relationship of these features is reflected in the organization of a lexical database which allows various possibilities of access to different types of lexical information.
In the following paper we shall give an outline of the semantic framework for describing speech act verbs, i. e. verbs of communication, with the practical goal of a semantical database for a (dictionary of) synonymy of German speech act verbs which enables the user not only to find a list of synonymous verbs but also enables him to gain an insight into the semantic relations between the words.
The semantic framework is based on
(i) a set of conditions for performing speech acts as the relevant domain of reference
(ii) the introduction of a notion of situation, or better type of situation
The performative as well as the descriptive use of the verbs can be reduced to their fundamental dependency on the situations in which they are used: on the one hand with regard to the possibility of the action itself, and on the other hand with regard to the possibility of their designation. For both ways of use the relevant aspects of the situation constitute the necessary conditions.
One of the most popular techniques used in HPSG-based studies to describe linguistic phenomena is the raising mechanism. Besides ordinary raising verbs or adjectives, this tool has been applied for handling verbal complexes and discontinuous constituents, among other phenomena. In this paper, a new application for raising within the HPSG paradigm will be discussed, thereby investigating data from the prepositional domain. We will analyze linguistic properties of word combinations in German consisting of a preposition, a noun, and another preposition (such as auf Grund von (‘by virtue of’)), thus arguing that raising is the most appropriate method for satisfactorily describing the crucial syntactic features which are typical for those expressions. The objective of this paper is thus to demonstrate the efficiency of the raising mechanism as used in HPSG, and therefore, to emphasize the importance of designing a satisfactory uniform theory of raising within this grammar framework.
One of the most popular techniques used in HPSG-based studies to describe linguistic phenomena is the raising mechanism. Besides ordinary raising verbs or adjectives, this tool has been applied for handling verbal complexes and discontinuous constituents, among other phenomena. In this paper, a new application for raising within the HPSG paradigm will be discussed, thereby investigating data from the prepositional domain. We will analyze linguistic properties of word combinations in German consisting of a preposition, a noun, and another preposition (such as auf Grund von (‘by virtue of’)), thus arguing that raising is the most appropriate method for satisfactorily describing the crucial syntactic features which are typical for those expressions. The objective of this paper is thus to demonstrate the efficiency of the raising mechanism as used in HPSG, and therefore, to emphasize the importance of designing a satisfactory uniform theory of raising within this grammar framework.
This paper presents the current results of an ongoing research project on corpus distribution of prepositions and pronouns within Polish preposition-pronoun contractions. The goal of the project is to provide a quantitative description of Polish preposition-pronoun contractions taking into consideration morphosyntactic properties of their components. It is expected that the results will provide a basis for a revision of the traditionally assumed inflectional paradigms of Polish pronouns and, thus, for a possible remodeling of these paradigms. The results of corpus-based investigations of the distribution of prepositions within preposition-pronoun contractions can be used for grammar-theoretical and lexicographic purposes.
Annotating Discourse Relations in Spoken Language: A Comparison of the PDTB and CCR Frameworks
(2016)
In discourse relation annotation, there is currently a variety of different frameworks being used, and most of them have been developed and employed mostly on written data. This raises a number of questions regarding interoperability of discourse relation annotation schemes, as well as regarding differences in discourse annotation for written vs. spoken domains. In this paper, we describe ouron annotating two spoken domains from the SPICE Ireland corpus (telephone conversations and broadcast interviews) according todifferent discourse annotation schemes, PDTB 3.0 and CCR. We show that annotations in the two schemes can largely be mappedone another, and discuss differences in operationalisations of discourse relation schemes which present a challenge to automatic mapping. We also observe systematic differences in the prevalence of implicit discourse relations in spoken data compared to written texts,find that there are also differences in the types of causal relations between the domains. Finally, we find that PDTB 3.0 addresses many shortcomings of PDTB 2.0 wrt. the annotation of spoken discourse, and suggest further extensions. The new corpus has roughly theof the CoNLL 2015 Shared Task test set, and we hence hope that it will be a valuable resource for the evaluation of automatic discourse relation labellers.