Refine
Year of publication
Document Type
- Part of a Book (31)
- Article (22)
- Conference Proceeding (8)
- Book (2)
Language
- English (63) (remove)
Keywords
- Computerlinguistik (14)
- Deutsch (12)
- Natürliche Sprache (8)
- Korpus <Linguistik> (7)
- Annotation (5)
- Automatische Sprachanalyse (5)
- Konversationsanalyse (5)
- Maschinelles Lernen (5)
- Gesprochene Sprache (4)
- Information Extraction (4)
Publicationstate
- Postprint (30)
- Veröffentlichungsversion (19)
- Zweitveröffentlichung (13)
- Preprint (2)
Reviewstate
- Peer-Review (24)
- (Verlags)-Lektorat (21)
- Peer-review (1)
- Verlags-Lektorat (1)
Publisher
- Springer (63) (remove)
In this paper, we examine methods to extract different domain-specific relations from the food domain. We employ different extraction methods ranging from surface patterns to co-occurrence measures applied on different parts of a document. We show that the effectiveness of a particular method depends very much on the relation type considered and that there is no single method that works equally well for every relation type. As we need to process a large amount of unlabeled data our methods only require a low level of linguistic processing. This has also the advantage that these methods can provide responses in real time.
Multinomial processing tree (MPT) models are a class of measurement models that account for categorical data by assuming a finite number of underlying cognitive processes. Traditionally, data are aggregated across participants and analyzed under the assumption of independently and identically distributed observations. Hierarchical Bayesian extensions of MPT models explicitly account for participant heterogeneity by assuming that the individual parameters follow a continuous hierarchical distribution.We provide an accessible introduction to hierarchical MPT modeling and present the user-friendly and comprehensive R package TreeBUGS, which implements the two most important hierarchical MPT approaches for participant heterogeneity—the beta-MPT approach (Smith & Batchelder, Journal of Mathematical Psychology 54:167-183, 2010) and the latent-trait MPT approach (Klauer, Psychometrika 75:70-98, 2010). TreeBUGS reads standard MPT model files and obtains Markov-chain Monte Carlo samples that approximate the posterior distribution. The functionality and output are tailored to the specific needs of MPT modelers and provide tests for the homogeneity of items and participants, individual and group parameter estimates, fit statistics, and within- and between-subjects comparisons, as well as goodness-of-fit and summary plots. We also propose and implement novel statistical extensions to include continuous and discrete predictors (as either fixed or random effects) in the latent-trait MPT model.
This article presents a discussion on the main linguistic phenomena which cause difficulties in the analysis of user-generated texts found on the web and in social media, and proposes a set of annotation guidelines for their treatment within the Universal Dependencies (UD) framework of syntactic analysis. Given on the one hand the increasing number of treebanks featuring user-generated content, and its somewhat inconsistent treatment in these resources on the other, the aim of this article is twofold: (1) to provide a condensed, though comprehensive, overview of such treebanks—based on available literature—along with their main features and a comparative analysis of their annotation criteria, and (2) to propose a set of tentative UD-based annotation guidelines, to promote consistent treatment of the particular phenomena found in these types of texts. The overarching goal of this article is to provide a common framework for researchers interested in developing similar resources in UD, thus promoting cross-linguistic consistency, which is a principle that has always been central to the spirit of UD.
Neologisms, i.e., new words or meanings, are finding their way into everyday language use all the time. In the process, already existing elements of a language are recombined or linguistic material from other languages is borrowed. But are borrowed neologisms accepted similarly well by the speech community as neologisms that were formed from “native” material? We investigate this question based on neologisms in German. Building on the corresponding results of a corpus study, we test the hypothesis of whether “native” neologisms are more readily accepted than those borrowed from English. To do so, we use a psycholinguistic experimental paradigm that allows us to estimate the degree of uncertainty of the participants based on the mouse trajectories of their responses. Unexpectedly, our results suggest that the neologisms borrowed from English are accepted more frequently, more quickly, and more easily than the “native” ones. These effects, however, are restricted to people born after 1980, the so-called millenials. We propose potential explanations for this mismatch between corpus results and experimental data and argue, among other things, for a reinterpretation of previous corpus studies.
Researchers interested in the sounds of speech or the physical gestures of Speakers make use of audio and video recordings in their work. Annotating these recordings presents a different set of requirements to the annotation of text. Special purpose tools have been developed to display video and audio Signals and to allow the creation of time-aligned annotations. This chapter reviews the most widely used of these tools for both manual and automatic generation of annotations on multimodal data.
As an Introduction to the Special Issue on "Formulation, generalization,
and abstraction in interaction,’’ this paper discusses key problems of a conversation
analytic (CA) approach to semantics in interaction. Prior research in CA and
Interactional Linguistics has only rarely dealt with issues of linguistic meaning in
interaction. It is argued that this is a consequence of limitations of sequential
analysis to capture meaning in interaction. While sequential analysis remains the
encompassing methodological framework, it is suggested that it needs to be complemented
by analyzing semantic relationships between choices of formulation in
the interaction, ethnography, and structural techniques of comparing selected
options with possible alternatives. The paper describes the methodological approach
taken to interactional semantics by the papers in the Special Issue, which analyse
practices of generalization and abstraction in interaction as they are accomplished
by formulations of prior versions of reference and description.
This paper discusses the semi-formal language of mathematics and presents the Naproche CNL, a controlled natural language for mathematical authoring. Proof Representation Structures, an adaptation of Discourse Representation Structures, are used to represent the semantics of texts written in the Naproche CNL. We discuss how the Naproche CNL can be used in formal mathematics, and present our prototypical Naproche system, a computer program for parsing texts in the Naproche CNL and checking the proofs in them for logical correctness.
The lexicography of German
(2020)
This chapter discusses the main dictionaries of the German language as it is spoken and written in Germany, and also German as it is spoken and written in Austria, Switzerland, the eastern fringes of Belgium, and South Tyrol. It also briefly describes Pennsylvania German. Corpora and other language resources used in German dictionary-making are also presented. Finally, there is a discussion of some current issues in German lexicography, as well as future prospects.
The ISOcat registry reloaded
(2012)
The linguistics community is building a metadata-based infrastructure for the description of its research data and tools. At its core is the ISOcat registry, a collaborative platform to hold a (to be standardized) set of data categories (i.e., field descriptors). Descriptors have definitions in natural language and little explicit interrelations. With the registry growing to many hundred entries, authored by many, it is becoming increasingly apparent that the rather informal definitions and their glossary-like design make it hard for users to grasp, exploit and manage the registry’s content. In this paper, we take a large subset of the ISOcat term set and reconstruct from it a tree structure following the footsteps of schema.org. Our ontological re-engineering yields a representation that gives users a hierarchical view of linguistic, metadata-related terminology. The new representation adds to the precision of all definitions by making explicit information which is only implicitly given in the ISOcat registry. It also helps uncovering and addressing potential inconsistencies in term definitions as well as gaps and redundancies in the overall ISOcat term set. The new representation can serve as a complement to the existing ISOcat model, providing additional support for authors and users in browsing, (re-)using, maintaining, and further extending the community’s terminological metadata repertoire.