We present a method to identify and document a phenomenon on which there is very little empirical data: German phrasal compounds occurring as a single token (without punctuation between their components). Relying on linguistic criteria, our approach requires an operational notion of compounds that can be applied systematically, as well as (web) corpora that are large and diverse enough to contain rarely seen phenomena. The method is based on word segmentation and morphological analysis, and it takes advantage of a data-driven learning process. Our results show that coarse-grained identification of phrasal compounds is best performed with empirical data, whereas fine-grained detection could be improved with a combination of rule-based and frequency-based word lists. Along with the characteristics of web texts, the orthographic realizations seem to be linked to the degree of expressivity.
We present a method for detecting and reconstructing separated particle verbs in a corpus of spoken German by following an approach suggested for written language. Our study shows that the method can be applied successfully to spoken language, compares different ways of dealing with structures that are specific to spoken language corpora, analyses some remaining problems, and discusses ways of optimising precision or recall for the method. The outlook sketches some possibilities for further work in related areas.
The CLARIN infrastructure as an interoperable language technology platform for SSH and beyond
(2023)
CLARIN is a European Research Infrastructure Consortium developing and providing a federated and interoperable platform to support scientists in the field of the Social Sciences and Humanities in carrying out language-related research. This contribution provides an overview of the entire infrastructure with a particular focus on tool interoperability, ease of access to research data, tools and services, the importance of sharing knowledge within and across (national) communities, and community building. By taking into account FAIR principles from the very beginning, CLARIN has become a successful example of a research infrastructure that is actively used by its members. The benefits CLARIN members reap from their infrastructure secure a future for their common good that is both sustainable and attractive to partners beyond the original target groups.
Researchers interested in the sounds of speech or the physical gestures of speakers make use of audio and video recordings in their work. Annotating these recordings presents a different set of requirements from the annotation of text. Special-purpose tools have been developed to display video and audio signals and to allow the creation of time-aligned annotations. This chapter reviews the most widely used of these tools for both manual and automatic generation of annotations on multimodal data.
This paper discusses the semi-formal language of mathematics and presents the Naproche CNL, a controlled natural language for mathematical authoring. Proof Representation Structures, an adaptation of Discourse Representation Structures, are used to represent the semantics of texts written in the Naproche CNL. We discuss how the Naproche CNL can be used in formal mathematics, and present our prototypical Naproche system, a computer program for parsing texts in the Naproche CNL and checking the proofs in them for logical correctness.
This paper analyses one specific conversational practice of formulation
called ‘notionalization’. It consists in the transformation of a description by a prior
speaker into a categorization by the next speaker. Sequences of this kind are a
"natural laboratory" for studying the differences between descriptions and categorizations
regarding their semantic, interactional, and rhetorical properties:
Descriptive/narrative versions are often vague and tentative, multi-unit turns,
which are temporalized and episodic, offering a lot of contingent, situational,
and indexical detail.
Notionalizations turn them into condensed, abstract, timeless, and often
agentless categorizations expressed by a noun (phrase) within one turn
constructional unit (TCU).
Drawing on audio- and video-taped German data from various types of interaction,
the paper focuses on one particular practice of notionalization, the formulation
of purportedly common ground by TCUs prefaced with the connective also.
The paper discusses their turn-constructional and morphological properties, pointing
out affinities of notionalization with language for special purposes. Notionalizations
are used for reducing detail and for topical closure. They provide grounds for
emergent keywords, which can be reused to re-contextualize topical issues and
interactional histories efficiently. Notionalizations are powerful means for accomplishing
intersubjectivity while pursuing (sometimes one-sided) practical relevancies
at the same time. Their inevitably perspectival design may thus lead to re-opening
the issue they were deemed to settle. The paper closes with an outlook to other
practices of notionalization, pointing to dimensions of interactionally relevant
variation and commonalities.
As an introduction to the Special Issue on "Formulation, generalization,
and abstraction in interaction," this paper discusses key problems of a conversation
analytic (CA) approach to semantics in interaction. Prior research in CA and
Interactional Linguistics has only rarely dealt with issues of linguistic meaning in
interaction. It is argued that this is a consequence of limitations of sequential
analysis to capture meaning in interaction. While sequential analysis remains the
encompassing methodological framework, it is suggested that it needs to be complemented
by analyzing semantic relationships between choices of formulation in
the interaction, ethnography, and structural techniques of comparing selected
options with possible alternatives. The paper describes the methodological approach
taken to interactional semantics by the papers in the Special Issue, which analyse
practices of generalization and abstraction in interaction as they are accomplished
by formulations of prior versions of reference and description.
Mock fiction is a genre of humorous, fictional narratives. It is pervasive in adolescents’ peer-group interaction. Building on a corpus of informal peer-group interaction among 14- to 17-year-old German adolescents, it is shown how mock fiction is used to sanction identity claims of peer-group co-members that are taken to be inadequate by the teller of a mock fiction. Mock fiction exposes and ridicules those claims by fictional exaggeration. Mock fiction is an indirect, yet sometimes highly abusive means for criticizing and negotiating identities and statuses of peer-group members. The analysis shows how mock fiction is collaboratively produced, how it is used to convey criticism and to negotiate social norms indirectly, and how, in addition, it allows for performative self-positioning of the tellers as skilled, entertaining tellers and socio-psychological diagnosticians.
Question Answering Systems for retrieving information from Knowledge Graphs (KG) have become a major area of interest in recent years. Current systems search for words and entities but cannot search for grammatical phenomena. The purpose of this paper is to present our research on developing a QA System that answers natural language questions about German grammar.
Our goal is to build a KG which contains facts and rules about German grammar and which can also answer specific questions about a concrete grammatical issue. An overview of the current research on QA systems and ontology design is given, and we show how we plan to construct the KG by integrating the data in the grammatical information system Grammis, hosted by the Leibniz-Institut für Deutsche Sprache (IDS). In this paper, we describe the construction of the initial KG, sketch our resulting graph, and demonstrate the effectiveness of such an approach. A grammar correction component will be part of a later stage. The paper concludes with the potential areas for future research.
Different Views on Markup
(2010)
In this chapter, two different ways of grouping information represented in document markup are examined: annotation levels, referring to conceptual levels of description, and annotation layers, referring to the technical realisation of markup using e.g. document grammars. In many current XML annotation projects, multiple levels are integrated into one layer, often leading to the problem of having to deal with overlapping hierarchies. As a solution, we propose a framework for multiple, independent XML annotation layers for one text, based on an abstract representation of XML documents with logical predicates. Two realisations of the abstract representation are presented: a Prolog fact base format together with an application architecture, and a specification for native XML databases. We conclude with a discussion of projects that have currently adopted this framework.
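The idea of independent annotation layers over one text can be illustrated with a minimal Python sketch. It is not the chapter's Prolog fact base or its database specification; the `Node` tuple, the layer names `syntax` and `layout`, and the sample text are all hypothetical and chosen only for illustration. Each element is stored as a fact with character offsets into the shared text, so two layers can be compared without being forced into one XML hierarchy.

```python
from typing import List, NamedTuple, Tuple


class Node(NamedTuple):
    """One annotation fact: an element of a layer spanning text[start:end]."""
    layer: str    # hypothetical layer name
    element: str  # element name within that layer
    start: int    # inclusive character offset
    end: int      # exclusive character offset


# Shared primary text (illustrative sample, not from the chapter).
TEXT = "Peter sleeps. Mary reads."

# Two independent layers annotate the same text.
FACTS: List[Node] = [
    Node("syntax", "s", 0, 13),     # "Peter sleeps."
    Node("syntax", "s", 14, 25),    # "Mary reads."
    Node("layout", "line", 0, 18),  # "Peter sleeps. Mary"
    Node("layout", "line", 19, 25), # "reads."
]


def crosses(a: Node, b: Node) -> bool:
    """True if the spans overlap partially, i.e. neither contains the other."""
    return (a.start < b.start < a.end < b.end) or (
        b.start < a.start < b.end < a.end
    )


def crossing_pairs(facts: List[Node]) -> List[Tuple[Node, Node]]:
    """Pairs from different layers that could not nest in a single XML tree."""
    return [
        (a, b)
        for i, a in enumerate(facts)
        for b in facts[i + 1:]
        if a.layer != b.layer and crosses(a, b)
    ]


if __name__ == "__main__":
    for a, b in crossing_pairs(FACTS):
        print(a.element, repr(TEXT[a.start:a.end]),
              "crosses", b.element, repr(TEXT[b.start:b.end]))
```

In this sample, the second sentence starts on one layout line and ends on the next, so the two elements overlap without nesting. Keeping each layer as separate facts over the same offsets makes such cases queryable rather than a well-formedness problem.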