Lecture Notes in Computer Science
9103
We examine the combination of pattern-based and distributional similarity for the induction of semantic categories. Pattern-based methods are precise and sparse while distributional methods have a higher recall. Given these particular properties we use the prediction of distributional methods as a back-off to pattern-based similarity. Since our pattern-based approach is embedded into a semi-supervised graph clustering algorithm, we also examine how distributional information is best added to that classifier. Our experiments are carried out on 5 different food categorization tasks.
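The back-off idea described above can be illustrated with a minimal sketch. It assumes that pattern-based similarity is available as a sparse lookup table and that distributional similarity is the cosine of context-count vectors; the function names, the toy scores, and the toy vectors are hypothetical and are not taken from the paper, which embeds the pattern-based scores in a semi-supervised graph clustering algorithm.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    norm_u = sqrt(sum(w * w for w in u.values()))
    norm_v = sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def backoff_similarity(a, b, pattern_sim, context_vectors):
    """Return the pattern-based score if the pair was observed in a pattern;
    otherwise back off to distributional (cosine) similarity."""
    key = frozenset((a, b))
    if key in pattern_sim:  # precise but sparse evidence
        return pattern_sim[key]
    # higher-recall fallback; unknown words get an empty vector
    return cosine(context_vectors.get(a, {}), context_vectors.get(b, {}))


# Toy data, invented for illustration only.
pattern_sim = {frozenset(("apple", "pear")): 0.9}
context_vectors = {
    "apple": {"eat": 2.0, "juice": 1.0},
    "pear": {"eat": 1.0, "juice": 1.0},
    "plum": {"eat": 2.0, "juice": 1.0},
}

print(backoff_similarity("apple", "pear", pattern_sim, context_vectors))
print(backoff_similarity("apple", "plum", pattern_sim, context_vectors))
```

The first call is answered from the pattern table, the second falls back to the context vectors because no pattern links the two words.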
10415
We present a method to identify and document a phenomenon on which there is very little empirical data: German phrasal compounds occurring in the form of a single token (without punctuation between their components). Relying on linguistic criteria, our approach requires an operational notion of compounds that can be applied systematically, as well as (web) corpora that are large and diverse enough to contain rarely seen phenomena. The method is based on word segmentation and morphological analysis and takes advantage of a data-driven learning process. Our results show that coarse-grained identification of phrasal compounds is best performed with empirical data, whereas fine-grained detection could be improved with a combination of rule-based and frequency-based word lists. Along with the characteristics of web texts, the orthographic realizations seem to be linked to the degree of expressivity.
8105
8105
Contemporary studies on the characteristics of natural language benefit enormously from the increasing amount of linguistic corpora. Aside from text and speech corpora, corpora of computer-mediated communication (CMC) position themselves between orality and literacy, and beyond that provide insight into the impact of "new", mainly internet-based media on language behaviour. In this paper, we present an empirical attempt to work with annotated CMC corpora for the explanation of linguistic phenomena. In concrete terms, we implement machine learning algorithms to produce decision trees that reveal rules and tendencies about the use of genitive markers in German.
8105
Contemporary studies on the characteristics of natural language benefit enormously from the increasing amount of linguistic corpora. Aside from text and speech corpora, corpora of computer-mediated communication (CMC) position themselves between orality and literacy, and beyond that provide insight into the impact of “new”, mainly internet-based media on language behaviour. In this paper, we present an empirical attempt to work with annotated CMC corpora for the explanation of linguistic phenomena. In concrete terms, we implement machine learning algorithms to produce decision trees that reveal rules and tendencies about the use of genitive markers in German.
898
This paper deals with multiword lexemes (MWLs), focussing on two types of verbal MWLs: verbal idioms and support verb constructions. We discuss the characteristic properties of MWLs, namely nonstandard compositionality, restricted substitutability of components, and restricted morpho-syntactic flexibility, and we show how these properties may cause serious problems during the analysis, generation, and transfer steps of machine translation systems. In order to cope with these problems, MT lexicons need to provide detailed descriptions of MWL properties. We list the types of information which we consider the necessary minimum for a successful processing of MWLs, and report on some feasibility studies aimed at the automatic extraction of German verbal multiword lexemes from text corpora and machine-readable dictionaries.
11762
Question Answering Systems for retrieving information from Knowledge Graphs (KG) have become a major area of interest in recent years. Current systems search for words and entities but cannot search for grammatical phenomena. The purpose of this paper is to present our research on developing a QA System that answers natural language questions about German grammar.
Our goal is to build a KG which contains facts and rules about German grammar, and is also able to answer specific questions about a concrete grammatical issue. We give an overview of current research on QA systems and ontology design and show how we plan to construct the KG by integrating data from the grammatical information system Grammis, hosted by the Leibniz-Institut für Deutsche Sprache (IDS). In this paper, we describe the construction of the initial KG, sketch our resulting graph, and demonstrate the effectiveness of such an approach. A grammar correction component will be part of a later stage. The paper concludes with potential areas for future research.
11456
German subjectively veridical sicher sein ‘be certain’ can embed ob-clauses in negative contexts, while subjectively veridical glauben ‘believe’ and nonveridical möglich sein ‘be possible’ cannot. The Logical Form of ‘F isn’t certain if M is in Rome’ is regarded as the negated disjunction of two sentences, ¬(cf σ ∨ cf ¬σ), or equivalently ¬cf σ ∧ ¬cf ¬σ. Be certain can have this LF because ¬cf σ and ¬cf ¬σ are compatible and nonveridical. Believe excludes this LF because ¬bf σ and ¬bf ¬σ are incompatible in a question-under-discussion context. It follows from this incompatibility and from the incompatibility of bf σ and bf ¬σ that bf ¬σ and ¬bf σ are equivalent. Therefore believe cannot be nonveridical. Be possible doesn’t allow the LF either. Similar to believe, ¬pf σ and ¬pf ¬σ are incompatible. But unlike believe, pf σ and pf ¬σ are compatible.
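The propositional steps behind this argument can be spelled out as follows. This is a sketch under the assumption that "incompatible" is read as "cannot both hold"; the abstract itself does not give the derivation in this form.

```latex
\begin{align*}
\neg(\mathit{cf}\,\sigma \lor \mathit{cf}\,\neg\sigma)
  &\;\equiv\; \neg\mathit{cf}\,\sigma \land \neg\mathit{cf}\,\neg\sigma
  && \text{(De Morgan)} \\
\neg(\neg\mathit{bf}\,\sigma \land \neg\mathit{bf}\,\neg\sigma)
  &\;\equiv\; \mathit{bf}\,\sigma \lor \mathit{bf}\,\neg\sigma
  && \text{(incompatibility of } \neg\mathit{bf}\,\sigma,\ \neg\mathit{bf}\,\neg\sigma\text{)} \\
\neg(\mathit{bf}\,\sigma \land \mathit{bf}\,\neg\sigma)
  &
  && \text{(incompatibility of } \mathit{bf}\,\sigma,\ \mathit{bf}\,\neg\sigma\text{)} \\
\therefore\quad \mathit{bf}\,\neg\sigma
  &\;\leftrightarrow\; \neg\mathit{bf}\,\sigma
  && \text{(exactly one of } \mathit{bf}\,\sigma,\ \mathit{bf}\,\neg\sigma \text{ holds)}
\end{align*}
```

For be possible only the analogue of the second line is available, since pf σ and pf ¬σ are compatible, so the corresponding equivalence does not follow.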
11799
The demo presents a minimalist, off-the-shelf author name disambiguation (AND) tool which provides a fundamental AND operation, the comparison of two publications with ambiguous authors, as an easily accessible HTTP interface. The tool implements this operation using standard AND functionality, but puts particular emphasis on advanced methods from natural language processing (NLP) for comparing publication title semantics.
10450
We present a supervised machine learning AND system which tackles semantic similarity between publication titles by means of word embeddings. Word embeddings are integrated as external components, which keeps the model small and efficient, while allowing for easy extensibility and domain adaptation. Initial experiments show that word embeddings can improve the Recall and F score of the binary classification sub-task of AND. Results for the clustering sub-task are less clear, but also promising and overall show the feasibility of the approach.
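One common way to obtain a title similarity feature from word embeddings is to average the word vectors of each title and compare the averages by cosine similarity. The sketch below illustrates that idea only; the averaging scheme, the function names, and the three-dimensional toy vectors are assumptions made for illustration and are not the system's actual model or data.

```python
from math import sqrt

# Toy 3-dimensional embeddings, invented for illustration only.
EMBEDDINGS = {
    "neural": [0.9, 0.1, 0.0],
    "deep": [0.8, 0.2, 0.0],
    "networks": [0.7, 0.3, 0.1],
    "learning": [0.6, 0.4, 0.1],
    "medieval": [0.0, 0.1, 0.9],
    "poetry": [0.1, 0.0, 0.8],
}


def title_vector(title, embeddings):
    """Average the vectors of all in-vocabulary tokens; None if there are none."""
    vectors = [embeddings[t] for t in title.lower().split() if t in embeddings]
    if not vectors:
        return None
    return [sum(dim) / len(vectors) for dim in zip(*vectors)]


def title_similarity(title_a, title_b, embeddings=EMBEDDINGS):
    """Cosine similarity of the averaged title vectors (0.0 if a title is unknown)."""
    va = title_vector(title_a, embeddings)
    vb = title_vector(title_b, embeddings)
    if va is None or vb is None:
        return 0.0
    dot = sum(x * y for x, y in zip(va, vb))
    norm = sqrt(sum(x * x for x in va)) * sqrt(sum(y * y for y in vb))
    return dot / norm if norm else 0.0


related = title_similarity("Neural networks", "Deep learning")
unrelated = title_similarity("Neural networks", "Medieval poetry")
print(related > unrelated)
```

Because the embeddings are passed in as a plain lookup table, they can be swapped for a different or domain-specific set without changing the comparison code, which mirrors the external-component design mentioned above.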