OPUS 4 | Lecture Notes in Computer Science

898

Description and acquisition of multiword lexemes (1995)

Schwall, Ulrike ; Storrer, Angelika

This paper deals with multiword lexemes (MWLs), focussing on two types of verbal MWLs: verbal idioms and support verb constructions. We discuss the characteristic properties of MWLs, namely nonstandard compositionality, restricted substitutability of components, and restricted morpho-syntactic flexibility, and we show how these properties may cause serious problems during the analysis, generation, and transfer steps of machine translation systems. In order to cope with these problems, MT lexicons need to provide detailed descriptions of MWL properties. We list the types of information which we consider the necessary minimum for successful processing of MWLs, and report on some feasibility studies aimed at the automatic extraction of German verbal multiword lexemes from text corpora and machine-readable dictionaries.

8105

Decision Tree-Based Evaluation of Genitive Classification – An Empirical Study on CMC and Text Corpora. Language Processing and Knowledge in the Web (2013)

Hansen, Sandra ; Schneider, Roman

Contemporary studies on the characteristics of natural language benefit enormously from the increasing amount of linguistic corpora. Aside from text and speech corpora, corpora of computer-mediated communication (CMC) position themselves between orality and literacy, and beyond that provide insight into the impact of "new", mainly internet-based media on language behaviour. In this paper, we present an empirical attempt to work with annotated CMC corpora for the explanation of linguistic phenomena. In concrete terms, we implement machine learning algorithms to produce decision trees that reveal rules and tendencies about the use of genitive markers in German.

8105

Decision tree-based evaluation of genitive classification. An empirical study on CMC and text corpora (2013)

Hansen, Sandra ; Schneider, Roman

Contemporary studies on the characteristics of natural language benefit enormously from the increasing amount of linguistic corpora. Aside from text and speech corpora, corpora of computer-mediated communication (CMC) position themselves between orality and literacy, and beyond that provide insight into the impact of “new”, mainly internet-based media on language behaviour. In this paper, we present an empirical attempt to work with annotated CMC corpora for the explanation of linguistic phenomena. In concrete terms, we implement machine learning algorithms to produce decision trees that reveal rules and tendencies about the use of genitive markers in German.

8105

Decision tree-based evaluation of genitive classification : an empirical study on CMC and text corpora (2013)

Hansen, Sandra ; Schneider, Roman
