OPUS 4 | 430 German

430 German

430 German (130)
431 German writing systems and phonology (1)
432 German etymology (20)
433 German dictionaries (51)
435 German grammar (111)
437 Varieties of German (121)
438 Standard German usage (27)
439 Other Germanic languages (40)

2 search hits

Merging the trees. Building a morphological treebank for German from two resources (2017)

Steiner, Petra

This paper deals with the creation of the first morphological treebank for German by merging two pre-existing linguistic databases. The first is CELEX, a standard resource for German morphology; we build on its refurbished and modernized version. The second is GermaNet, a lexical-semantic network that also provides partial markup for compounds. We describe the state of the art, the essential characteristics of both databases, and our latest revisions. Because the merging involves two data sources with distinct annotation schemes, deriving the morphological trees for the unified resource is not trivial. We discuss how we overcome data and format problems, in particular how we deal with overlaps and complementary scopes. The resulting database comprises about 100,000 trees whose format can be chosen according to the requirements of the application at hand. In our discussion, we outline some future directions for morphological treebanks. The Perl script for generating the data from the sources will be made publicly available on our website.

Data-driven identification of German phrasal compounds (2017)

Barbaresi, Adrien; Hein, Katrin

We present a method to identify and document a phenomenon for which there is very little empirical data: German phrasal compounds occurring as a single token (without punctuation between their components). Relying on linguistic criteria, our approach requires an operational notion of compounds that can be applied systematically, as well as (web) corpora large and diverse enough to contain rarely seen phenomena. The method is based on word segmentation and morphological analysis; it takes advantage of a data-driven learning process. Our results show that coarse-grained identification of phrasal compounds is best performed with empirical data, whereas fine-grained detection could be improved with a combination of rule-based and frequency-based word lists. Along with the characteristics of web texts, the orthographic realizations seem to be linked to the degree of expressivity.