OPUS 4 | Bochumer Linguistische Arbeitsberichte

Bochumer Linguistische Arbeitsberichte

Bochum: Ruhr-Universität Bochum

2 search hits

1 to 2

Sort by

Brown clustering for unlexicalized parsing (2016)

Dakota, Daniel

Brown clustering has been used to help increase parsing performance for morphologically rich languages. However, much of the work has focused on using clustering techniques to replace terminal nodes or as a feature for parsing. Instead, we choose to examine how effectively Brown clustering is for unlexicalized parsing by creating data-driven POS tagsets which are then used with the Berkeley parser. We investigate cluster sizes as well as on what information (e.g. words vs. lemmas) clustering will yield the best parser performance. Our results approach the current state of the art results for the German T¨uBa-D/Z treebank when using parser internal tagging.

Integrating corpora of computer-mediated communication in CLARIN-D: Results from the curation project ChatCorpus2CLARIN (2016)

Lüngen, Harald ; Beißwenger, Michael ; Ehrhardt, Eric ; Herold, Axel ; Storrer, Angelika

We introduce our pipeline to integrate CMC and SM corpora into the CLARIN-D corpus infrastructure. The pipeline was developed by transforming an existing CMC corpus, the Dortmund Chat Corpus, into a resource conforming to current technical and legal standards. We describe how the resource has been prepared and restructured in terms of TEI encoding, linguistic annotations, and anonymisation. The output is a CLARIN-conformant resource integrated in the CLARIN-D research infrastructure.

1 to 2

Open Access

Bochumer Linguistische Arbeitsberichte

Refine

Author

Year of publication

Document Type

Language

Has Fulltext

Is part of the Bibliography

Keywords

Publicationstate

Reviewstate

Publisher

2 search hits