OPUS 4 | Search

Refine

Has Fulltext

yes (2)

2 search hits

1 to 2

Sort by

Year
Year
Title
Title
Author
Author

Cleaning the Europarl Corpus for Linguistic Applications (2014)

Graën, Johannes ; Batinić, Dolores ; Volk, Martin

We discovered several recurring errors in the current version of the Europarl Corpus originating both from the web site of the European Parliament and the corpus compilation based thereon. The most frequent error was incompletely extracted metadata leaving non-textual fragments within the textual parts of the corpus files. This is, on average, the case for every second speaker change. We not only cleaned the Europarl Corpus by correcting several kinds of errors, but also aligned the speakers’ contributions of all available languages and compiled every- thing into a new XML-structured corpus. This facilitates a more sophisticated selection of data, e.g. querying the corpus for speeches by speakers of a particular political group or in particular language combinations.

LeGeDe – towards a corpus-based lexical resource of spoken German (2017)

Möhrs, Christine ; Meliss, Meike ; Batinić, Dolores

This paper gives an insight into the basic concepts for a corpus-based lexical resource of spoken German, which is being developed by the project "The Lexicon of Spoken German"(Lexik des gesprochenen Deutsch, LeGeDe) at the "Institute for the German Language" (Institut für Deutsche Sprache, IDS) in Mannheim. The focus of the paper is on initial ideas of semi-automatic and automatic resources that assist the quantitative analysis of the corpus data for the creation of dictionary content. The work is based on the "Research and Teaching Corpus of Spoken German" (Forschungs- und Lehrkorpus Gesprochenes Deutsch, FOLK).

1 to 2

Open Access

Refine

Author

Year of publication

Document Type

Language

Has Fulltext

Is part of the Bibliography

Keywords

Publicationstate

Reviewstate

Publisher

2 search hits