OPUS 4 | Search


46 search hits


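"Wir wissen nicht, wie wir mit diesen Kindern reden sollen". Forschung für eine kinderrechtsbasierte Gesprächspraxis (2015) [English: "We don't know how to talk to these children": research for a child-rights-based conversational practice]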

Reitemeier, Ulrich ; Schulze, Heidrun ; Witek, Kathrin

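'Diskurs - Semiotisch'. Bericht über die 4. Jahrestagung des Netzwerks 'Diskurs - interdisziplinär' am Institut für Deutsche Sprache vom 4.-6. Dezember 2014 (2015) [English: 'Discourse - Semiotic': report on the 4th annual conference of the network 'Diskurs - interdisziplinär' at the Institut für Deutsche Sprache, 4-6 December 2014]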

Schnedermann, Theresa

Challenges in the Alignment, Management and Exploitation of Large and Richly Annotated Multi-Parallel Corpora (2015)

Graën, Johannes ; Clematide, Simon

The availability of large multi-parallel corpora offers an enormous wealth of material to contrastive corpus linguists, translators and language learners, if we can exploit the data properly. Necessary preparation steps include sentence and word alignment across multiple languages. Additionally, linguistic annotation such as part-of-speech tagging, lemmatisation, chunking and dependency parsing facilitates precise querying of linguistic properties and can be used to extend word alignment to sub-sentential groups. Such highly interconnected data is stored in a relational database to allow for efficient retrieval and linguistic data mining, which may include the statistics-based selection of good example sentences. The varying information needs of contrastive linguists require a flexible linguistic query language for ad hoc searches. Such queries, written in the format of generalised treebank query languages, will be automatically translated into SQL queries.

CoRoLa Starts Blooming – An update on the Reference Corpus of Contemporary Romanian Language (2015)

Tufiș, Dan ; Barbu Mititelu, Verginica ; Irimia, Elena ; Dumitrescu, Ștefan Daniel ; Boroș, Tiberiu ; Teodorescu, Horia Nicolai ; Cristea, Dan ; Scutelnicu, Andrei ; Bolea, Cecilia ; Moruz, Alex ; Pistol, Laura

This article reports on the ongoing CoRoLa project, which aims to create a reference corpus of contemporary Romanian (from 1945 onwards), open for free online use by researchers in linguistics and language processing, teachers of Romanian, and students. We are making serious efforts to persuade large publishing houses and other holders of intellectual property rights (IPR) in relevant language data to join us and contribute selections of their text and speech repositories to the project. The CoRoLa project is coordinated by two computer science institutes of the Romanian Academy, but benefits from the cooperation and advice of professional linguists from other institutes of the Romanian Academy. We foresee a written component of more than 500 million word forms and a speech component of about 300 hours of recordings. The entire collection of texts (covering all functional styles of the language) will be pre-processed, annotated at several levels, and documented with standardized metadata. Pre-processing includes cleaning the data, harmonising the diacritics, sentence splitting and tokenization. Annotation will include morpho-lexical tagging and lemmatization in the first stage, followed by syntactic, semantic and discourse annotation at a later stage.

Das Institut für Deutsche Sprache im Jahr 2014: Jahresbericht (2015) [English: The Institut für Deutsche Sprache in 2014: annual report]

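Das Projekt "European Network of E-Lexicography". Lexikographie aus europäischer Perspektive (2015) [English: The project "European Network of E-Lexicography": lexicography from a European perspective]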

Vietze, Oda

Denken und Sprechen in Oppositionen (2015) [English: Thinking and speaking in oppositions]

Ulrich, Winfried

Der Tanz um das Verb (2015) [English: The dance around the verb]

Engel, Ulrich

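Des Iraks, des Irakes oder des Irak - von Sprachzweifeln und Sprachvariation (2015) [English: "Des Iraks, des Irakes oder des Irak" (genitive variants of "Irak"): on linguistic doubts and language variation]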

Konopka, Marek

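Deutsch in Ost und West. Eine Bestandsaufnahme zum 25. Jahrestag des Mauerfalls (2015) [English: German in East and West: a stocktaking on the 25th anniversary of the fall of the Berlin Wall]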

Plewnia, Albrecht
