TY  - CHAP
U1  - Conference publication
A1  - Graën, Johannes
A1  - Clematide, Simon
ED  - Bański, Piotr
ED  - Biber, Hanno
ED  - Breiteneder, Evelyn
ED  - Kupietz, Marc
ED  - Lüngen, Harald
ED  - Witt, Andreas
T1  - Challenges in the Alignment, Management and Exploitation of Large and Richly Annotated Multi-Parallel Corpora
T2  - Proceedings of the 3rd Workshop on Challenges in the Management of Large Corpora (CMLC-3), Lancaster, 20 July 2015
N2  - The availability of large multi-parallel corpora offers an enormous wealth of material to contrastive corpus linguists, translators and language learners, if we can exploit the data properly. Necessary preparation steps include sentence and word alignment across multiple languages. Additionally, linguistic annotation such as part-of-speech tagging, lemmatisation, chunking, and dependency parsing facilitates precise querying of linguistic properties and can be used to extend word alignment to sub-sentential groups. Such highly interconnected data is stored in a relational database to allow for efficient retrieval and linguistic data mining, which may include the statistics-based selection of good example sentences. The varying information needs of contrastive linguists require a flexible linguistic query language for ad hoc searches. Such queries in the format of generalised treebank query languages will be automatically translated into SQL queries.
KW  - Korpus <Linguistik>
KW  - Parallel corpora
KW  - Corpus annotation
KW  - Corpus technology
KW  - Corpus query language
KW  - Large corpora
KW  - Korpustechnologie
KW  - Annotation
KW  - Datenbanksystem
Y1  - 2015
U6  - https://nbn-resolving.org/urn:nbn:de:bsz:mh39-38348
UN  - https://nbn-resolving.org/urn:nbn:de:bsz:mh39-38348
SP  - 15
EP  - 20
PB  - Institut für Deutsche Sprache
CY  - Mannheim
ER  -