Proceedings of the LREC 2022 Workshop on Challenges in the Management of Large Corpora (CMLC-10 2022). Marseille, 20 June 2022
- Contents:
  1. Vasile Pais, Maria Mitrofan, Verginica Barbu Mititelu, Elena Irimia, Roxana Micu and Carol Luca Gasan: Challenges in Creating a Representative Corpus of Romanian Micro-Blogging Text. Pp. 1-7
  2. Modest von Korff: Exhaustive Indexing of PubMed Records with Medical Subject Headings. Pp. 8-15
  3. Luca Brigada Villa: UDeasy: a Tool for Querying Treebanks in CoNLL-U Format. Pp. 16-19
  4. Nils Diewald: Matrix and Double-Array Representations for Efficient Finite State Tokenization. Pp. 20-26
  5. Peter Fankhauser and Marc Kupietz: Count-Based and Predictive Language Models for Exploring DeReKo. Pp. 27-31
  6. Hanno Biber: “The word expired when that world awoke.” New Challenges for Research with Large Text Corpora and Corpus-Based Discourse Studies in Totalitarian Times. Pp. 32-35

URN: | urn:nbn:de:bsz:mh39-111115 |
---|---|
URL: | http://www.lrec-conf.org/proceedings/lrec2022/workshops/CMLC10/index.html |
ISBN: | 979-10-95546-83-2 |
Publisher: | European Language Resources Association (ELRA) |
Place of publication: | Paris |
Editor: | Piotr Bański, Adrien Barbaresi, Simon Clematide, Marc Kupietz, Harald Lüngen |
Document Type: | Book |
Language: | English |
Year of first Publication: | 2022 |
Date of Publication (online): | 2022/07/01 |
Publishing Institution: | Leibniz-Institut für Deutsche Sprache (IDS) |
Publication state: | Published version |
Review state: | Peer-reviewed |
Tag: | corpus architecture; language modelling; large corpora |
GND Keyword: | Daten; Datenanalyse; Datenmanagement; Datenqualität; Datensammlung; Datensatz; Korpus <Linguistik> |
Page Number: | viii; 36 |
DDC classes: | 400 Language / 400 Language, Linguistics |
Open Access?: | yes |
Leibniz-Classification: | Language, Linguistics |
Linguistics-Classification: | Corpus linguistics |
Program areas: | G1: Description and Exploration of Grammatical Knowledge |
Program areas: | S1: Corpus Linguistics |
Licence (English): | Creative Commons - Attribution-NonCommercial 4.0 International |