Making Non-Normalized Content Retrievable – A Tagging Pipeline for a Corpus of Expert-Layperson Texts
- Conventional terminology resources reach their limits when it comes to automatic content classification of texts in the domain of expert-layperson communication. This can be attributed to the fact that (non-normalized) language usage does not necessarily reflect the terminological elements stored in such resources. We present several strategies to extend a terminological resource with term-related elements in order to optimize automatic content classification of expert-layperson texts.
Author: | Christian Lang, Ngoc Duyen Tanja Tu, Laura Zeidler |
---|---|
URN: | urn:nbn:de:bsz:mh39-121941 |
DOI: | https://doi.org/10.34619/srmk-injj |
ISBN: | 978-989-54081-5-3 |
Parent Title (English): | Language, Data and Knowledge 2023 (LDK 2023): Proceedings of the 4th Conference on Language, Data and Knowledge |
Publisher: | NOVA FCSH - CLUNL |
Place of publication: | Portugal |
Editor: | Sara Carvalho, Anas Fahad Khan, Ana Ostroški Anić, Blerina Spahiu, Jorge Gracia, John P. McCrae, Dagmar Gromann, Barbara Heinisch, Ana Salgado |
Document Type: | Part of a Book |
Language: | English |
Year of first Publication: | 2023 |
Date of Publication (online): | 2023/10/23 |
Publishing Institution: | Leibniz-Institut für Deutsche Sprache (IDS) |
Publicationstate: | Published version |
Reviewstate: | Peer-Review |
GND Keyword: | Experte; Kommunikation; Laie; Sprachgebrauch; Terminologie; Text |
First Page: | 239 |
Last Page: | 244 |
DDC classes: | 400 Language / 400 Language, Linguistics |
Open Access?: | yes |
BDSL-Classification: | Grammar |
Leibniz-Classification: | Language, Linguistics |
Linguistics-Classification: | Grammar research |
Program areas: | G2: Sprachinformationssysteme |