Making Non-Normalized Content Retrievable – A Tagging Pipeline for a Corpus of Expert-Layperson Texts
- Conventional terminology resources reach their limits when it comes to automatic content classification of texts in the domain of expert-layperson communication. This can be attributed to the fact that (non-normalized) language usage does not necessarily reflect the terminological elements stored in such resources. We present several strategies to extend a terminological resource with term-related elements in order to optimize automatic content classification of expert-layperson texts.
| Author: | Christian Lang, Ngoc Duyen Tanja Tu, Laura Zeidler |
|---|---|
| URN: | urn:nbn:de:bsz:mh39-121941 |
| DOI: | https://doi.org/10.34619/srmk-injj |
| ISBN: | 978-989-54081-5-3 |
| Parent Title (English): | Language, Data and Knowledge 2023 (LDK 2023): Proceedings of the 4th Conference on Language, Data and Knowledge |
| Publisher: | NOVA FCSH - CLUNL |
| Place of publication: | Portugal |
| Editor: | Sara Carvalho, Anas Fahad Khan, Ana Ostroški Anić, Blerina Spahiu, Jorge Gracia, John P. McCrae, Dagmar Gromann, Barbara Heinisch, Ana Salgado |
| Document Type: | Part of a Book |
| Language: | English |
| Year of first Publication: | 2023 |
| Date of Publication (online): | 2023/10/23 |
| Publishing Institution: | Leibniz-Institut für Deutsche Sprache (IDS) |
| Publication state: | Published version |
| Review state: | Peer-reviewed |
| GND Keyword: | Experte; Kommunikation; Laie; Sprachgebrauch; Terminologie; Text |
| First Page: | 239 |
| Last Page: | 244 |
| DDC classes: | 400 Language / 400 Language, Linguistics |
| Open Access?: | yes |
| BDSL-Classification: | Grammar |
| Leibniz-Classification: | Language, Linguistics |
| Linguistics-Classification: | Grammar research |
| Program areas: | G2: Language information systems (Sprachinformationssysteme) |
| Licence (English): | Creative Commons - Attribution 4.0 International |


