Automatic question answering for the linguistic domain – An evaluation of LLM knowledge base extension with RAG
- We investigate the extent to which Retrieval Augmented Generation improves the quality of Large Language Models’ answers to technical questions in the field of linguistics—a domain known for its broad terminological inventory and theory-dependent use of technical terms. Furthermore, this application is not only about terminological information on language, but also about information on its well-formedness. We present the results of an empirical evaluation of automatically generated answers based on authentic data from a language consulting service, with special emphasis on different question types.
Author: | Christian Lang, Roman Schneider, Ngoc Duyen Tanja Tu |
---|---|
URN: | urn:nbn:de:bsz:mh39-128138 |
DOI: | https://doi.org/10.1007/978-3-031-70242-6_16 |
ISBN: | 978-3-031-70242-6 |
ISSN: | 1611-3349 |
Parent Title (English): | Natural Language Processing and Information Systems. 29th International Conference on Applications of Natural Language to Information Systems, NLDB 2024, Turin, Italy, June 25–27, 2024, Proceedings, Part II |
Series (Serial Number): | Lecture Notes in Computer Science (14763) |
Publisher: | Springer |
Place of publication: | Cham |
Editor: | Amon Rapp, Luigi Di Caro, Farid Meziane, Vijayan Sugumaran |
Document Type: | Conference Proceeding |
Language: | English |
Year of first Publication: | 2024 |
Date of Publication (online): | 2024/09/20 |
Publishing Institution: | Leibniz-Institut für Deutsche Sprache (IDS) [secondary publication] |
Publication state: | Secondary publication |
Publication state: | Postprint |
Review state: | Peer review |
Tag: | domain specificity; large language model; quality evaluation; question answering; retrieval augmented generation |
GND Keyword: | Antwort; Automatische Sprachanalyse; Computerlinguistik; Großes Sprachmodell; Terminologie |
First Page: | 161 |
Last Page: | 171 |
Note: | This version of the contribution has been accepted for publication, after peer review but is not the Version of Record and does not reflect post-acceptance improvements, or any corrections. The Version of Record is available online at https://doi.org/10.1007/978-3-031-70242-6_16. Use of this Accepted Version is subject to the publisher’s Accepted Manuscript terms of use https://www.springernature.com/gp/open-research/policies/accepted-manuscript-terms. |
DDC classes: | 400 Language / 400 Language, Linguistics |
Open Access?: | yes |
Linguistics-Classification: | Computational linguistics |
Program areas: | Grammar |
Licence (German): | Urheberrechtlich geschützt (protected by copyright) |