A recent study published in Engineering presents a novel framework named ERQA (mEdical knowledge Retrieval and Question-Answering), which is powered by an enhanced large language model (LLM). This framework aims to improve medical knowledge discovery by integrating a semantic vector database and a curated literature repository.
The development of LLMs has significantly advanced text processing, leading to their wide application in various fields, including biomedicine. However, LLMs are prone to hallucination and confabulation, which poses a particular challenge in medical applications that demand high accuracy. To address this, the researchers constructed the ERQA framework.
The ERQA framework is based on Llama2 and undergoes two crucial steps: incremental pretraining on biomedical text and fine-tuning with sophisticated prompts. This process enables the model to incorporate domain-specific knowledge and handle various medical tasks, such as question classification, reconstruction, abstract summarization, and literature-based QA. The literature database in ERQA stores original textual content and metadata, while the semantic vector database supports semantic-based retrieval, enhancing the efficiency of information retrieval.
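As a rough illustration of how a literature store and a semantic vector index can work together, the minimal sketch below keeps the original text and metadata in one mapping and a vector per document in another, then ranks documents by cosine similarity to the query. It is not the authors' implementation: the class name LiteratureStore, the function names, and the sample documents are hypothetical, and a bag-of-words count vector stands in for the LLM-derived embeddings a system like ERQA would use.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for an LLM embedding: a bag-of-words term-count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class LiteratureStore:
    """Hypothetical store pairing raw records with a vector index."""

    def __init__(self):
        self.records = {}  # doc_id -> original text and metadata
        self.vectors = {}  # doc_id -> embedding used for semantic search

    def add(self, doc_id, text, meta):
        self.records[doc_id] = {"text": text, "meta": meta}
        self.vectors[doc_id] = embed(text)

    def search(self, query, k=3):
        """Return the k records whose vectors are most similar to the query."""
        q = embed(query)
        ranked = sorted(
            self.vectors,
            key=lambda d: cosine(q, self.vectors[d]),
            reverse=True,
        )
        return [(d, self.records[d]) for d in ranked[:k]]


store = LiteratureStore()
store.add("d1", "vaccine efficacy against covid infection", {"year": 2021})
store.add("d2", "knee surgery recovery outcomes", {"year": 2019})
top_hits = store.search("covid vaccine efficacy", k=1)
```

Keeping the records and the vectors in separate structures mirrors the division described above: the vector index only decides which documents to fetch, while the text and metadata are returned unchanged from the literature store.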
To evaluate the performance of ERQA, the researchers used the pandemic and TripClick datasets. They compared ERQA with several semantic baselines and medical LLMs. In literature retrieval, ERQA-7B and ERQA-13B showed high retrieval accuracy on both datasets. For example, on the pandemic dataset, ERQA-13B achieved state-of-the-art retrieval metrics, with an NDCG@10 of 0.297, Recall@10 of 0.347, and MRR of 0.370. In abstract summarization, ERQA-13B outperformed other models on both datasets, achieving high ROUGE scores. Regarding literature-based QA, ERQA models, especially ERQA-13B, outperformed the baselines, with ERQA-13B achieving a BLEU-1 score of 7.851 on the pandemic dataset. A human evaluation also showed that larger ERQA models performed better in terms of coherence, consistency, and user satisfaction, though there were still some limitations.
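For readers unfamiliar with the retrieval metrics quoted above, the sketch below shows one common way to compute Recall@k, reciprocal rank, and NDCG@k for a single query. It assumes binary relevance labels (a document is either relevant or not), which may differ from the grading used in the study; the function names and the sample ranking are illustrative only. MRR is the mean of the reciprocal rank over all queries.

```python
import math


def recall_at_k(ranked, relevant, k=10):
    """Fraction of the relevant documents that appear in the top k results."""
    if not relevant:
        return 0.0
    hits = sum(1 for d in ranked[:k] if d in relevant)
    return hits / len(relevant)


def reciprocal_rank(ranked, relevant):
    """1 / rank of the first relevant result, or 0 if none is retrieved."""
    for i, d in enumerate(ranked, start=1):
        if d in relevant:
            return 1.0 / i
    return 0.0


def ndcg_at_k(ranked, relevant, k=10):
    """NDCG@k with binary gains: DCG of the ranking divided by the ideal DCG."""
    dcg = sum(
        1.0 / math.log2(i + 1)
        for i, d in enumerate(ranked[:k], start=1)
        if d in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(i + 1) for i in range(1, ideal_hits + 1))
    return dcg / idcg if idcg else 0.0


# Hypothetical ranking for one query; "b" and "d" are the relevant documents.
ranked = ["a", "b", "c", "d"]
relevant = {"b", "d"}
scores = {
    "recall@2": recall_at_k(ranked, relevant, k=2),    # 0.5
    "rr": reciprocal_rank(ranked, relevant),           # 0.5
    "ndcg@10": ndcg_at_k(ranked, relevant, k=10),      # about 0.651
}
```

All three metrics lie between 0 and 1, with higher values indicating that relevant documents are retrieved and ranked nearer the top.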
The researchers also conducted ablation studies, which indicated that the combined effect of fine-tuning and vector database integration significantly enhanced the performance of ERQA. However, the ERQA model still faces challenges. Retrieving articles that fully meet user requirements can be difficult, and issues like text segmentation and layer selection for embeddings need to be optimized. The hallucination problem in LLMs also affects the reliability of the system.
The ERQA framework shows promise in biomedical knowledge discovery. The researchers plan to further improve the model by incorporating larger-scale biomedical literature datasets and adopting additional evaluation metrics.
The paper, “Toward a Large Language Model-Driven Medical Knowledge Retrieval and QA System: Framework Design and Evaluation,” is authored by Yuyang Liu, Xiaoying Li, Yan Luo, Jinhua Du, Ying Zhang, Tingyu Lv, Hao Yin, Xiaoli Tang, and Hui Liu. Full text of the open access paper: https://doi.org/10.1016/j.eng.2025.02.010. For more information about Engineering, visit the website at https://www.sciencedirect.com/journal/engineering.