Kavli Affiliate: Cheng Peng
| First 5 Authors: Qianqian Xie, Qingyu Chen, Aokun Chen, Cheng Peng, Yan Hu
| Summary:
Recent large language models (LLMs) like ChatGPT and LLaMA have shown great
promise in many AI applications. However, their performance on medical tasks is
suboptimal and can be further improved by training on large domain-specific
datasets. This study introduces Me LLaMA, a medical LLM family comprising the
foundation models Me LLaMA 13/70B and their chat-enhanced versions, Me LLaMA
13/70B-chat, developed through continual pre-training and instruction tuning
of LLaMA2 on large medical data. Our domain-specific data suite for training
and evaluation includes a large-scale continual pre-training dataset with 129B
tokens, an instruction tuning dataset with 214k samples, and a medical
evaluation benchmark (MIBE) spanning six tasks and 14 datasets. Our
extensive evaluation using MIBE shows that Me LLaMA models surpass existing
open-source medical LLMs in zero-shot and few-shot learning and outperform
commercial giants like ChatGPT on 6 out of 8 datasets and GPT-4 on 3 out of 8
datasets. In addition, we empirically investigated the catastrophic forgetting
problem, and our results show that Me LLaMA models outperform other
open-source medical LLMs in mitigating this issue. Me LLaMA is one of the
first and largest open-source foundational LLMs
designed for the medical domain, using both biomedical and clinical data. It
exhibits superior performance across both general and medical tasks compared to
other medical LLMs, rendering it an attractive choice for medical AI
applications. All resources are available at:
https://github.com/BIDS-Xu-Lab/Me-LLaMA.
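
As an illustration of the training recipe described above, the following is a
minimal sketch of continual pre-training a LLaMA-2 checkpoint on a domain text
corpus with the Hugging Face Trainer. The model name, corpus file, and
hyperparameters are placeholders and are not taken from the paper or the
linked repository.

```python
# Illustrative sketch only: continual pre-training of a LLaMA-2 base model on a
# domain corpus using Hugging Face transformers. Paths, model name, and
# hyperparameters are assumptions, not Me LLaMA's actual configuration.
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

base_model = "meta-llama/Llama-2-13b-hf"  # assumed starting checkpoint
tokenizer = AutoTokenizer.from_pretrained(base_model)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(base_model)

# Hypothetical medical text corpus, one document per line.
corpus = load_dataset("text", data_files={"train": "medical_corpus.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=2048)

tokenized = corpus["train"].map(tokenize, batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="llama2-medical-cpt",
        per_device_train_batch_size=1,
        gradient_accumulation_steps=16,
        learning_rate=1e-5,
        num_train_epochs=1,
        bf16=True,
    ),
    train_dataset=tokenized,
    # Causal LM objective: mlm=False yields next-token prediction targets.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```

Instruction tuning would follow the same pattern, swapping the raw-text corpus
for prompt-response pairs formatted into a chat template.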
| Search Query: ArXiv Query: search_query=au:"Cheng Peng"&id_list=&start=0&max_results=3