FinS-Pilot: A Benchmark for Online Financial System – Kavli Institute Pre-Print Publications

Kavli Affiliate: Feng Wang

| First 5 Authors: Feng Wang, Yiding Sun, Jiaxin Mao, Wei Xue, Danqing Xu

| Summary:

Large language models (LLMs) have demonstrated remarkable capabilities across
various professional domains, with their performance typically evaluated
through standardized benchmarks. However, the development of financial RAG
benchmarks has been constrained by data confidentiality issues and the lack of
dynamic data integration. To address this issue, we introduces FinS-Pilot, a
novel benchmark for evaluating RAG systems in online financial applications.
Constructed from real-world financial assistant interactions, our benchmark
incorporates both real-time API data and structured text sources, organized
through an intent classification framework covering critical financial domains
such as equity analysis and macroeconomic forecasting. The benchmark enables
comprehensive evaluation of financial assistants’ capabilities in handling both
static knowledge and time-sensitive market information. Through systematic
experiments with multiple Chinese leading LLMs, we demonstrate FinS-Pilot’s
effectiveness in identifying models suitable for financial applications while
addressing the current gap in specialized evaluation tools for the financial
domain. Our work contributes both a practical evaluation framework and a
curated dataset to advance research in financial NLP systems. The code and
dataset are accessible on
GitHubfootnote{https://github.com/PhealenWang/financial_rag_benchmark}.

| Search Query: ArXiv Query: search_query=au:”Feng Wang”&id_list=&start=0&max_results=3