Kavli Affiliate: Long Zhang
| First 5 Authors: Fei Huang
| Summary:
GeoGPT is an open large language model system built to advance research in
the geosciences. To enhance its domain-specific capabilities, we integrated
Retrieval-Augmented Generation (RAG), which augments model outputs with relevant
information retrieved from an external knowledge source. GeoGPT uses RAG to
draw from the GeoGPT Library, a specialized corpus curated for geoscientific
content, enabling it to generate accurate, context-specific answers. Users can
also create personalized knowledge bases by uploading their own publication
lists, allowing GeoGPT to retrieve and respond using user-provided materials.
To further improve retrieval quality and domain alignment, we fine-tuned both
the embedding model and a ranking model that scores retrieved passages by
relevance to the query. These enhancements optimize RAG for geoscience
applications and significantly improve the system’s ability to deliver precise
and trustworthy outputs. GeoGPT reflects a strong commitment to open science
through its emphasis on collaboration, transparency, and community-driven
development. As part of this commitment, we have open-sourced two core RAG
components, GeoEmbedding and GeoReranker, to support geoscientists, researchers,
and professionals worldwide with powerful, accessible AI tools.
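
The retrieve-then-rerank flow described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the bag-of-words `embed` function and the term-overlap `rerank_score` are toy stand-ins, not GeoEmbedding or GeoReranker, and the three-passage `CORPUS` is invented rather than drawn from the GeoGPT Library. The sketch only shows the general shape of the pipeline: embed the query, fetch the top-k passages by similarity, reorder them with a second scorer, and assemble a prompt.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: bag-of-words term counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, corpus, k):
    """First stage: return the k passages most similar to the query."""
    q = embed(query)
    ordered = sorted(corpus, key=lambda p: cosine(q, embed(p)), reverse=True)
    return ordered[:k]


def rerank_score(query, passage):
    """Toy stand-in for a ranking model: share of query terms in the passage."""
    q_terms = set(query.lower().split())
    p_terms = set(passage.lower().split())
    return len(q_terms & p_terms) / len(q_terms) if q_terms else 0.0


def rerank(query, passages):
    """Second stage: reorder retrieved passages by the reranker score."""
    return sorted(passages, key=lambda p: rerank_score(query, p), reverse=True)


def build_prompt(query, passages):
    """Concatenate the ranked passages and the question for the language model."""
    context = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return f"Context:\n{context}\n\nQuestion: {query}"


# Hypothetical corpus, invented for this example.
CORPUS = [
    "basalt is a fine-grained volcanic rock rich in iron and magnesium",
    "granite is a coarse-grained intrusive rock composed of quartz and feldspar",
    "the stock market closed higher on friday",
]

query = "what is basalt rock"
candidates = retrieve(query, CORPUS, k=2)
ranked = rerank(query, candidates)
prompt = build_prompt(query, ranked)
print(prompt)
```

In a real deployment the two toy scorers would be replaced by learned models, and the first stage would typically query a vector index rather than scan the corpus.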
| Search Query: ArXiv Query: search_query=au:"Long Zhang"&id_list=&start=0&max_results=3