Matbench Discovery — A framework to evaluate machine learning crystal stability predictions

Kavli Affiliate: Kristin A. Persson

| First 5 Authors: Janosh Riebesell, Rhys E. A. Goodall, Philipp Benner, Yuan Chiang, Bowen Deng

| Summary:

The rapid adoption of machine learning (ML) in domain sciences necessitates
best practices and standardized benchmarking for performance evaluation. We
present Matbench Discovery, an evaluation framework for ML energy models,
applied as pre-filters for high-throughput searches of stable inorganic
crystals. This framework addresses the disconnect between thermodynamic
stability and formation energy, as well as retrospective vs. prospective
benchmarking in materials discovery. We release a Python package to support
model submissions and maintain an online leaderboard, offering insights into
performance trade-offs. To identify the best-performing ML methodologies for
materials discovery, we benchmarked various approaches, including random
forests, graph neural networks (GNNs), one-shot predictors, iterative Bayesian
optimizers, and universal interatomic potentials (UIPs). Our initial results
rank models by test set F1 scores for thermodynamic stability prediction:
EquiformerV2 + DeNS > Orb > SevenNet > MACE > CHGNet > M3GNet > ALIGNN > MEGNet
> CGCNN > CGCNN+P > Wrenformer > BOWSR > Voronoi fingerprint random forest.
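The ranking above uses the F1 score for classifying materials as stable or unstable, which is a different thing from scoring the regression error in energy. Below is a minimal sketch of that scoring. It assumes a material counts as stable when its energy above the convex hull (E_hull) is at or below a 0 eV/atom threshold; the treatment of exact ties is itself an assumption. The helper names `confusion_counts` and `f1_score` are hypothetical and are not the Matbench Discovery package API.

```python
from typing import Sequence, Tuple


def confusion_counts(
    e_hull_true: Sequence[float],
    e_hull_pred: Sequence[float],
    threshold: float = 0.0,
) -> Tuple[int, int, int, int]:
    """Return (TP, FP, TN, FN), treating E_hull <= threshold (eV/atom) as stable."""
    if len(e_hull_true) != len(e_hull_pred):
        raise ValueError("true and predicted arrays must have equal length")
    tp = fp = tn = fn = 0
    for t, p in zip(e_hull_true, e_hull_pred):
        actual_stable = t <= threshold
        predicted_stable = p <= threshold
        if predicted_stable and actual_stable:
            tp += 1
        elif predicted_stable:
            fp += 1  # predicted stable, but the reference (DFT) value is unstable
        elif actual_stable:
            fn += 1  # a truly stable material the model would skip
        else:
            tn += 1
    return tp, fp, tn, fn


def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall; returns 0.0 when undefined."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical hull distances in eV/atom, not benchmark data.
    e_true = [-0.05, 0.02, 0.10, -0.01, 0.30]
    e_pred = [-0.03, -0.01, 0.12, 0.04, 0.25]
    tp, fp, tn, fn = confusion_counts(e_true, e_pred)
    print(tp, fp, tn, fn, f1_score(tp, fp, fn))  # 1 1 2 1 0.5
```

In the toy data, the second material lies 0.02 eV/atom above the hull but is predicted at -0.01 eV/atom. It therefore becomes a false positive even though its absolute error is only 0.03 eV/atom.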
UIPs emerge as the top performers, achieving F1 scores of 0.57-0.82 and
discovery acceleration factors (DAF) of up to 6x on the first 10k stable
predictions compared to random selection. We also identify a misalignment
between regression metrics and task-relevant classification metrics: even accurate
regressors can yield high false-positive rates when their predictions lie close to the
decision boundary at 0 eV/atom above the convex hull. Our results demonstrate UIPs’ ability to
optimize computational budget allocation for expanding materials databases.
However, their limitations remain underexplored in traditional benchmarks. We
advocate for task-based evaluation frameworks, as implemented here, to address
these limitations and advance ML-guided materials discovery.
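The discovery acceleration factor and the mismatch between regression and classification metrics can be illustrated together. This sketch takes DAF to be the precision among materials predicted stable, divided by the prevalence of stable materials in the test set, so that random selection scores 1. The optional `top_k` cutoff is our assumption about how a "first 10k" budget would be applied: it ranks candidates by predicted E_hull. The function names and toy energies are hypothetical and are not taken from the benchmark.

```python
from typing import Optional, Sequence


def mean_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Regression metric: mean |E_hull_true - E_hull_pred| in eV/atom."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def discovery_acceleration_factor(
    e_hull_true: Sequence[float],
    e_hull_pred: Sequence[float],
    threshold: float = 0.0,
    top_k: Optional[int] = None,
) -> float:
    """Hit rate among predicted-stable materials divided by the stable prevalence.

    Assumes E_hull <= threshold (eV/atom) means stable. If top_k is given, only
    the top_k predicted-stable materials with the lowest predicted E_hull count.
    """
    if not e_hull_true or len(e_hull_true) != len(e_hull_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    n_stable = sum(t <= threshold for t in e_hull_true)
    prevalence = n_stable / len(e_hull_true)
    # Indices the model would send on to DFT, most confidently stable first.
    candidates = sorted(
        (i for i, p in enumerate(e_hull_pred) if p <= threshold),
        key=lambda i: e_hull_pred[i],
    )
    if top_k is not None:
        candidates = candidates[:top_k]
    if not candidates or prevalence == 0:
        return 0.0
    hits = sum(e_hull_true[i] <= threshold for i in candidates)
    precision = hits / len(candidates)
    return precision / prevalence


if __name__ == "__main__":
    # Toy hull distances (eV/atom): two stable, two unstable near 0, two far above.
    e_true = [-0.02, 0.02, -0.01, 0.01, 0.50, 0.40]
    # Model A: small errors, but each near-boundary prediction lands on the wrong side.
    model_a = [0.01, -0.01, 0.02, -0.02, 0.48, 0.42]
    # Model B: larger errors, but every prediction keeps the correct sign.
    model_b = [-0.10, 0.10, -0.09, 0.09, 0.40, 0.30]
    for name, pred in [("A", model_a), ("B", model_b)]:
        mae = mean_absolute_error(e_true, pred)
        daf = discovery_acceleration_factor(e_true, pred)
        print(f"model {name}: MAE={mae:.3f} eV/atom, DAF={daf:.1f}")
```

Model A has the lower MAE (about 0.027 vs 0.087 eV/atom) yet a DAF of 0, because both materials it flags as stable are actually unstable. Model B reaches a DAF of 3. Precision can never exceed 1, so the DAF is bounded above by 1/prevalence. With a third of the toy materials stable, that ceiling is 3.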
