Beyond Model Ranking: Predictability-Aligned Evaluation for Time Series Forecasting

Kavli Affiliate: Feng Yuan

| First 5 Authors: Wanjin Feng

| Summary:

In the era of increasingly complex AI models for time series forecasting,
progress is often measured by marginal improvements on benchmark leaderboards.
However, this approach suffers from a fundamental flaw: standard evaluation
metrics conflate a model’s performance with the data’s intrinsic
unpredictability. To address this pressing challenge, we introduce a novel,
predictability-aligned diagnostic framework grounded in spectral coherence. Our
framework makes two primary contributions: the Spectral Coherence
Predictability (SCP), a computationally efficient ($O(N \log N)$) and
task-aligned score that quantifies the inherent difficulty of a given
forecasting instance, and the Linear Utilization Ratio (LUR), a
frequency-resolved diagnostic tool that precisely measures how effectively a
model exploits the linearly predictable information within the data. We
validate our framework’s effectiveness and leverage it to reveal two core
insights. First, we provide the first systematic evidence of "predictability
drift", demonstrating that a task’s forecasting difficulty varies sharply over
time. Second, our evaluation reveals a key architectural trade-off: complex
models are superior for low-predictability data, whereas linear models are
highly effective on more predictable tasks. We advocate for a paradigm shift,
moving beyond simplistic aggregate scores toward a more insightful,
predictability-aware evaluation that fosters fairer model comparisons and a
deeper understanding of model behavior.
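
The summary does not spell out how SCP is computed, so the following is only a minimal sketch of the general idea of a coherence-based predictability score, not the paper's definition. It assumes a univariate series, splits it into non-overlapping windows, and estimates the magnitude-squared coherence between each window (the "past") and the one that follows it (the "future"); the FFT keeps the cost at $O(N \log N)$. The function name `coherence_predictability`, the `window` parameter, and the power-weighted averaging over frequencies are all illustrative assumptions.

```python
import numpy as np


def coherence_predictability(series, window=32, eps=1e-12):
    """Illustrative coherence-based score in [0, 1]; NOT the paper's SCP.

    Returns (score, per-frequency coherence) for consecutive window pairs.
    """
    x = np.asarray(series, dtype=float)
    n_pairs = (len(x) - window) // window
    if n_pairs < 2:
        raise ValueError("series too short for at least two window pairs")

    # Non-overlapping segments; pair k is (segment k, segment k + 1).
    segs = x[: (n_pairs + 1) * window].reshape(n_pairs + 1, window)
    segs = segs - segs.mean(axis=1, keepdims=True)  # remove per-window mean

    spec = np.fft.rfft(segs, axis=1)
    past, future = spec[:-1], spec[1:]

    # Auto- and cross-spectra averaged over all window pairs.
    s_pp = np.mean(np.abs(past) ** 2, axis=0)
    s_ff = np.mean(np.abs(future) ** 2, axis=0)
    s_pf = np.mean(past * np.conj(future), axis=0)

    # Magnitude-squared coherence per frequency bin.
    coh = np.abs(s_pf) ** 2 / (s_pp * s_ff + eps)

    # Assumed aggregation: weight each bin by the power of the future window.
    score = float(np.sum(coh * s_ff) / (np.sum(s_ff) + eps))
    return score, coh


# Toy comparison: a noisy sinusoid versus white noise (synthetic data).
rng = np.random.default_rng(0)
t = np.arange(4096)
periodic = np.sin(2 * np.pi * t / 16) + 0.1 * rng.standard_normal(t.size)
noise = rng.standard_normal(t.size)

score_periodic, _ = coherence_predictability(periodic)
score_noise, _ = coherence_predictability(noise)
print(f"periodic: {score_periodic:.3f}, noise: {score_noise:.3f}")
```

Under these assumptions the periodic series should score close to 1 and the white noise close to 0, which mirrors the summary's point that the same error metric means different things on instances of different intrinsic difficulty. A frequency-resolved diagnostic in the spirit of LUR could compare a model's residual spectrum against the per-bin coherence returned here, but that step is not sketched because the summary gives no formula for it.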

| Search Query: ArXiv Query: search_query=au:"Feng Yuan"&id_list=&start=0&max_results=3