VisionTS: Visual Masked Autoencoders Are Free-Lunch Zero-Shot Time Series Forecasters

Kavli Affiliate: Zhuo Li

| First 5 Authors: Mouxiang Chen, Lefei Shen, Zhuo Li, Xiaoyun Joy Wang, Jianling Sun

| Summary:

Foundation models have emerged as a promising approach to time series
forecasting (TSF). Existing approaches either repurpose large language models
(LLMs) or build large-scale time series datasets to develop TSF foundation
models for universal forecasting. However, these methods face challenges due to
the severe cross-domain gap or in-domain heterogeneity. This paper explores a
new route: building a TSF foundation model from rich, high-quality natural
images. Our key insight is that a visual masked autoencoder (MAE), pre-trained
on the ImageNet dataset, can naturally serve as a numeric series forecaster. By
reformulating TSF as an image reconstruction task, we bridge the gap between
image pre-training and downstream TSF tasks. Surprisingly, without any further
adaptation in the time-series domain, the proposed VisionTS achieves superior
zero-shot forecasting performance compared to existing TSF foundation models.
After fine-tuning for a single epoch, VisionTS further improves its forecasts
and achieves state-of-the-art performance in most cases. Extensive experiments
reveal intrinsic similarities between images and real-world time series,
suggesting that visual models may offer a “free lunch” for TSF and highlighting
the potential of future cross-modality research. Our code is publicly available
at https://github.com/Keytoyze/VisionTS.
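
To make the reformulation concrete, here is a minimal NumPy sketch of the idea, not the repository's actual implementation: the context window is folded into a 2-D “image” with one period per column, the forecast horizon is left as masked columns on the right, and a visual masked autoencoder inpaints the missing pixels. The `mae_inpaint` argument and the `naive_inpaint` stand-in below are hypothetical placeholders for the pre-trained ImageNet MAE.

```python
import numpy as np

def series_to_image(context, period):
    # Fold a 1-D context window into a 2-D "image": one period per column.
    # Assumes len(context) is a multiple of `period`; a real pipeline would
    # pad or resample instead of truncating.
    n_cols = len(context) // period
    return context[: n_cols * period].reshape(n_cols, period).T  # (period, n_cols)

def forecast_with_mae(context, horizon, period, mae_inpaint):
    # Render the context as the visible left part of an image, leave the
    # horizon as masked columns, and let `mae_inpaint` (a stand-in for a
    # pre-trained visual MAE) reconstruct the masked pixels.
    img = series_to_image(context, period)
    mask_cols = int(np.ceil(horizon / period))
    canvas = np.concatenate([img, np.zeros((period, mask_cols))], axis=1)
    mask = np.zeros(canvas.shape, dtype=bool)
    mask[:, img.shape[1]:] = True  # only the future is masked
    # Instance-normalize so pixel intensities are comparable across series.
    mu, sigma = context.mean(), context.std() + 1e-8
    recon = mae_inpaint((canvas - mu) / sigma, mask)
    # Unfold the reconstructed columns back into a 1-D forecast.
    future = recon[:, img.shape[1]:].T.reshape(-1)[:horizon]
    return future * sigma + mu

def naive_inpaint(canvas, mask):
    # Toy stand-in for the MAE: copy the last visible column into the
    # masked ones (i.e., "repeat the last period").
    out = canvas.copy()
    visible = np.where(~mask.all(axis=0))[0]
    masked = np.where(mask.all(axis=0))[0]
    out[:, masked] = out[:, [visible.max()]]
    return out

t = np.arange(96, dtype=float)
y = np.sin(2 * np.pi * t / 24) + 0.05 * t  # daily seasonality plus a mild trend
print(forecast_with_mae(y, horizon=24, period=24, mae_inpaint=naive_inpaint)[:5])
```

With a real pre-trained MAE in place of `naive_inpaint`, no time-series training is needed at all, which is what makes the zero-shot setting possible.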
