Time Series Forecasting: Foundation Models, Decomposition, and the Quest for Generalizable Intelligence
Latest 50 papers on time series forecasting: Nov. 10, 2025
The Era of Foundation Models and Decomposition in Time Series Forecasting
Time series forecasting (TSF) has evolved rapidly from reliance on statistical models to complex deep learning architectures, mirroring the advances seen in NLP and Vision. The sheer diversity, stochasticity, and irregularity of real-world time series (from financial markets to IoT sensor readings and neural activity) pose challenges that traditional models often struggle to handle. The field is now shifting decisively toward Foundation Models and sophisticated Decomposition Strategies, aiming to build robust, generalizable, and resource-efficient predictive systems.
This digest synthesizes recent breakthroughs that address the core hurdles in TSF: generalization, interpretability, data heterogeneity, and computational efficiency.
The Big Ideas & Core Innovations
Recent research coalesces around three major themes: building massive, zero-shot Foundation Models; decomposing time series to conquer complexity; and improving model robustness against uncertainty and poor data quality.
1. The Foundation Model Wave: Zero-Shot and Lightweight Giants
The ambition to create general-purpose TSF models is finally taking shape. Datadog AI Research introduced TOTO, detailed in This Time is Different: An Observability Perspective on Time Series Foundation Models. TOTO, a zero-shot forecasting model with 151 million parameters, achieves state-of-the-art performance, especially in observability-oriented tasks, setting a new benchmark for large-scale application.
However, not all foundation models must be massive. The trend toward efficiency is championed by models like TiRex (TiRex: Zero-Shot Forecasting Across Long and Short Horizons with Enhanced In-Context Learning) from NXAI GmbH, which utilizes an xLSTM architecture and Contiguous Patch Masking (CPM) to mitigate error accumulation and achieve reliable zero-shot prediction. Furthermore, the lightweight SEMPO (SEMPO: Lightweight Foundation Models for Time Series Forecasting) from Beijing Institute of Technology drastically reduces model size and pre-training data while maintaining strong zero-shot generalization, showing that performance need not be sacrificed for efficiency.
For those seeking simplicity, TempoPFN (TempoPFN: Synthetic Pre-training of Linear RNNs for Zero-shot Time Series Forecasting) from the University of Freiburg demonstrates that high-performing zero-shot forecasting can be achieved using only linear RNNs with synthetic pre-training, challenging the necessity of complex non-linear architectures.
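To make the linear-recurrence idea concrete, below is a minimal sketch of autoregressive forecasting with a linear RNN. It is an illustrative toy under stated assumptions (random weights, a small hidden state, a synthetic sine-wave context), not TempoPFN's architecture; in the actual model the recurrence weights come from synthetic pre-training.

```python
# Minimal linear-RNN forecaster: encode the context with a linear recurrence,
# then roll predictions out autoregressively. All weights here are random
# placeholders; a real model would learn them via (synthetic) pre-training.
import numpy as np

rng = np.random.default_rng(0)

def linear_rnn_forecast(x, A, B, C, horizon):
    """h_t = A @ h_{t-1} + B * x_t over the context, then predict y = C @ h."""
    h = np.zeros(A.shape[0])
    for x_t in x:                       # encode the observed context
        h = A @ h + B * x_t
    preds = []
    for _ in range(horizon):            # decode future steps
        y = C @ h
        preds.append(y)
        h = A @ h + B * y               # feed the prediction back as input
    return np.array(preds)

# Synthetic sine-wave context (a stand-in for synthetically generated data).
t = np.arange(96)
context = np.sin(2 * np.pi * t / 24)

d = 8                                   # hidden-state size (assumed)
A = 0.9 * np.eye(d) + 0.01 * rng.standard_normal((d, d))
B = 0.1 * rng.standard_normal(d)
C = 0.1 * rng.standard_normal(d)

print(linear_rnn_forecast(context, A, B, C, horizon=24))
```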
2. Decomposition and Modularity for Robustness
A unifying trend is the meticulous separation and specialized modeling of time series components (trend, seasonality, and noise) to enhance accuracy and interpretability.
- Decomposition-Driven Frameworks: Frameworks like OneCast (OneCast: Structured Decomposition and Modular Generation for Cross-Domain Time Series Forecasting) utilize structured decomposition and modular generation to generalize across heterogeneous domains. Similarly, DBLoss (DBLoss: Decomposition-based Loss Function for Time Series Forecasting) proposes a loss function that decomposes the signal and explicitly optimizes the seasonal and trend components, yielding higher accuracy than training with plain Mean Squared Error (MSE); a minimal sketch of this idea follows the list below.
- Hybrid Architectures: The SST model (SST: Multi-Scale Hybrid Mamba-Transformer Experts for Time Series Forecasting) from the Illinois Institute of Technology cleverly combines Mamba (for long-range patterns) and Transformer (for short-term variations) with a decomposition strategy to eliminate information interference, achieving state-of-the-art results with linear complexity.
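As a rough illustration of the decomposition-based loss idea referenced above, the sketch below separates predictions and targets into trend and seasonal parts with a moving average and penalizes each component. The decomposition method, window size, and component weights are assumptions for illustration, not DBLoss's actual formulation.

```python
# Decomposition-based loss sketch: compare trend and seasonal components of
# prediction and target separately instead of using a single pointwise MSE.
import torch
import torch.nn.functional as F

def moving_average_trend(x, kernel_size=25):
    # x: (batch, length) -> smoothed trend of the same shape.
    pad = kernel_size // 2
    x_padded = F.pad(x.unsqueeze(1), (pad, pad), mode="replicate")
    return F.avg_pool1d(x_padded, kernel_size, stride=1).squeeze(1)

def decomposition_loss(pred, target, w_trend=1.0, w_seasonal=1.0):
    trend_p, trend_t = moving_average_trend(pred), moving_average_trend(target)
    seasonal_p, seasonal_t = pred - trend_p, target - trend_t
    return (w_trend * F.mse_loss(trend_p, trend_t)
            + w_seasonal * F.mse_loss(seasonal_p, seasonal_t))

# Usage: drop in wherever a plain MSE loss would be used in a training loop.
pred, target = torch.randn(4, 96), torch.randn(4, 96)
print(decomposition_loss(pred, target))
```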
3. Tackling Uncertainty and Imperfect Data
Dealing with real-world noise, missing values, and stochasticity requires specialized approaches. For highly variable data, the University of Melbourne’s Stochastic Diffusion (StochDiff) (Stochastic Diffusion: A Diffusion Probabilistic Model for Stochastic Time Series Forecasting) integrates the diffusion process directly into the modeling stage, proving superior for highly stochastic sequences, such as surgical data.
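For readers new to diffusion-based forecasting, the snippet below shows only the standard forward (noising) process such models build on: a future window is progressively corrupted with Gaussian noise, and a network is trained to reverse the corruption. The step count and linear beta schedule are generic assumptions; StochDiff's data-driven prior and conditioning are not reproduced here.

```python
# Forward (noising) process of a diffusion model applied to a target window.
import torch

T = 100                                        # diffusion steps (assumed)
betas = torch.linspace(1e-4, 0.02, T)          # linear noise schedule
alpha_bars = torch.cumprod(1.0 - betas, dim=0)

def q_sample(x0, t):
    """Sample x_t ~ N(sqrt(a_bar_t) * x_0, (1 - a_bar_t) * I)."""
    noise = torch.randn_like(x0)
    a_bar = alpha_bars[t]
    return a_bar.sqrt() * x0 + (1 - a_bar).sqrt() * noise, noise

future_window = torch.sin(torch.linspace(0, 6.28, 24))   # toy target series
x_t, eps = q_sample(future_window, t=50)
# A denoising network is trained to predict `eps` from (x_t, t, context);
# forecasting then amounts to running the learned reverse process.
```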
Crucially, when data is incomplete, the CRIB framework introduced in Revisiting Multivariate Time Series Forecasting with Missing Values challenges the conventional ‘imputation-then-prediction’ paradigm. The authors propose a direct-prediction method using the Information Bottleneck principle, outperforming imputation approaches under high missing rates.
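The sketch below illustrates the general "predict directly from partially observed inputs" pattern: the observation mask is fed alongside zero-filled values so the forecaster never relies on a separate imputation step. The tiny MLP and input layout are illustrative assumptions; CRIB's architecture and its Information Bottleneck regularizer are not shown.

```python
# Direct forecasting from incomplete inputs: no imputation stage, the mask is
# part of the input. Architecture and sizes are placeholders for illustration.
import torch
import torch.nn as nn

class DirectMissingValueForecaster(nn.Module):
    def __init__(self, context_len, horizon, hidden=64):
        super().__init__()
        # Input = zero-filled observed values concatenated with the mask.
        self.net = nn.Sequential(
            nn.Linear(2 * context_len, hidden),
            nn.ReLU(),
            nn.Linear(hidden, horizon),
        )

    def forward(self, values, mask):
        # values, mask: (batch, context_len); mask is 1 where observed.
        x = torch.cat([torch.nan_to_num(values) * mask, mask], dim=-1)
        return self.net(x)

model = DirectMissingValueForecaster(context_len=96, horizon=24)
values = torch.randn(8, 96)
mask = (torch.rand(8, 96) > 0.3).float()       # roughly 30% missing
print(model(values, mask).shape)               # torch.Size([8, 24])
```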
Under the Hood: Models, Datasets, & Benchmarks
The recent breakthroughs are driven by novel architectural choices and the introduction of specialized benchmarks:
- Foundation Models & Architectures:
- TOTO (Datadog AI Research): A large-scale, zero-shot Transformer optimized for Observability Metrics.
- TiRex (NXAI GmbH): Leverages xLSTM and Contiguous Patch Masking (CPM) for enhanced state-tracking and long-horizon uncertainty estimation; a toy patch-masking sketch appears at the end of this section.
- ViTime (City University of Hong Kong): A groundbreaking foundation model that shifts TSF from numerical fitting to binary image-based operations, leveraging vision intelligence and a new data generation process, RealTS, for robust zero-shot prediction (ViTime: Foundation Model for Time Series Forecasting Powered by Vision Intelligence).
- SST: A hybrid Mamba-Transformer expert network with linear complexity.
- InvDec (Hangzhou City University): A hybrid temporal encoder and inverted variate-level decoder that excels in high-dimensional multivariate TSF (e.g., 321 variables in Electricity data) (InvDec: Inverted Decoder for Multivariate Time Series Forecasting with Separated Temporal and Variate Modeling).
- Methodological Innovations:
- SRSNet (East China Normal University): Uses Selective Patching and Dynamic Reassembly to construct more informative representation spaces, yielding state-of-the-art results with minimal complexity (Enhancing Time Series Forecasting through Selective Representation Spaces: A Patch Perspective).
- AMRC (Beihang University/NYU): Uses Adaptive Masking Loss to suppress redundant feature learning, revealing that truncating historical data can sometimes improve prediction accuracy (Abstain Mask Retain Core: Time Series Prediction by Adaptive Masking Loss with Representation Consistency).
- New Benchmarks:
- BOOM: Introduced alongside TOTO, this is the first large-scale benchmark specifically for real-world Observability Metrics.
- SynTSBench (Tsinghua University): A synthetic data-driven framework designed to systematically evaluate TSF model capabilities by isolating temporal features and assessing robustness under irregularities (SynTSBench: Rethinking Temporal Pattern Learning in Deep Learning Models for Time Series).
- Critiques of current standards are offered in Time Series Foundation Models: Benchmarking Challenges and Requirements, which calls for new evaluation methodologies to combat test-set contamination and information leakage.
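Patching and patch masking recur throughout the list above (CPM in TiRex, Selective Patching in SRSNet). The sketch below shows only the generic building blocks: splitting a series into fixed-length patches and masking a contiguous run of them so a model must bridge a gap rather than interpolate isolated points. Patch length, mask size, and the masking rule are assumptions and do not reflect either paper's implementation.

```python
# Generic patching + contiguous patch masking for time series.
import torch

def patchify(x, patch_len):
    # x: (batch, length) -> (batch, num_patches, patch_len)
    batch, length = x.shape
    num_patches = length // patch_len
    return x[:, :num_patches * patch_len].reshape(batch, num_patches, patch_len)

def contiguous_patch_mask(patches, num_masked):
    """Zero out a contiguous run of patches in each sample."""
    batch, num_patches, _ = patches.shape
    start = torch.randint(0, num_patches - num_masked + 1, (batch,))
    masked = patches.clone()
    for b in range(batch):
        masked[b, start[b]:start[b] + num_masked] = 0.0
    return masked

series = torch.randn(4, 96)
patches = patchify(series, patch_len=16)           # (4, 6, 16)
masked = contiguous_patch_mask(patches, num_masked=2)
print(masked.shape)
```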
Impact & The Road Ahead
These advancements are transforming high-stakes applications. In finance, models like CRISP (Crisis-Resilient Portfolio Management via Graph-based Spatio-Temporal Learning) and DeltaLag (The Hong Kong University of Science and Technology, DeltaLag: Learning Dynamic Lead-Lag Patterns in Financial Markets) move beyond static risk models by dynamically detecting market regime shifts and evolving asset relationships, and report substantial Sharpe ratio improvements. In public health, the University of Kentucky's opioid-overdose forecasting system, built with simple N-Linear models (Implementation and Assessment of Machine Learning Models for Forecasting Suspected Opioid Overdoses in Emergency Medical Services Data), shows that careful feature engineering and data aggregation can yield crucial, actionable insights.
For the core architecture community, the theoretical analysis of Transformer limitations in forecasting (Duke University, Why Do Transformers Fail to Forecast Time Series In-Context?) is a pivotal contribution: it shows that linear self-attention layers are fundamentally limited for in-context forecasting, helping explain the shift toward hybrid models like SST and specialized recurrent architectures.
Looking ahead, priorities include rigorous benchmarking against challenges such as catastrophic forgetting in federated settings (Benchmarking Catastrophic Forgetting Mitigation Methods in Federated Time Series Forecasting) and building highly interpretable systems. The push for Explainable AI in Finance (Towards Explainable and Reliable AI in Finance), through prompt-based reasoning and reliability estimators, underscores the industry's need for models that are not only accurate but also auditable. By continuing to innovate through modular design, advanced decomposition, and theoretically grounded architectures, the field is moving steadily toward truly generalizable and reliable time series intelligence.