
SymTime NeurIPS 2025

This repository is the official PyTorch implementation of our NeurIPS'25 paper: Synthetic Series-Symbol Data Generation for Time Series Foundation Models.

✨ Introduction

Owing to data privacy concerns and acquisition difficulties, existing large-scale time series datasets suffer from severe data shortages and imbalanced distributions compared with image and natural language corpora. Foundation models pre-trained on such datasets inherit prediction biases that reduce their generalization and robustness.

Inspired by theories of complex dynamical systems, we design a series-symbol data generation mechanism that enables the unrestricted creation of high-quality time series paired with corresponding symbolic expressions. To leverage the strong correlation within these series-symbol pairs, we develop SymTime, a pre-trained foundation model that enhances time series representations with symbolic information. SymTime demonstrates competitive performance across five major time series analysis (TSA) tasks, rivaling foundation models pre-trained on real-world datasets.

(Figure: overview of the series-symbol data generation mechanism and the SymTime model)

🧭 Quickstart

Installation

First create a Python virtual environment; Python 3.10.15 is recommended.
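For example, with conda (a minimal sketch; the environment name symtime is just an example, and any environment manager that can pin the Python version works):

# create and activate a Python 3.10.15 environment
conda create -n symtime python=3.10.15
conda activate symtime

Then install the required dependencies by running the following command: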

pip install -r requirements.txt

Data Preparation

SymTime is pre-trained on a large-scale series-symbol bimodal dataset generated by S2Generator. You can generate the data required for pre-training by running the following script:

bash ./scripts/s2generator.sh

The fine-tuning datasets can be downloaded from OneDrive or BaiduCloud. Place the downloaded datasets under the ./datasets folder, as sketched below.
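The task scripts below expect each benchmark in its own folder under ./datasets. The layout sketched here is only an illustration inferred from the script names; the actual folder names follow the downloaded archives:

datasets/
├── ECL/
├── M4/
├── EthanolConcentration/
└── MSL/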

Model Pre-Training and Fine-Tuning

Once you have generated enough time series data using S2Generator, you can pre-train SymTime by executing our pre-training script:

bash ./scripts/SymTime_pretrain.sh
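Since pre-training can take a long time, you may want to run the script detached from your terminal and keep a log. A generic shell pattern (not specific to this repository) is:

# run pre-training in the background, writing output to a log file
nohup bash ./scripts/SymTime_pretrain.sh > pretrain.log 2>&1 &

# follow the training log
tail -f pretrain.log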

If you want to skip the time-consuming pre-training phase, you can directly download our pre-trained model parameters from OneDrive or BaiduCloud and put them under ./models/params/ for fine-tuning on downstream tasks:

# Long-term time series forecasting
bash ./scripts/long_term_forecasting/SymTime_ECL.sh

# Short-term time series forecasting
bash ./scripts/short_term_forecasting/SymTime_M4.sh

# Time series classification
bash ./scripts/classification/SymTime_EthanolConcentration.sh

# Time series imputation
bash ./scripts/imputation/SymTime_ECL.sh

# Time series anomaly detection
bash ./scripts/anomaly_detection/MSL.sh
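To sweep all five downstream tasks back to back, a small wrapper that simply chains the commands above also works (a sketch; each script can of course be run individually):

# fine-tune on every downstream task in sequence
for script in ./scripts/long_term_forecasting/SymTime_ECL.sh \
              ./scripts/short_term_forecasting/SymTime_M4.sh \
              ./scripts/classification/SymTime_EthanolConcentration.sh \
              ./scripts/imputation/SymTime_ECL.sh \
              ./scripts/anomaly_detection/MSL.sh; do
    bash "$script"
done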

📊 Results

Main Results

Compared with other models for general time series analysis tasks, SymTime, pre-trained with masked modeling and cross-modal contrastive learning, achieves state-of-the-art results when fine-tuned on downstream tasks while maintaining lower model complexity.

(Figure: main results on downstream tasks and model complexity comparison)

Benchmark Results

We present experimental results on the TimesNet benchmark, where SymTime outperforms recent advanced models.

(Figure: results on the TimesNet benchmark)

Dataset and Representation Learning

We generate a large amount of series-symbol bimodal data with S2Generator for masked-modeling and contrastive-learning pre-training. We therefore first verify the representation coverage of the synthetic data against real-world time series datasets.

(Figure: representation coverage of synthetic versus real-world time series data)

We then visualize the representation spaces of SymTime's time series encoder (panels a, b) and symbolic expression encoder (panels c, d) before and after pre-training. After pre-training, paired time series and symbolic expressions form distinct clusters, demonstrating the effectiveness of our pre-training paradigm.

(Figure: encoder representation spaces before and after pre-training)

🎓 Citation

If you find this code useful, please cite our paper:

@misc{wang2025syntheticseriessymboldatageneration,
      title={Synthetic Series-Symbol Data Generation for Time Series Foundation Models}, 
      author={Wenxuan Wang and Kai Wu and Yujian Betterest Li and Dan Wang and Xiaoyu Zhang},
      year={2025},
      eprint={2510.08445},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2510.08445}, 
}

🎖️ Acknowledgement

We are grateful to the following GitHub repositories for their valuable code and efforts.

🤗 Contact

If you have any questions, or are interested in our perspective on the complex dynamics of time series, feel free to contact us:
