This repository is the official PyTorch implementation of our NeurIPS'25 paper: Synthetic Series-Symbol Data Generation for Time Series Foundation Models.
Due to data privacy concerns and acquisition difficulties, existing large-scale time series datasets suffer from severe data shortages and imbalanced distributions compared with image and natural language corpora. Foundation models pre-trained on such datasets inherit prediction biases that reduce their generalization and robustness.
Inspired by complex dynamical systems theory, we design a series-symbol data generation mechanism that enables unrestricted creation of high-quality time series paired with corresponding symbolic expressions. To leverage the strong correlation within these series-symbol pairs, we develop SymTime, a pre-trained foundation model that enhances time series representations with symbolic information. SymTime demonstrates competitive performance across five major time series analysis (TSA) tasks, rivaling foundation models pre-trained on real-world datasets.
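To make the series-symbol idea concrete, here is a toy sketch of sampling a random symbolic expression and evaluating it into a time series. The operator library, amplitude/frequency ranges, and `sample_pair` function are illustrative assumptions for this README, not the actual S2Generator implementation.

```python
import numpy as np

# Toy sketch of series-symbol data generation (NOT the S2Generator code):
# sample a random symbolic expression, then evaluate it on a time grid to
# obtain a time series paired with its symbolic form.

rng = np.random.default_rng(0)

# A tiny library of unary operators, each paired with its symbolic name.
OPS = {
    "sin": np.sin,
    "cos": np.cos,
    "tanh": np.tanh,
}

def sample_pair(length=256):
    """Return (symbolic expression string, evaluated time series)."""
    op_name = rng.choice(list(OPS))
    a = rng.uniform(0.5, 2.0)   # amplitude
    w = rng.uniform(0.5, 3.0)   # frequency
    t = np.linspace(0, 10, length)
    expr = f"{a:.2f} * {op_name}({w:.2f} * t)"
    series = a * OPS[op_name](w * t)
    return expr, series

expr, series = sample_pair()
print(expr, series.shape)
```

The real generator composes far richer expressions; the point is only that every series comes with an exact symbolic description, which is what makes cross-modal pre-training possible.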
First, create a Python virtual environment (Python 3.10.15 is recommended), then install the required dependencies by running the following command:
pip install -r requirements.txt
SymTime relies on a large-scale series-symbol bimodal dataset generated by S2Generator during pre-training. You can generate the data required for pre-training by executing the following script:
bash ./scripts/s2generator.sh
For the fine-tuning datasets, you can download them from OneDrive or BaiduCloud, then place the downloaded datasets under the folder ./datasets.
Once you have generated enough time series data using S2Generator, you can pre-train SymTime by executing our pre-training script:
bash ./scripts/SymTime_pretrain.sh
If you want to skip the time-consuming pre-training phase, you can directly download our pre-trained model parameters from OneDrive or BaiduCloud and place them under ./models/params/ before fine-tuning on downstream tasks:
# Long-term time series forecasting
bash ./scripts/long_term_forecasting/SymTime_ECL.sh
# Short-term time series forecasting
bash ./scripts/short_term_forecasting/SymTime_M4.sh
# Time series classification
bash ./scripts/classification/SymTime_EthanolConcentration.sh
# Time series imputation
bash ./scripts/imputation/SymTime_ECL.sh
# Time series anomaly detection
bash ./scripts/anomaly_detection/MSL.sh
Compared with other models for general time series analysis, SymTime, pre-trained with masked modeling and cross-modal contrastive learning, achieves SOTA results when fine-tuned on downstream tasks while maintaining lower model complexity.
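The cross-modal contrastive objective mentioned above can be sketched as a symmetric InfoNCE loss between time series embeddings and symbolic expression embeddings. The following numpy sketch uses illustrative dimensions and temperature; it is not SymTime's actual training code.

```python
import numpy as np

# Minimal sketch of a symmetric InfoNCE loss between time series embeddings
# and symbolic expression embeddings. Batch size, embedding dim, and the
# temperature are illustrative assumptions, not SymTime's actual values.

def info_nce(series_emb, symbol_emb, temperature=0.07):
    """Symmetric contrastive loss over a batch of paired embeddings."""
    # L2-normalize so dot products become cosine similarities.
    s = series_emb / np.linalg.norm(series_emb, axis=1, keepdims=True)
    y = symbol_emb / np.linalg.norm(symbol_emb, axis=1, keepdims=True)
    logits = s @ y.T / temperature      # (batch, batch) similarity matrix
    n = len(logits)                     # matching pairs lie on the diagonal

    def xent(l):
        # Cross-entropy with diagonal targets, computed stably in log-space.
        l = l - l.max(axis=1, keepdims=True)
        logp = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -logp[np.arange(n), np.arange(n)].mean()

    # Average the series->symbol and symbol->series directions.
    return 0.5 * (xent(logits) + xent(logits.T))

rng = np.random.default_rng(0)
batch, dim = 8, 16
loss = info_nce(rng.normal(size=(batch, dim)), rng.normal(size=(batch, dim)))
print(loss)
```

Pulling each series embedding toward its own symbolic expression and away from the rest of the batch is what aligns the two encoders during pre-training.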
We present experimental results on the TimesNet benchmark, where SymTime outperforms the current state-of-the-art models.
We generate a large amount of series-symbol bimodal data with S2Generator for masked-modeling and contrastive-learning pre-training. We therefore first verify the representation coverage of the synthetic data against real-world time series datasets.
Then, we visualize the representation spaces of the time series encoder (a, b) and the symbolic expression encoder (c, d) in SymTime before and after pre-training. The paired time series and symbolic expressions form distinct clusters, demonstrating the effectiveness of our pre-training paradigm.
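Representation-space visualizations of this kind rest on projecting high-dimensional encoder outputs down to 2-D. A minimal sketch of such a projection via PCA (computed with an SVD) is shown below; the embeddings here are random stand-ins, not real SymTime encoder outputs.

```python
import numpy as np

# Sketch: project (n, d) encoder embeddings onto their top two principal
# components, the basic step behind 2-D representation-space plots.
# The input embeddings are random placeholders, not SymTime outputs.

def pca_2d(embeddings):
    """Return the (n, 2) projection onto the top two principal components."""
    centered = embeddings - embeddings.mean(axis=0)
    # Right singular vectors of the centered data are the principal directions.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T

rng = np.random.default_rng(0)
coords = pca_2d(rng.normal(size=(100, 64)))
print(coords.shape)  # (100, 2)
```

In practice, nonlinear methods such as t-SNE are commonly used for figures like these, but the centering-and-projection step is the same in spirit.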
If you find this code useful, please cite our paper.
@misc{wang2025syntheticseriessymboldatageneration,
      title={Synthetic Series-Symbol Data Generation for Time Series Foundation Models},
      author={Wenxuan Wang and Kai Wu and Yujian Betterest Li and Dan Wang and Xiaoyu Zhang},
      year={2025},
      eprint={2510.08445},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2510.08445},
}
We are grateful to the following GitHub repositories for their valuable code and efforts.
- Time-Series-Library (https://github.com/thuml/Time-Series-Library);
- PySDKit (https://github.com/wwhenxuan/PySDKit);
- ALBEF (https://github.com/salesforce/ALBEF);
- PatchTST (https://github.com/yuqinie98/PatchTST);
- N-BEATS (https://github.com/ServiceNow/N-BEATS).
If you have any questions or are interested in our view on the complex dynamics of time series, feel free to contact:







