Wenzheng Zeng¹, Difei Gao¹, Mike Zheng Shou¹, Hwee Tou Ng¹
¹National University of Singapore
This repository contains the official implementation of the ICCV 2025 paper "Factorized Learning for Temporally Grounded Video-Language Models".
- Model: We propose a new framework $D^2\mathrm{VLM}$, which decomposes the generation objective into a "grounding then answering with evidence referencing" paradigm and introduces evidence tokens to emphasize explicit event-level visual semantic capture.
- Training Algorithm: We introduce Factorized Preference Optimization (FPO), which explicitly addresses both temporal grounding and textual response; a minimal illustrative sketch is given after this list. A factorized data synthesis approach is also designed to support FPO.
- Performance: Our method consistently outperforms SOTA methods across various tasks.
- Open Source: We release the source code and model weights to the community.
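As a rough illustration of the FPO idea, the sketch below writes a DPO-style preference objective factorized into a temporal grounding term and a textual response term. This is a minimal sketch based only on the description above: the function name `fpo_loss`, its argument layout, and the weighting hyperparameters are illustrative assumptions, not the repository's actual training API.

```python
# Illustrative sketch of a factorized, DPO-style preference objective
# (NOT the repository's actual implementation). Each *_logps_* tensor is
# assumed to hold per-sample summed token log-probabilities of the preferred
# (_w) or dispreferred (_l) response under the policy or the frozen reference
# model, already split into a grounding part and a textual answer part.
import torch.nn.functional as F


def fpo_loss(policy_ground_logps_w, policy_ground_logps_l,
             policy_text_logps_w, policy_text_logps_l,
             ref_ground_logps_w, ref_ground_logps_l,
             ref_text_logps_w, ref_text_logps_l,
             beta=0.1, lambda_ground=1.0, lambda_text=1.0):
    # Preference margin on the temporal grounding part.
    ground_logits = (policy_ground_logps_w - ref_ground_logps_w) \
                  - (policy_ground_logps_l - ref_ground_logps_l)
    # Preference margin on the textual response part.
    text_logits = (policy_text_logps_w - ref_text_logps_w) \
                - (policy_text_logps_l - ref_text_logps_l)
    # Combine the two factorized preference terms.
    ground_term = -F.logsigmoid(beta * ground_logits).mean()
    text_term = -F.logsigmoid(beta * text_logits).mean()
    return lambda_ground * ground_term + lambda_text * text_term
```

Under such a factorization, preference pairs that differ only in temporal grounding and pairs that differ only in the textual answer can supervise the two terms separately, which is presumably what the factorized data synthesis approach is designed to provide.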
- [2025-10] Code and model weights are released!
- [2025-06] Our work is accepted to ICCV 2025!
We use the following environment settings. If you run into problems during automatic installation, you may install these packages manually.
- CUDA 11.8
- Python 3.12.2
- PyTorch 2.4.0
- Transformers 4.44.2
- DeepSpeed 0.14.5
- NNCore 0.4.5
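After completing the installation steps below, you can verify that the core packages match these versions with a quick check like the following (the version comments simply restate the list above):

```python
# Quick version check for the core packages listed above.
import torch
import transformers
import deepspeed

print("PyTorch:", torch.__version__)               # expected 2.4.0
print("CUDA (build):", torch.version.cuda)         # expected 11.8
print("CUDA available:", torch.cuda.is_available())
print("Transformers:", transformers.__version__)   # expected 4.44.2
print("DeepSpeed:", deepspeed.__version__)          # expected 0.14.5
```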
- Clone the repository from GitHub.
git clone https://github.com/nusnlp/d2vlm.git
cd d2vlm
- Initialize the conda environment.
conda create -n d2vlm python=3.12 -y
conda activate d2vlm
- Install dependencies.
pip install -r requirements.txt
Please refer to the Dataset page.
- You can download the pre-trained model here.
- Run the following commands. Remember to change the absolute paths within each .sh file.
bash scripts/inference.sh
# or refer to the Inference and Evaluation part of scripts/train_inference_eval.sh
bash all_benchmark_eval/charades/inference.sh
bash all_benchmark_eval/youcook2/inference.sh
- Download the pretrained model from here (stage-2 model of E.T. Chat).
- Check and run the following command (modify the relevant paths).
bash scripts/train_inference_eval.sh
If you find our work useful in your research, please consider citing our paper:
@inproceedings{d2vlm,
title={Factorized Learning for Temporally Grounded Video-Language Models},
author={Zeng, Wenzheng and Gao, Difei and Shou, Mike Zheng and Ng, Hwee Tou},
booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
year={2025},
pages={20683-20693}
}
This project was built upon E.T. Bench, TimeChat, and AMP. We thank the authors for their solid contributions to the community!
