English | 简体中文
Are VLMs Ready for Autonomous Driving?
An Empirical Study from the Reliability, Data, and Metric Perspectives
Shaoyuan Xie¹ · Lingdong Kong²,³ · Yuhao Dong²,⁴ · Chonghao Sima²,⁵ · Wenwei Zhang² · Qi Alfred Chen¹ · Ziwei Liu⁴ · Liang Pan²

¹UC Irvine  ²Shanghai AI Laboratory  ³NUS  ⁴NTU  ⁵HKU
- This work introduces 🚙 DriveBench, a benchmark dataset designed to evaluate VLM reliability across 17 settings (clean, corrupted, and text-only inputs), encompassing 19,200 frames, 20,498 question-answer pairs, three question types, and four mainstream driving tasks, evaluated on a total of 12 popular VLMs.
- Our findings reveal that VLMs often generate plausible responses derived from general knowledge or textual cues rather than true visual grounding, especially under degraded or missing visual inputs. This behavior, concealed by dataset imbalances and insufficient evaluation metrics, poses significant risks in safety-critical scenarios like autonomous driving (see the sketch below for a minimal version of this probe).
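The following is a minimal, hypothetical sketch of the probe described above, not the official DriveBench harness: the same driving question is posed over a clean frame, a corrupted frame, and no frame at all, and near-identical answers across the three settings suggest weak visual grounding. The `query_vlm` helper is an assumption standing in for whichever VLM API you evaluate, and Gaussian noise stands in for the benchmark's full corruption suite.

```python
# Minimal sketch (not the official DriveBench harness) of the reliability
# probe: ask the same question with a clean frame, a corrupted frame, and
# no frame, then compare the answers.
import numpy as np
from PIL import Image


def corrupt_gaussian(image: Image.Image, sigma: float = 50.0) -> Image.Image:
    """Additive Gaussian pixel noise, standing in for the corruption suite."""
    arr = np.asarray(image).astype(np.float32)
    noisy = arr + np.random.normal(0.0, sigma, size=arr.shape)
    return Image.fromarray(np.clip(noisy, 0, 255).astype(np.uint8))


def query_vlm(prompt: str, image: Image.Image | None) -> str:
    """Hypothetical helper: wrap whichever VLM API you evaluate here."""
    raise NotImplementedError("plug in your model call")


prompt = "What are the important objects in the current driving scene?"
frame = Image.open("frame.jpg")  # any camera frame from the benchmark

answers = {
    "clean": query_vlm(prompt, frame),
    "corrupted": query_vlm(prompt, corrupt_gaussian(frame)),
    "text_only": query_vlm(prompt, None),  # no visual input at all
}

# Near-identical answers across the three settings suggest the model is
# answering from general knowledge or textual cues, not visual grounding.
for setting, answer in answers.items():
    print(f"[{setting}] {answer}")
```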
If you find this work helpful for your research, please consider citing our papers:
@inproceedings{xie2025drivebench,
title = {Are {VLMs} Ready for Autonomous Driving? An Empirical Study from the Reliability, Data, and Metric Perspectives},
author = {Xie, Shaoyuan and Kong, Lingdong and Dong, Yuhao and Sima, Chonghao and Zhang, Wenwei and Chen, Qi Alfred and Liu, Ziwei and Pan, Liang},
booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
pages = {6585-6597},
month = {October},
year = {2025}
}

@misc{robosense_challenge_2025,
title = {The {RoboSense} Challenge: Sense Anything, Navigate Anywhere, Adapt Across Platforms},
author = {Kong, Lingdong and Xie, Shaoyuan and Gong, Zeying and Li, Ye and Chu, Meng and Liang, Ao and Dong, Yuhao and Hu, Tianshuai and Qiu, Ronghe and Li, Rong and Hu, Hanjiang and Lu, Dongyue and Yin, Wei and Ding, Wenhao and Li, Linfeng and Song, Hang and Zhang, Wenwei and Ma, Yuexin and Liang, Junwei and Zheng, Zhedong and Ng, Lai Xing and Cottereau, Benoit R. and Ooi, Wei Tsang and Liu, Ziwei and Zhang, Zhanpeng and Qiu, Weichao and Zhang, Wei and Ao, Ji and Zheng, Jiangpeng and Wang, Siyu and Yang, Guang and Zhang, Zihao and Zhong, Yu and Gao, Enzhu and Zheng, Xinhan and Wang, Xueting and Li, Shouming and Gao, Yunkai and Lan, Siming and Han, Mingfei and Hu, Xing and Malic, Dusan and Fruhwirth-Reisinger, Christian and Prutsch, Alexander and Lin, Wei and Schulter, Samuel and Possegger, Horst and Li, Linfeng and Zhao, Jian and Yang, Zepeng and Song, Yuhang and Lin, Bojun and Zhang, Tianle and Yuan, Yuchen and Zhang, Chi and Li, Xuelong and Kim, Youngseok and Hwang, Sihwan and Jeong, Hyeonjun and Wu, Aodi and Luo, Xubo and Xiao, Erjia and Zhang, Lingfeng and Tang, Yingbo and Cheng, Hao and Xu, Renjing and Ding, Wenbo and Zhou, Lei and Chen, Long and Ye, Hangjun and Hao, Xiaoshuai and Li, Shuangzhi and Shen, Junlong and Li, Xingyu and Ruan, Hao and Lin, Jinliang and Luo, Zhiming and Zang, Yu and Wang, Cheng and Wang, Hanshi and Gong, Xijie and Yang, Yixiang and Ma, Qianli and Zhang, Zhipeng and Shi, Wenxiang and Zhou, Jingmeng and Zeng, Weijun and Xu, Kexin and Zhang, Yuchen and Fu, Haoxiang and Hu, Ruibin and Ma, Yanbiao and Feng, Xiyan and Zhang, Wenbo and Zhang, Lu and Zhuge, Yunzhi and Lu, Huchuan and He, You and Yu, Seungjun and Park, Junsung and Lim, Youngsun and Shim, Hyunjung and Liang, Faduo and Wang, Zihang and Peng, Yiming and Zong, Guanyu and Li, Xu and Wang, Binghao and Wei, Hao and Ma, Yongxin and Shi, Yunke and Liu, Shuaipeng and Kong, Dong and Lin, Yongchun and Yang, Huitong and Lei, Liang and Li, Haoang and Zhang, Xinliang and Wang, Zhiyong and Wang, Xiaofeng and Fu, Yuxia and Luo, Yadan and Etchegaray, Djamahl and Li, Yang and Li, Congfei and Sun, Yuxiang and Zhu, Wenkai and Xu, Wang and Li, Linru and Liao, Longjie and Yan, Jun and Wang, Benwu and Ren, Xueliang and Yue, Xiaoyu and Zheng, Jixian and Wu, Jinfeng and Qin, Shurui and Cong, Wei and He, Yao},
howpublished = {\url{https://robosense2025.github.io}},
year = {2025}
}

- [2025.07] - The DriveBench dataset has been extended to Track 1: Driving with Language of the RoboSense Challenge at IROS 2025. See the track homepage and GitHub repo for more details.
- [2025.06] - Our paper has been accepted to ICCV 2025. See you in Honolulu! 🌸
- [2025.04] - We are hosting the 2025 RoboSense Challenge! Visit the competition homepage for details and participation. 🏁
- [2025.01] - The evaluation data can be accessed at our HuggingFace Dataset Card (see the download sketch after this list). 🤗
- [2025.01] - Introducing the 🚙 DriveBench project! For more details, kindly refer to our Project Page and Preprint. 🚀
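As referenced in the news item above, the snippet below is one possible way to fetch the evaluation data with the official `huggingface_hub` client; the `repo_id` shown is an assumption, so substitute the id listed on the Dataset Card.

```python
# One possible way to fetch the evaluation data; the repo_id below is an
# assumption, so substitute the id shown on our HuggingFace Dataset Card.
from huggingface_hub import snapshot_download

local_path = snapshot_download(
    repo_id="drive-bench/arena",    # assumed dataset id; verify on the Hub
    repo_type="dataset",            # dataset repo, not a model repo
    local_dir="./data/drivebench",  # target directory for the files
)
print(f"DriveBench evaluation data downloaded to: {local_path}")
```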
- Benchmark Comparison
- Installation
- Data Preparation
- Getting Started
- Benchmark Results
- License
- Acknowledgments
For details on installation and environment setup, kindly refer to INSTALL.md.
Kindly refer to DATA_PREPARE.md for details on preparing the datasets.
To learn more about using this codebase, kindly refer to GET_STARTED.md.
Commercial VLMs
Open-Source VLMs
Specialist VLMs
This work is released under the Apache License, Version 2.0, while some specific implementations in this codebase may be subject to other licenses. Kindly refer to LICENSE.md for a careful check if you are using our code for commercial purposes.
To be updated.


