This is the official repository of SparkUI-Parser, a novel end-to-end algorithm for enhancing GUI grounding and parsing.
[2025-09-05] We release our paper: SparkUI-Parser: Enhancing GUI Perception with Robust Grounding and Parsing. We plan to open-source the training code and our proposed GUI parsing benchmark, ScreenParse, with evaluation code soon.
We utilize enhanced features instead of multiple discrete tokens to obtain continuous coordinate values, thereby improving grounding precision and speeding up inference.
- Robust Grounding and Parsing: We are the first to introduce an end-to-end MLLM for GUI perception that simultaneously achieves robust grounding and parsing on user interfaces, providing a comprehensive perception of both semantics and structure.
- Route-then-predict Framework: By processing the semantics and coordinates of each element separately, our method improves grounding precision by around 3% on average and speeds up grounding and parsing by 5 times and 4 times on average, respectively.
- Parsing Benchmark ScreenParse: a benchmark for GUI parsing that evaluates models on both locating specific elements and perceiving the overall structure of user interfaces.
- Excellent Grounding and Parsing Performance on various benchmarks.
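
To illustrate the idea of regressing continuous coordinates from an enhanced feature rather than decoding multiple discrete coordinate tokens, here is a minimal, hypothetical sketch. This is not the authors' released implementation; the module name `CoordinateHead`, the hidden size, and the MLP structure are assumptions for illustration only.

```python
import torch
import torch.nn as nn

class CoordinateHead(nn.Module):
    """Hypothetical regression head: maps an element's enhanced hidden
    feature to continuous (x, y) coordinates in [0, 1] in one forward
    pass, instead of autoregressively decoding discrete coordinate tokens."""

    def __init__(self, hidden_dim: int = 1024):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(hidden_dim, hidden_dim),
            nn.GELU(),
            nn.Linear(hidden_dim, 2),  # one (x, y) pair per element
        )

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        # Sigmoid keeps predictions inside the normalized screen [0, 1]^2.
        return torch.sigmoid(self.mlp(feat))

# One forward pass yields a continuous point, with no token-by-token decoding.
head = CoordinateHead()
features = torch.randn(3, 1024)  # enhanced features of 3 grounded elements
points = head(features)          # shape (3, 2), values in (0, 1)
```

Because a single regression pass replaces several decoding steps per coordinate, this style of head is one plausible source of both the precision gain (no quantization into discrete coordinate bins) and the inference speedup the bullets above describe.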
If your work uses or relates to SparkUI-Parser, please cite our paper:
@misc{jing2025sparkuiparserenhancingguiperception,
title={SparkUI-Parser: Enhancing GUI Perception with Robust Grounding and Parsing},
author={Hongyi Jing and Jiafu Chen and Chen Rao and Ziqiang Dang and Jiajie Teng and Tianyi Chu and Juncheng Mo and Shuo Fang and Huaizhong Lin and Rui Lv and Chenguang Ma and Lei Zhao},
year={2025},
eprint={2509.04908},
archivePrefix={arXiv},
primaryClass={cs.AI},
url={https://arxiv.org/abs/2509.04908},
}If you are interested in our method or it helps your research, please give us a star π on GitHub.

