This is the official repository of SparkUI-Parser, a novel end-to-end algorithm for enhancing GUI grounding and parsing.
[2025-09-05] We release our paper: SparkUI-Parser: Enhancing GUI Perception with Robust Grounding and Parsing. We plan to open-source the training code and our proposed GUI parsing benchmark, ScreenParse, with evaluation code soon.
We utilize enhanced features instead of multiple discrete tokens to obtain continuous coordinate values, thereby improving grounding precision and speeding up inference.
- Robust Grounding and Parsing: We are the first to introduce an end-to-end MLLM for GUI perception that simultaneously achieves robust grounding and parsing on user interfaces, providing a comprehensive perception of both semantics and structure.
- Route-then-predict Framework: By processing the semantics and coordinates of each element separately, our method improves grounding precision by around 3% on average and speeds up grounding and parsing by 5 times and 4 times on average, respectively.
- Parsing Benchmark ScreenParse: a benchmark for GUI parsing that evaluates models on both locating specific elements and perceiving the overall structure of user interfaces.
- Excellent Grounding and Parsing Performance on various benchmarks.
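
To illustrate the idea of regressing continuous coordinates from an enhanced feature rather than decoding multiple discrete coordinate tokens, here is a minimal, hypothetical sketch. This is not the authors' released implementation; the module name `CoordinateHead`, the hidden size, and the MLP structure are assumptions for illustration only.

```python
import torch
import torch.nn as nn

class CoordinateHead(nn.Module):
    """Hypothetical regression head: maps an element's enhanced hidden
    feature to continuous (x, y) coordinates in [0, 1] in one forward
    pass, instead of autoregressively decoding discrete coordinate tokens."""

    def __init__(self, hidden_dim: int = 1024):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(hidden_dim, hidden_dim),
            nn.GELU(),
            nn.Linear(hidden_dim, 2),  # one (x, y) pair per element
        )

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        # Sigmoid keeps predictions inside the normalized screen [0, 1]^2.
        return torch.sigmoid(self.mlp(feat))

# One forward pass yields a continuous point, with no token-by-token decoding.
head = CoordinateHead()
features = torch.randn(3, 1024)  # enhanced features of 3 grounded elements
points = head(features)          # shape (3, 2), values in (0, 1)
```

Because a single regression pass replaces several decoding steps per coordinate, this style of head is one plausible source of both the precision gain (no quantization into discrete coordinate bins) and the inference speedup the bullets above describe.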
If your work uses or relates to SparkUI-Parser, please cite our paper:
@misc{jing2025sparkuiparserenhancingguiperception,
title={SparkUI-Parser: Enhancing GUI Perception with Robust Grounding and Parsing},
author={Hongyi Jing and Jiafu Chen and Chen Rao and Ziqiang Dang and Jiajie Teng and Tianyi Chu and Juncheng Mo and Shuo Fang and Huaizhong Lin and Rui Lv and Chenguang Ma and Lei Zhao},
year={2025},
eprint={2509.04908},
archivePrefix={arXiv},
primaryClass={cs.AI},
url={https://arxiv.org/abs/2509.04908},
}If you are interested in our method or it helps your research, please give us a star π on GitHub.

