antgroup/SparkUI-Parser


SparkUI-Parser: Enhancing GUI Perception with Robust Grounding and Parsing

The official repository of SparkUI-Parser, a novel end-to-end GUI grounding and parsing enhancement algorithm



SparkUI-Parser Framework



Overview


🎉 News

[2025-9-5] We released our paper, SparkUI-Parser: Enhancing GUI Perception with Robust Grounding and Parsing. We plan to open-source the training code and our proposed GUI parsing benchmark, ScreenParse, together with its evaluation code, soon.


🚀 Motivation

Schematic diagram of discrete to continuous coordinate modeling

Comparison of the coordinate generation between prior methods (left) and ours (right).

We use enhanced features instead of multiple discrete tokens to obtain continuous coordinate values, which improves grounding precision and speeds up inference.
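The contrast between the two coordinate-generation styles can be illustrated with a small sketch. This is not the repository's implementation: the function names, the three-digit tokenization, and the single linear layer with a sigmoid are all assumptions chosen only to show why a discrete scheme needs several decoding steps per box while a continuous head needs one.

```python
import math

def discrete_decode_steps(box, digits=3):
    """Prior-style sketch: each coordinate is emitted as `digits` separate tokens.

    Assumes normalized coordinates in [0, 1) and one token per digit.
    """
    bins = 10 ** digits
    tokens = []
    for v in box:
        q = min(int(v * bins), bins - 1)      # quantize to an integer bin
        tokens.extend(str(q).zfill(digits))   # one (assumed) token per digit
    return tokens

def continuous_head(feature, weights, bias):
    """Continuous sketch: one linear layer plus sigmoid maps a feature vector
    to four normalized box values in a single step (hypothetical head)."""
    box = []
    for w_row, b in zip(weights, bias):
        z = sum(w * f for w, f in zip(w_row, feature)) + b
        box.append(1.0 / (1.0 + math.exp(-z)))
    return box

# A box encoded with 3 digits per coordinate costs 12 decoding steps
# and loses everything past the third decimal place.
tokens = discrete_decode_steps((0.1234, 0.5, 0.75, 0.9999))
print(len(tokens), "".join(tokens))
```

In the discrete sketch the number of decoding steps grows with the chosen precision, whereas the continuous head returns floating-point values directly from one feature vector.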


✨ Highlights

  • 🎯 Robust Grounding and Parsing: We are the first to introduce an end-to-end MLLM for GUI perception that simultaneously achieves robust grounding and parsing on user interfaces, providing a comprehensive perception of semantics and structures.
  • 🔀 Route-then-predict Framework: By processing the semantics and coordinates of each element separately, our method improves grounding precision by about 3% on average and speeds up grounding and parsing by 5 times and 4 times on average, respectively.
  • 📊 Parsing Benchmark (ScreenParse): A benchmark for GUI parsing that evaluates how well models locate specific elements and perceive the overall structure of user interfaces.
  • 👑 Excellent Grounding and Parsing Performance on various benchmarks.
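
As a rough illustration of the route-then-predict idea, the sketch below separates text tokens from coordinate positions. It is a toy under stated assumptions, not the released code: the `<coord>` placeholder token, the `route_then_predict` function, and the `coord_head` callable are hypothetical names introduced here.

```python
COORD_TOKEN = "<coord>"  # hypothetical placeholder token, not taken from the repository

def route_then_predict(tokens, hidden_states, coord_head):
    """Route each position: ordinary tokens are kept as element text,
    placeholder positions are sent to `coord_head` to predict a box."""
    elements = []
    text_buffer = []
    for tok, h in zip(tokens, hidden_states):
        if tok == COORD_TOKEN:
            # Coordinate branch: predict a box from this position's feature.
            elements.append({"text": " ".join(text_buffer), "box": coord_head(h)})
            text_buffer = []
        else:
            # Semantic branch: accumulate the element's text.
            text_buffer.append(tok)
    return elements

# Stand-in inputs: features are already box values and the head is the identity.
tokens = ["Submit", "button", COORD_TOKEN, "Search", COORD_TOKEN]
hidden = [None, None, [0.1, 0.2, 0.3, 0.4], None, [0.5, 0.6, 0.7, 0.8]]
parsed = route_then_predict(tokens, hidden, coord_head=tuple)
print(parsed)
```

Each placeholder costs one prediction step regardless of coordinate precision, which is one plausible reading of how separating semantics from coordinates could shorten decoding.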

📄 Citation

If you use SparkUI-Parser or build on it, please cite our paper:

@misc{jing2025sparkuiparserenhancingguiperception,
      title={SparkUI-Parser: Enhancing GUI Perception with Robust Grounding and Parsing}, 
      author={Hongyi Jing and Jiafu Chen and Chen Rao and Ziqiang Dang and Jiajie Teng and Tianyi Chu and Juncheng Mo and Shuo Fang and Huaizhong Lin and Rui Lv and Chenguang Ma and Lei Zhao},
      year={2025},
      eprint={2509.04908},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2509.04908}, 
}

If you are interested in our method or it helps your research, please give us a star 🌟 on GitHub.

arXiv: 2509.04908
