---
license: apache-2.0
---
<div align="center">
<img src="https://github.com/FlagOpen/RoboBrain2.5/raw/main/assets/logo2.png" width="500"/>
</div>
<h1 align="center">RoboBrain 2.5: Depth in Sight, Time in Mind.</h1>

<p align="center">
⭐️ <a href="https://superrobobrain.github.io/">Project Page</a>&nbsp;&nbsp;|&nbsp;&nbsp;🤗 <a href="https://huggingface.co/collections/BAAI/robobrain25/">Hugging Face</a>&nbsp;&nbsp;|&nbsp;&nbsp;🤖 <a href="https://github.com/FlagOpen/RoboBrain2.5">GitHub</a>
</p>

# Introduction

**FlagOS** is a unified heterogeneous computing software stack for large models, co-developed with leading global chip manufacturers. Built on core technologies such as the **FlagScale** distributed training/inference framework, the **FlagGems** universal operator library, the **FlagCX** communication library, and the **FlagTree** unified compiler, the **FlagRelease** platform leverages the FlagOS stack to automatically produce and release various combinations of <chip + open-source model>. This enables efficient, automated model migration across diverse chips, opening a new chapter for large-model deployment and application.

Based on this, the **RoboBrain2.5-8B-FlagOS** model has been adapted for the Ascend chip using the FlagOS software stack, enabling:

### Integrated Deployment

- Deep integration with the open-source [FlagScale framework](https://github.com/FlagOpen/FlagScale)
- Out-of-the-box inference scripts with pre-configured hardware and software parameters
- Released **FlagOS-Ascend** container image supporting deployment within minutes

### Consistency Validation

- Rigorous benchmark testing: performance and results from the FlagOS software stack are compared against the native software stack on multiple public benchmarks.

## 🔥 Overview

**RoboBrain-2.5** is a next-generation Embodied AI foundation model that significantly evolves its predecessor's core capabilities in general perception, spatial reasoning, and temporal modeling through extensive training on high-quality spatiotemporal data. It achieves a paradigm shift in 3D spatial reasoning, moving from 2D relative points to predicting 3D coordinates with depth information, understanding absolute metric constraints, and generating complete manipulation trajectories for complex tasks under physical constraints. It also achieves a breakthrough in temporal value prediction by constructing a general reward modeling method that provides dense progress tracking and multi-granular execution-state estimation across varying viewpoints. This supplies VLA reinforcement learning with immediate, dense feedback signals, enabling robots to achieve high task success rates and robustness in fine-grained manipulation scenarios.

<div align="center">
<img src="https://github.com/FlagOpen/RoboBrain2.5/raw/main/assets/rb25_feature.png" />
</div>

<div align="center">
<img src="https://github.com/FlagOpen/RoboBrain2.5/raw/main/assets/rb25_result.png" />
</div>

## 🚀 Key Highlights

### 1. Comprehensive Upgrade in ✨ Native 3D Spatial Reasoning ✨

Compared to version 2.0, **RoboBrain-2.5** achieves a leap in spatial perception and reasoning capabilities:
* **From 2D to 3D:** Upgraded from predicting coordinate points on 2D images to predicting coordinate points with depth information in **3D space** (3D Spatial Referring).
* **Relative to Absolute:** Evolved from understanding relative spatial relationships to measuring **absolute 3D spatial metric information** (3D Spatial Measuring). The model can follow precise physical-constraint instructions (e.g., "hovering 1-5 cm above").
* **Point to Trace:** Advanced from predicting a single target point for pick-and-place to predicting a **series of key points** that describe the complete manipulation process (3D Spatial Trace), naturally providing spatial planning with 3D absolute metrics.

### 2. Breakthrough in ✨ Dense Temporal Value Estimation ✨

**RoboBrain-2.5** makes significant progress in temporal modeling by constructing a General Reward Model (GRM):
* **Dense Progress Prediction:** Multi-granularity task-progress prediction across different tasks, viewpoints, and embodiments.
* **Execution State Estimation:** Understands task goals and estimates execution states (e.g., success, failure, error occurrence).
* **Empowering VLA Reinforcement Learning:** Provides real-time, dense feedback signals and rewards for VLA (Vision-Language-Action) reinforcement learning. With only **one demonstration**, it achieves a task success rate of **95%+** in complex, fine-grained manipulations.

### 3. More Powerful Core Capabilities Inherited from Version 2.0

**RoboBrain 2.5** also retains the core capabilities of version 2.0: ***interactive reasoning*** with long-horizon planning and closed-loop feedback, ***spatial perception*** for precise point and bounding-box prediction from complex instructions, ***temporal perception*** for future trajectory estimation, and ***scene reasoning*** through real-time structured memory construction and update.

# Technical Overview

## **FlagScale Distributed Training and Inference Framework**

FlagScale is an end-to-end framework for large models across heterogeneous computing resources, maximizing computational efficiency and ensuring model validity through its core technologies. Its key advantages include:

- **Unified Deployment Interface:** Standardized command-line tools support one-click service deployment across multiple hardware platforms, significantly reducing adaptation costs in heterogeneous environments.
- **Intelligent Parallel Optimization:** Automatically generates optimal distributed parallel strategies based on chip computing characteristics, achieving dynamic load balancing of computation and communication resources.
- **Seamless Operator Switching:** Deep integration with the FlagGems operator library allows high-performance operators to be invoked via environment variables without modifying model code (see the sketch below).
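
The minimal sketch below illustrates the last two points using only the commands and environment variables shown later in this guide: FlagGems kernels are toggled through the `USE_FLAGGEMS` environment variable (set in the container under "Start the inference service"), and the pre-configured serving task is launched through FlagScale's command-line interface (the same `flagscale serve rb25` command used in the "Serve" step). Treat it as an illustration of the workflow, not a complete deployment script.

```python
import os
import subprocess

# Enable FlagGems operators through an environment variable, without touching
# the model code (the released container sets the same flag).
env = dict(os.environ, USE_FLAGGEMS="true")

# Launch the pre-configured RoboBrain2.5 serving task via FlagScale's unified CLI
# (equivalent to running `flagscale serve rb25` inside the container).
subprocess.run(["flagscale", "serve", "rb25"], env=env, check=True)
```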

## **FlagGems Universal Large-Model Operator Library**

FlagGems is a Triton-based, cross-architecture operator library collaboratively developed with industry partners. Its core strengths include:

- **Full-stack Coverage**: Over 100 operators, with a broader range of operator types than competing libraries.
- **Ecosystem Compatibility**: Supports 7 accelerator backends; ongoing optimizations have significantly improved performance.
- **High Efficiency**: Employs unique code generation and runtime optimization techniques for faster secondary development and better runtime performance than alternatives.

## **FlagEval Evaluation Framework**

**FlagEval (Libra)** is a comprehensive evaluation system and open platform for large models launched in 2023. It aims to establish scientific, fair, and open benchmarks, methodologies, and tools that help researchers assess model and training-algorithm performance. It features:

- **Multi-dimensional Evaluation**: Supports 800+ model evaluations across NLP, CV, Audio, and Multimodal fields, covering 20+ downstream tasks including language understanding and image-text generation.
- **Industry-Grade Use Cases**: Has completed horizontal evaluations of mainstream large models, providing authoritative benchmarks for chip-model performance validation.

## 🛠️ Setup

```bash
# clone repo.
git clone https://github.com/FlagOpen/RoboBrain2.5.git
cd RoboBrain2.5

# build conda env.
conda create -n robobrain2_5 python=3.10
conda activate robobrain2_5
pip install -r requirements.txt
```

## 💡 Quickstart

### 1. Usage for General VQA

```python
from inference import UnifiedInference

model = UnifiedInference("BAAI/RoboBrain2.5-8B-NV")

# Example:
prompt = "What is shown in this image?"
image = "http://images.cocodataset.org/val2017/000000039769.jpg"

pred = model.inference(prompt, image, task="general")
print(f"Prediction:\n{pred}")
```

# Evaluation Results

## Benchmark Results

| Metrics | RoboBrain2.5-8B-NV-CUDA | RoboBrain2.5-8B-ascend-FlagOS|
|-------------------|--------------------------|-----------------------------|
| erqa | 41.250 | 41.000 |
| Where2Place | 1.220 | 0.450 |
| blink_val_ev | 78.180 | 80.490 |
| cv_bench_test | 87.490 | 87.680 |
| embspatial_bench | 75.910 | 77.390 |
| SAT | 69.330 | 74.000 |
| vsi_bench_tiny | 38.340 | 35.610 |
| robo_spatial_home_all | 49.429 | 44.286 |
| all_angles_bench | 48.360 | 50.420 |
| egoplan_bench2 | 49.050 | 49.430 |
| EmbodiedVerse-Open-Sampled | 44.320 | 44.910 |
| ERQAPlus | 21.880 | 20.880 |

# User Guide

**Environment Setup**

| Component | Version |
| ------------------------------- | ----------------------------------- |
| Accelerator Card Driver | 25.2.0 |
| CANN | 8.3.0.2.220 (8.3.RC2) |
| FlagTree | 0.4.0 |
| FlagGems | 4.2.0 |
| VLLM-FL | 0.0.0 |

### 2. Usage for Visual Grounding (VG)

```python
from inference import UnifiedInference

model = UnifiedInference("BAAI/RoboBrain2.5-8B-NV")

# Example:
prompt = "the person wearing a red hat"
image = "./assets/demo/grounding.jpg"

# Visualization results will be saved to ./result, if `plot=True`.
pred = model.inference(prompt, image, task="grounding", plot=True, do_sample=False)
print(f"Prediction:\n{pred}")
```

## Operation Steps

### Download Open-source Model Weights

```bash
pip install modelscope
modelscope download --model FlagRelease/RoboBrain2.5-8B-FlagOS --local_dir /data/workspace-robobrain2.5/RoboBrain2.5-8B
```

### Download FlagOS Image

```bash
docker pull harbor.baai.ac.cn/flagrelease-public/flagrelease-ascend-release-model_robobrain2.5-8b-tree_0.4.0_ascend3.2e-gems_4.2.0-scale_1.0.0-cx_0.8.0-python_3.11.13-torch_npu2.8.0-pcp_cann8.3.0.2.220_8.3.rc2-gpu_ascend001-arc_arm64-driver_25.2.0:latest
```

### Start the inference service

```bash
# Container startup
docker run -itd --name flagos -u root --privileged=true --shm-size=1000g --net=host \
-v /usr/local/Ascend/driver:/usr/local/Ascend/driver \
-v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi \
-v /usr/local/dcmi:/usr/local/dcmi \
-v /usr/local/sbin:/usr/local/sbin \
-v /usr/share/zoneinfo/Asia/Shanghai:/etc/localtime \
-v /etc/ascend_install.info:/etc/ascend_install.info \
-v /data/workspace-robobrain2.5:/workspace-robobrain2.5 \
-v /root/.cache:/root/.cache \
-w /workspace-robobrain2.5 \
-e VLLM_USE_V1=1 \
-e CPU_AFFINITY_CONF=2 \
-e PYTORCH_NPU_ALLOC_CONF=max_split_size_mb:256 \
-e USE_FLAGGEMS=true \
harbor.baai.ac.cn/flagrelease-public/flagrelease-ascend-release-model_robobrain2.5-8b-tree_0.4.0_ascend3.2e-gems_4.2.0-scale_1.0.0-cx_0.8.0-python_3.11.13-torch_npu2.8.0-pcp_cann8.3.0.2.220_8.3.rc2-gpu_ascend001-arc_arm64-driver_25.2.0:latest bash
```

### Serve

```bash
docker exec -it flagos bash
flagscale serve rb25
```

### 3. Usage for Affordance Prediction (Embodied)

```python
from inference import UnifiedInference

model = UnifiedInference("BAAI/RoboBrain2.5-8B-NV")

# Example:
prompt = "the affordance area for holding the cup"
image = "./assets/demo/affordance.jpg"

# Visualization results will be saved to ./result, if `plot=True`.
pred = model.inference(prompt, image, task="pointing", plot=True, do_sample=False)
print(f"Prediction:\n{pred}")
```

### 4. Usage for Referring Prediction (Embodied)

```python
from inference import UnifiedInference

model = UnifiedInference("BAAI/RoboBrain2.5-8B-NV")

# Example:
prompt = "Identify spot within the vacant space that's between the two mugs"
image = "./assets/demo/pointing.jpg"

# Visualization results will be saved to ./result, if `plot=True`.
pred = model.inference(prompt, image, task="pointing", plot=True, do_sample=True)
print(f"Prediction:\n{pred}")
```

## Service Invocation

### API-based Invocation Script

```python
import openai

openai.api_key = "EMPTY"
openai.base_url = "http://<server_ip>:9014/v1/"

model = "RoboBrain2.5-8B-ascend-flagos"
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What's the weather like today?"},
]

# Send a non-streaming chat completion request to the FlagOS service.
response = openai.chat.completions.create(
    model=model,
    messages=messages,
    temperature=0.7,
    top_p=0.95,
    stream=False,
)
print(response.choices[0].message.content)
```
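
Since RoboBrain2.5 is a vision-language model, image inputs can also be sent through the same OpenAI-compatible endpoint. The snippet below is a minimal sketch that assumes the served model accepts the standard OpenAI multimodal chat format (an `image_url` content part); the endpoint, API key, and model name are taken from the script above, and the image path is one of the demo assets used in the Quickstart examples.

```python
import base64

import openai

openai.api_key = "EMPTY"
openai.base_url = "http://<server_ip>:9014/v1/"

# Optional: confirm the service is reachable and list the served model ids.
print([m.id for m in openai.models.list().data])

# Encode a local image as a data URL (a plain http(s) URL also works).
with open("./assets/demo/affordance.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

response = openai.chat.completions.create(
    model="RoboBrain2.5-8B-ascend-flagos",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is shown in this image?"},
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"},
                },
            ],
        }
    ],
    temperature=0.7,
)
print(response.choices[0].message.content)
```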

### 5. Usage for Navigation Tasks (Embodied)

```python
from inference import UnifiedInference

model = UnifiedInference("BAAI/RoboBrain2.5-8B-NV")

# Example 1:
prompt_1 = "Identify spot within toilet in the house"
image = "./assets/demo/navigation.jpg"

# Visualization results will be saved to ./result, if `plot=True`.
pred = model.inference(prompt_1, image, task="pointing", plot=True, do_sample=True)
print(f"Prediction:\n{pred}")

# Example 2:
prompt_2 = "Identify spot within the sofa in the house"
image = "./assets/demo/navigation.jpg"

# Visualization results will be saved to ./result, if `plot=True`.
pred = model.inference(prompt_2, image, task="pointing", plot=True, do_sample=True)
print(f"Prediction:\n{pred}")
```

### AnythingLLM Integration Guide

#### 1. Download & Install

- Visit the official site: https://anythingllm.com/
- Choose the appropriate version for your OS (Windows/macOS/Linux)
- Follow the installation wizard to complete the setup

#### 2. Configuration

- Launch AnythingLLM
- Open settings (bottom left, fourth tab)
- Configure the core LLM parameters (for this deployment, point the provider at the OpenAI-compatible endpoint above, e.g., base URL `http://<server_ip>:9014/v1/`, model `RoboBrain2.5-8B-ascend-flagos`, API key `EMPTY`)
- Click "Save Settings" to apply the changes
#### 3. Model Interaction

- After the model has loaded:
  - Click **"New Conversation"**
  - Enter your question (e.g., "Explain the basics of quantum computing")
  - Click the send button to get a response

# Contributing

We warmly welcome global developers to join us:

1. Submit Issues to report problems
2. Create Pull Requests to contribute code
3. Improve technical documentation
4. Expand hardware adaptation support

### 6. Usage for ✨ 3D Trajectory Prediction ✨ (Embodied)

```python
from inference import UnifiedInference

model = UnifiedInference("BAAI/RoboBrain2.5-8B-NV")

# Example:
prompt = "reach for the banana on the plate"
image = "./assets/demo/trajectory.jpg"

# Visualization results will be saved to ./result, if `plot=True`.
pred = model.inference(prompt, image, task="trajectory", plot=True, do_sample=False)
print(f"Prediction:\n{pred}")
```

### 7. Usage for ✨ Temporal Value Estimation ✨ (Embodied)
***We highly recommend referring to [Robo-Dopamine](https://github.com/FlagOpen/Robo-Dopamine) for detailed usage instructions.***
```bash
# clone Robo-Dopamine repo.
git clone https://github.com/FlagOpen/Robo-Dopamine.git
cd Robo-Dopamine
```
```python
import os
from examples.inference import GRMInference

# model = GRMInference("tanhuajie2001/Robo-Dopamine-GRM-3B")
model = GRMInference("BAAI/RoboBrain2.5-8B-NV")

TASK_INSTRUCTION = "organize the table"
BASE_DEMO_PATH = "./examples/demo_table"
GOAL_IMAGE_PATH = "./examples/demo_table/goal_image.png"
OUTPUT_ROOT = "./results"

output_dir = model.run_pipeline(
cam_high_path = os.path.join(BASE_DEMO_PATH, "cam_high.mp4"),
cam_left_path = os.path.join(BASE_DEMO_PATH, "cam_left_wrist.mp4"),
cam_right_path = os.path.join(BASE_DEMO_PATH, "cam_right_wrist.mp4"),
out_root = OUTPUT_ROOT,
task = TASK_INSTRUCTION,
frame_interval = 30,
batch_size = 1,
goal_image = GOAL_IMAGE_PATH,
eval_mode = "incremental",
visualize = True
)

print(f"Episode ({BASE_DEMO_PATH}) processed with Incremental-Mode. Output at: {output_dir}")
# License

```
本模型的权重来源于BAAI/RoboBrain2.5-8B-NV,以apache2.0协议https://www.apache.org/licenses/LICENSE-2.0.txt开源。