Interactive demo showcasing MiniMax-M2's interleaved thinking for agentic workflows, demonstrating adaptive strategy after each tool call with transparent traces and cost benchmarking. The project distills what we learned while building context-aware agents for front-end teams: keep the loop observable, keep the tooling grounded, and quantify efficiency against other LLMs.
- Explain interleaved thinking in practice. Every reasoning burst, tool call, and result is streamed to the terminal and logged to disk, so practitioners can see why MiniMax-M2 course-corrects faster than linear agents.
- Demonstrate agent-native workflows. MiniMax-M2 calls bespoke tools (design tokens, component specs, pattern guidance) to build a design brief, exercising the same MCP/shell/browser-style chains we see in production.
- Benchmark against other coding LLMs. The run summary emits live token counts and the equivalent MiniMax pricing ($0.30/MTok in, $1.20/MTok out) so you can compare against GLM 4.6, K2 Thinking, Claude Sonnet 4.5, etc. (a cost-math sketch follows this list).
- Serve as a starter kit for OSS contributions. The code is intentionally small, well-documented, and easy to extend with additional tools or scenarios.
- Interleaved > linear: forcing the model to think after every tool result drastically reduces redundant calls in long-horizon front-end tasks.
- Grounded tools beat fabricated answers: all tools read from `claude_minimax/examples/sample_project/`, ensuring explanations are backed by source material.
- Observability builds trust: color-coded CLI output + JSONL logs make it trivial to review or share how MiniMax-M2 solved a task.
- Cost transparency matters: developers need concrete $/token math when deciding between Claude, MiniMax, GLM, K2, etc., so we compute it every run.
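For concreteness, the cost line in the run summary boils down to simple arithmetic over the usage counts. A minimal sketch of that math, assuming the $0.30/$1.20 per-MTok rates quoted above (the example token counts are made up):

```python
# Rough per-run cost estimate using the MiniMax-M2 list prices quoted above.
PRICE_IN_PER_MTOK = 0.30   # USD per million input tokens
PRICE_OUT_PER_MTOK = 1.20  # USD per million output tokens

def estimate_cost_usd(prompt_tokens: int, completion_tokens: int) -> float:
    return (prompt_tokens * PRICE_IN_PER_MTOK + completion_tokens * PRICE_OUT_PER_MTOK) / 1_000_000

# Example: 42k prompt tokens + 9k completion tokens ≈ $0.023
print(f"${estimate_cost_usd(42_000, 9_000):.4f}")
```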
```
interleaved_minimax/
├── demo_runner.py   # CLI + orchestration + telemetry
├── tools.py         # Tool registry (design tokens, specs, patterns)
└── demo_logs/       # JSONL traces per run (auto-created)
```
- `demo_runner.py`
  - Loads API keys via the repository-level `.env`
  - Defines scenarios (default: design-system brief; optional: front-end shipping plan)
  - Executes the MiniMax-M2 loop with `reasoning_split=True`
  - Streams colored output and writes logs with tool calls, reasoning, and usage stats
- `tools.py`
  - `get_design_tokens(category)` — pulls relevant sections from `design_system.md`
  - `get_component_spec(component)` — slices `component_specs.md`
  - `get_pattern_guidance(topic)` — queries `code_patterns.md`
  - Tool registry is centralized, making it easy to add new handlers (shell, retrieval, etc.); see the sketch after this list
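To make the registry idea concrete, here is a minimal sketch of the shape such a module can take. The handler bodies and the `TOOL_REGISTRY` layout below are illustrative assumptions, not a copy of the real `tools.py`, which slices specific sections instead of returning whole files:

```python
# Illustrative sketch only: the real tools.py extracts the requested sections;
# these handlers simply return the raw Markdown for brevity.
from pathlib import Path

DOCS = Path("claude_minimax/examples/sample_project")

def get_design_tokens(category: str) -> str:
    # Real implementation: return only the requested category (colors, spacing, ...).
    return (DOCS / "design_system.md").read_text(encoding="utf-8")

def get_component_spec(component: str) -> str:
    return (DOCS / "component_specs.md").read_text(encoding="utf-8")

def get_pattern_guidance(topic: str) -> str:
    return (DOCS / "code_patterns.md").read_text(encoding="utf-8")

# Central registry: the demo loop looks tool calls up by name and dispatches here.
TOOL_REGISTRY = {
    "get_design_tokens": get_design_tokens,
    "get_component_spec": get_component_spec,
    "get_pattern_guidance": get_pattern_guidance,
}
```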
- System prompt instructs MiniMax-M2 to think after every tool result.
- User prompt asks for a design brief with actionable sections (tokens, Button contract, composition pattern).
- Model enters a loop:
  - emits a reasoning block (`Thought:`)
  - decides on a tool call (JSON args printed)
  - receives real data from the tool and updates the shared transcript
- Loop continues until no more tool calls are needed; final response summarizes findings.
- Post-run summary captures steps, calls, tokens, estimated MiniMax cost, and comparison reminders.
This matches MiniMax’s recommended API usage (`extra_body={"reasoning_split": True}` and preserving tool history), so you can copy the pattern into your own apps.
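A condensed sketch of that loop with the OpenAI-compatible client is shown below. The `tools` import, the `build_tools_spec()` call signature, and the `MiniMax-M2` model id are assumptions; treat `demo_runner.py` as the reference implementation:

```python
# Condensed sketch of the interleaved loop (illustrative, not the full demo_runner.py).
import json
import os

from openai import OpenAI
from tools import TOOL_REGISTRY, build_tools_spec  # names referenced elsewhere in this README

client = OpenAI(
    base_url="https://api.minimax.io/v1",
    api_key=os.environ["MINIMAX_API_KEY"],
)

messages = [
    {"role": "system", "content": "Think after every tool result before acting again."},
    {"role": "user", "content": "Draft a design brief covering tokens, the Button contract, and composition."},
]

while True:
    resp = client.chat.completions.create(
        model="MiniMax-M2",
        messages=messages,
        tools=build_tools_spec(),              # JSON schemas for the three demo tools
        extra_body={"reasoning_split": True},  # MiniMax's recommended reasoning flag
    )
    msg = resp.choices[0].message
    messages.append(msg)                       # preserve the full tool history in the transcript
    if not msg.tool_calls:                     # no further tool calls: final answer
        print(msg.content)
        break
    for call in msg.tool_calls:                # execute each requested tool and feed results back
        args = json.loads(call.function.arguments)
        result = TOOL_REGISTRY[call.function.name](**args)
        messages.append({"role": "tool", "tool_call_id": call.id, "content": result})
```

The real runner also streams the separated reasoning, colors the output, and writes each step to `demo_logs/`, which this sketch omits.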
| Scenario | Purpose | Highlights |
|---|---|---|
| `context_package` (default) | Design-system audit | Focuses on tokens, Button contract, composition patterns |
| `frontend_showcase` | Shipping a UI feature today | Emphasizes action items and how interleaved thinking shaved work |
Add more by editing the `SCENARIOS` dict in `demo_runner.py`.
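If you are unsure what an entry looks like, here is a plausible, entirely hypothetical shape; mirror the real entries in `demo_runner.py` rather than these assumed keys:

```python
# Hypothetical SCENARIOS entry (key names are assumptions; copy the schema of
# the existing context_package / frontend_showcase entries instead).
SCENARIOS = {
    # ...existing entries...
    "api_docs_audit": {
        "title": "API documentation audit",
        "user_prompt": (
            "Review the sample project's docs and produce an actionable brief: "
            "missing sections, inconsistent naming, and suggested fixes."
        ),
    },
}
```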
M2's interleaved thinking shines with tool use. The demo exposes three function tools that the model can call:
| Tool | Purpose | Data Source |
|---|---|---|
| `get_design_tokens` | Fetch design-system tokens (colors, typography, spacing, shadows, border radius, breakpoints) | `design_system.md` |
| `get_component_spec` | Return specifications for UI components (Button, Card, Input, Modal, Alert) | `component_specs.md` |
| `get_pattern_guidance` | Look up development patterns and conventions (composition, naming, testing, etc.) | `code_patterns.md` |
M2's function calling works with both OpenAI-compatible and Anthropic-compatible APIs. After each tool result, M2 explicitly thinks about what it learned and adapts its strategy—unlike linear models that plan all tools upfront.
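Each tool is declared to the model as an ordinary OpenAI-style function schema. A sketch of the entry for `get_design_tokens` (the description text and parameter wording are assumptions):

```python
# Sketch of one tools-spec entry (OpenAI-compatible function calling schema).
GET_DESIGN_TOKENS_SPEC = {
    "type": "function",
    "function": {
        "name": "get_design_tokens",
        "description": "Fetch design-system tokens for one category from design_system.md.",
        "parameters": {
            "type": "object",
            "properties": {
                "category": {
                    "type": "string",
                    "description": "Token category, e.g. colors, typography, spacing.",
                }
            },
            "required": ["category"],
        },
    },
}
```

The demo's `build_tools_spec` presumably assembles one such entry per registered tool.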
Why this matters:
- If a tool returns unexpected data (e.g., incomplete component spec), M2 adapts
- No wasted tokens on unnecessary tool calls
- Transparent reasoning: every decision is logged in `reasoning_details`
- Perfect for debugging and exploration workflows
This demo was built using MiniMax-M2 integrated into Cursor, the AI-powered code editor. Here's how to set it up:
- Install Cursor: Download from cursor.com
- Configure API:
  - Open Cursor Settings → Models → API Keys
  - Enable "Override OpenAI Base URL"
  - Set base URL to `https://api.minimax.io/v1` (or `https://api.minimaxi.com/v1` for China)
  - Add your MiniMax API key from platform.minimax.io
  - Select "MiniMax-M2" as the model
- Clear conflicts: Remove any existing OpenAI environment variables (`OPENAI_API_KEY`, `OPENAI_BASE_URL`) to avoid conflicts
- Clone and install:
  ```bash
  git clone https://github.com/muratcankoylan/MiniMax-M2-Interleaved-Thinking
  cd MiniMax-M2-Interleaved-Thinking
  pip install -r requirements.txt
  ```
- Configure API:
  ```bash
  cp env.example .env
  # Edit .env and add your MINIMAX_API_KEY
  ```
- Run:
  ```bash
  python demo_runner.py --scenario context_package
  ```
- Ensure `/Users/muratcankoylan/minimax/.env` (or your repo root) defines `MINIMAX_API_KEY` (MiniMax or compatible OpenAI key).
- Install dependencies (same env used by `claude_minimax`):
  ```bash
  pip install -r claude_minimax/requirements.txt
  # minimally: pip install openai python-dotenv colorama
  ```
```bash
cd /Users/muratcankoylan/minimax/interleaved_minimax
python demo_runner.py --scenario context_package    # design-system deep dive
# or
python demo_runner.py --scenario frontend_showcase  # front-end shipping plan
```
During the run you’ll see:
- Cyan banner summarizing MiniMax capabilities and pricing
- Step-by-step reasoning (“Thought”), tool calls, and tool results (color-coded)
- Final answer plus run summary (steps, tool calls, thinking bursts, token/cost math)
- A reminder to benchmark the cost profile against GLM 4.6, K2 Thinking, Claude Sonnet 4.5
Every event is also captured in `demo_logs/<timestamp>.jsonl` for replay or visualization.
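To replay a trace without re-running the model, something like the following works; the record keys follow the `{step, thinking, tool_calls, ...}` shape described further down and should be treated as assumptions until you inspect a real log:

```python
# Replay the newest trace (field names assumed from the log format in this README).
import json
from pathlib import Path

latest = max(Path("demo_logs").glob("*.jsonl"))  # timestamped names sort chronologically
for line in latest.read_text(encoding="utf-8").splitlines():
    event = json.loads(line)
    tools = [call.get("name") for call in event.get("tool_calls") or []]
    thinking = (event.get("thinking") or "").strip()[:80]
    print(f"step {event.get('step')}: tools={tools} | thinking: {thinking!r}")
```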
| Tool | Backing data | Example usage |
|---|---|---|
| `get_design_tokens` | `design_system.md` | `{"category": "colors"}` ⇒ primary palette + semantics |
| `get_component_spec` | `component_specs.md` | `{"component": "Button"}` ⇒ contract placeholder |
| `get_pattern_guidance` | `code_patterns.md` | `{"topic": "composition"}` ⇒ composition vs props |
Because these tools read real Markdown, editing the source docs immediately alters the model’s behavior—great for experiments or benchmarking different knowledge bases.
- CLI stream: copy/paste-friendly output for threads, videos, or docs.
- JSONL logs: each object records `{step, thinking, tool_calls, tool_results, completion, usage}` — ideal for building dashboards or diffing different models.
- Regression check: `python demo_runner.py --scenario context_package` should finish in ~8 steps with ~7 tool calls. Compare token/cost results before/after modifications. A minimal check script is sketched below.
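A minimal sketch of such a check (thresholds and field names are assumptions; tune them against your own baseline runs):

```python
# Hypothetical regression check: compare the newest run against the ~8 steps /
# ~7 tool calls baseline mentioned above.
import json
from pathlib import Path

events = [
    json.loads(line)
    for line in max(Path("demo_logs").glob("*.jsonl")).read_text(encoding="utf-8").splitlines()
]
steps = len(events)
tool_calls = sum(len(e.get("tool_calls") or []) for e in events)
assert steps <= 10 and tool_calls <= 9, f"possible regression: {steps} steps, {tool_calls} tool calls"
print(f"OK: {steps} steps, {tool_calls} tool calls")
```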
- More tools: plug shell commands, browser actions, or retrieval APIs into `TOOL_REGISTRY`.
- Different models: swap `MINIMAX_MODEL` in `.env` to run the exact workflow on GLM/K2/Claude and compare logs.
- Visualization: feed the JSONL trace into your favorite tooling (DAG viewer, metrics dashboards, etc.).
- Contribution ideas: add test harnesses, integrate MCP servers, build a Streamlit or browser UI, or script head-to-head benchmarks.
- Add new scenarios to `SCENARIOS` in `demo_runner.py`.
- Register additional tools in `tools.py` and expose them through `build_tools_spec`.
- Point the tool implementations at different documents if you want to showcase other workflows (API docs, test logs, etc.).
This minimal surface keeps the interleaved loop visible while still exercising multiple tool calls grounded in the project’s own artifacts. Use it for demos or regression tests when you update your toolset.
MiniMax-M2’s interleaved thinking thrives when we can show every decision, every tool call, and every dollar saved. This project keeps that loop visible—ready for open-source contributions, demos, and competitive benchmarks. Have fun building. 🙌
Contributions are welcome. Open an issue or PR if you want to add scenarios, tools, or visualizations. Please follow the existing code style and include tests where relevant.
MIT License - see LICENSE for details.