Interactive demo showcasing MiniMax-M2's interleaved thinking for agentic workflows, demonstrating adaptive strategy after each tool call with transparent traces and cost benchmarking. The project distills what we learned while building context-aware agents for front-end teams: keep the loop observable, keep the tooling grounded, and quantify efficiency against other LLMs.
- Explain interleaved thinking in practice. Every reasoning burst, tool call, and result is streamed to the terminal and logged to disk, so practitioners can see why MiniMax-M2 course-corrects faster than linear agents.
- Demonstrate agent-native workflows. MiniMax-M2 calls bespoke tools (design tokens, component specs, pattern guidance) to build a design brief, exercising the same MCP/shell/browser-style chains we see in production.
- Benchmark against other coding LLMs. The run summary emits live token counts and the equivalent MiniMax pricing ($0.30/MTok in, $1.20/MTok out) so you can compare against GLM 4.6, K2 Thinking, Claude Sonnet 4.5, etc. (a cost-math sketch follows this list).
- Serve as a starter kit for OSS contributions. The code is intentionally small, well-documented, and easy to extend with additional tools or scenarios.
- Interleaved > linear: forcing the model to think after every tool result drastically reduces redundant calls in long-horizon front-end tasks.
- Grounded tools beat fabricated answers: all tools read from `claude_minimax/examples/sample_project/`, ensuring explanations are backed by source material.
- Observability builds trust: color-coded CLI output + JSONL logs make it trivial to review or share how MiniMax-M2 solved a task.
- Cost transparency matters: developers need concrete $/token math when deciding between Claude, MiniMax, GLM, K2, etc., so we compute it every run.
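For concreteness, the cost line in the run summary boils down to simple arithmetic over the usage counts. A minimal sketch of that math, assuming the $0.30/$1.20 per-MTok rates quoted above (the example token counts are made up):

```python
# Rough per-run cost estimate using the MiniMax-M2 list prices quoted above.
PRICE_IN_PER_MTOK = 0.30   # USD per million input tokens
PRICE_OUT_PER_MTOK = 1.20  # USD per million output tokens

def estimate_cost_usd(prompt_tokens: int, completion_tokens: int) -> float:
    return (prompt_tokens * PRICE_IN_PER_MTOK + completion_tokens * PRICE_OUT_PER_MTOK) / 1_000_000

# Example: 42k prompt tokens + 9k completion tokens ≈ $0.023
print(f"${estimate_cost_usd(42_000, 9_000):.4f}")
```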
```
interleaved_minimax/
├── demo_runner.py   # CLI + orchestration + telemetry
├── tools.py         # Tool registry (design tokens, specs, patterns)
└── demo_logs/       # JSONL traces per run (auto-created)
```
- `demo_runner.py`
  - Loads API keys via the repository-level `.env`
  - Defines scenarios (default: design-system brief; optional: front-end shipping plan)
  - Executes the MiniMax-M2 loop with `reasoning_split=True`
  - Streams colored output and writes logs with tool calls, reasoning, and usage stats
- `tools.py`
  - `get_design_tokens(category)` — pulls relevant sections from `design_system.md`
  - `get_component_spec(component)` — slices `component_specs.md`
  - `get_pattern_guidance(topic)` — queries `code_patterns.md`
  - Tool registry is centralized, making it easy to add new handlers (shell, retrieval, etc.); see the sketch after this list
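To make the registry idea concrete, here is a minimal sketch of the shape such a module can take. The handler bodies and the `TOOL_REGISTRY` layout below are illustrative assumptions, not a copy of the real `tools.py`, which slices specific sections instead of returning whole files:

```python
# Illustrative sketch only: the real tools.py extracts the requested sections;
# these handlers simply return the raw Markdown for brevity.
from pathlib import Path

DOCS = Path("claude_minimax/examples/sample_project")

def get_design_tokens(category: str) -> str:
    # Real implementation: return only the requested category (colors, spacing, ...).
    return (DOCS / "design_system.md").read_text(encoding="utf-8")

def get_component_spec(component: str) -> str:
    return (DOCS / "component_specs.md").read_text(encoding="utf-8")

def get_pattern_guidance(topic: str) -> str:
    return (DOCS / "code_patterns.md").read_text(encoding="utf-8")

# Central registry: the demo loop looks tool calls up by name and dispatches here.
TOOL_REGISTRY = {
    "get_design_tokens": get_design_tokens,
    "get_component_spec": get_component_spec,
    "get_pattern_guidance": get_pattern_guidance,
}
```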
- System prompt instructs MiniMax-M2 to think after every tool result.
- User prompt asks for a design brief with actionable sections (tokens, Button contract, composition pattern).
- Model enters a loop:
  - emits a reasoning block (`Thought:`)
  - decides on a tool call (JSON args printed)
  - receives real data from the tool and updates the shared transcript
- Loop continues until no more tool calls are needed; final response summarizes findings.
- Post-run summary captures steps, calls, tokens, estimated MiniMax cost, and comparison reminders.
This matches MiniMax’s recommended API usage (`extra_body={"reasoning_split": True}` and preserving tool history), so you can copy the pattern into your own apps.
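A condensed sketch of that loop with the OpenAI-compatible client is shown below. The `tools` import, the `build_tools_spec()` call signature, and the `MiniMax-M2` model id are assumptions; treat `demo_runner.py` as the reference implementation:

```python
# Condensed sketch of the interleaved loop (illustrative, not the full demo_runner.py).
import json
import os

from openai import OpenAI
from tools import TOOL_REGISTRY, build_tools_spec  # names referenced elsewhere in this README

client = OpenAI(
    base_url="https://api.minimax.io/v1",
    api_key=os.environ["MINIMAX_API_KEY"],
)

messages = [
    {"role": "system", "content": "Think after every tool result before acting again."},
    {"role": "user", "content": "Draft a design brief covering tokens, the Button contract, and composition."},
]

while True:
    resp = client.chat.completions.create(
        model="MiniMax-M2",
        messages=messages,
        tools=build_tools_spec(),              # JSON schemas for the three demo tools
        extra_body={"reasoning_split": True},  # MiniMax's recommended reasoning flag
    )
    msg = resp.choices[0].message
    messages.append(msg)                       # preserve the full tool history in the transcript
    if not msg.tool_calls:                     # no further tool calls: final answer
        print(msg.content)
        break
    for call in msg.tool_calls:                # execute each requested tool and feed results back
        args = json.loads(call.function.arguments)
        result = TOOL_REGISTRY[call.function.name](**args)
        messages.append({"role": "tool", "tool_call_id": call.id, "content": result})
```

The real runner also streams the separated reasoning, colors the output, and writes each step to `demo_logs/`, which this sketch omits.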
| Scenario | Purpose | Highlights |
|---|---|---|
| `context_package` (default) | Design-system audit | Focuses on tokens, Button contract, composition patterns |
| `frontend_showcase` | Shipping a UI feature today | Emphasizes action items and how interleaved thinking shaved work |
Add more by editing the `SCENARIOS` dict in `demo_runner.py`.
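If you are unsure what an entry looks like, here is a plausible, entirely hypothetical shape; mirror the real entries in `demo_runner.py` rather than these assumed keys:

```python
# Hypothetical SCENARIOS entry (key names are assumptions; copy the schema of
# the existing context_package / frontend_showcase entries instead).
SCENARIOS = {
    # ...existing entries...
    "api_docs_audit": {
        "title": "API documentation audit",
        "user_prompt": (
            "Review the sample project's docs and produce an actionable brief: "
            "missing sections, inconsistent naming, and suggested fixes."
        ),
    },
}
```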
M2's interleaved thinking shines with tool use. The demo exposes three function tools that the model can call:
| Tool | Purpose | Data Source |
|---|---|---|
| `get_design_tokens` | Fetch design-system tokens (colors, typography, spacing, shadows, border radius, breakpoints) | `design_system.md` |
| `get_component_spec` | Return specifications for UI components (Button, Card, Input, Modal, Alert) | `component_specs.md` |
| `get_pattern_guidance` | Look up development patterns and conventions (composition, naming, testing, etc.) | `code_patterns.md` |
M2's function calling works with both OpenAI-compatible and Anthropic-compatible APIs. After each tool result, M2 explicitly thinks about what it learned and adapts its strategy—unlike linear models that plan all tools upfront.
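Each tool is declared to the model as an ordinary OpenAI-style function schema. A sketch of the entry for `get_design_tokens` (the description text and parameter wording are assumptions):

```python
# Sketch of one tools-spec entry (OpenAI-compatible function calling schema).
GET_DESIGN_TOKENS_SPEC = {
    "type": "function",
    "function": {
        "name": "get_design_tokens",
        "description": "Fetch design-system tokens for one category from design_system.md.",
        "parameters": {
            "type": "object",
            "properties": {
                "category": {
                    "type": "string",
                    "description": "Token category, e.g. colors, typography, spacing.",
                }
            },
            "required": ["category"],
        },
    },
}
```

The demo's `build_tools_spec` presumably assembles one such entry per registered tool.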
Why this matters:
- If a tool returns unexpected data (e.g., incomplete component spec), M2 adapts
- No wasted tokens on unnecessary tool calls
- Transparent reasoning: every decision is logged in `reasoning_details`
- Perfect for debugging and exploration workflows
This demo was built using MiniMax-M2 integrated into Cursor, the AI-powered code editor. Here's how to set it up:
- Install Cursor: Download from cursor.com
- Configure API:
  - Open Cursor Settings → Models → API Keys
  - Enable "Override OpenAI Base URL"
  - Set base URL to `https://api.minimax.io/v1` (or `https://api.minimaxi.com/v1` for China)
  - Add your MiniMax API key from platform.minimax.io
  - Select "MiniMax-M2" as the model
- Clear conflicts: Remove any existing OpenAI environment variables (`OPENAI_API_KEY`, `OPENAI_BASE_URL`) to avoid conflicts
- Clone and install:
  ```bash
  git clone https://github.com/muratcankoylan/MiniMax-M2-Interleaved-Thinking
  cd MiniMax-M2-Interleaved-Thinking
  pip install -r requirements.txt
  ```
- Configure API:
  ```bash
  cp env.example .env
  # Edit .env and add your MINIMAX_API_KEY
  ```
- Run:
  ```bash
  python demo_runner.py --scenario context_package
  ```
- Ensure `/Users/muratcankoylan/minimax/.env` (or your repo root) defines `MINIMAX_API_KEY` (MiniMax or compatible OpenAI key).
- Install dependencies (same env used by `claude_minimax`):
  ```bash
  pip install -r claude_minimax/requirements.txt
  # minimally: pip install openai python-dotenv colorama
  ```
```bash
cd /Users/muratcankoylan/minimax/interleaved_minimax
python demo_runner.py --scenario context_package    # design-system deep dive
# or
python demo_runner.py --scenario frontend_showcase  # front-end shipping plan
```
During the run you’ll see:
- Cyan banner summarizing MiniMax capabilities and pricing
- Step-by-step reasoning (“Thought”), tool calls, and tool results (color-coded)
- Final answer plus run summary (steps, tool calls, thinking bursts, token/cost math)
- A reminder to benchmark the cost profile against GLM 4.6, K2 Thinking, Claude Sonnet 4.5
Every event is also captured in `demo_logs/<timestamp>.jsonl` for replay or visualization.
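To replay a trace without re-running the model, something like the following works; the record keys follow the `{step, thinking, tool_calls, ...}` shape described further down and should be treated as assumptions until you inspect a real log:

```python
# Replay the newest trace (field names assumed from the log format in this README).
import json
from pathlib import Path

latest = max(Path("demo_logs").glob("*.jsonl"))  # timestamped names sort chronologically
for line in latest.read_text(encoding="utf-8").splitlines():
    event = json.loads(line)
    tools = [call.get("name") for call in event.get("tool_calls") or []]
    thinking = (event.get("thinking") or "").strip()[:80]
    print(f"step {event.get('step')}: tools={tools} | thinking: {thinking!r}")
```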
| Tool | Backing data | Example usage |
|---|---|---|
| `get_design_tokens` | `design_system.md` | `{"category": "colors"}` ⇒ primary palette + semantics |
| `get_component_spec` | `component_specs.md` | `{"component": "Button"}` ⇒ contract placeholder |
| `get_pattern_guidance` | `code_patterns.md` | `{"topic": "composition"}` ⇒ composition vs props |
Because these tools read real Markdown, editing the source docs immediately alters the model’s behavior—great for experiments or benchmarking different knowledge bases.
- CLI stream: copy/paste-friendly output for threads, videos, or docs.
- JSONL logs: each object records `{step, thinking, tool_calls, tool_results, completion, usage}` — ideal for building dashboards or diffing different models.
- Regression check: `python demo_runner.py --scenario context_package` should finish in ~8 steps with ~7 tool calls. Compare token/cost results before/after modifications. A minimal check script is sketched below.
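A minimal sketch of such a check (thresholds and field names are assumptions; tune them against your own baseline runs):

```python
# Hypothetical regression check: compare the newest run against the ~8 steps /
# ~7 tool calls baseline mentioned above.
import json
from pathlib import Path

events = [
    json.loads(line)
    for line in max(Path("demo_logs").glob("*.jsonl")).read_text(encoding="utf-8").splitlines()
]
steps = len(events)
tool_calls = sum(len(e.get("tool_calls") or []) for e in events)
assert steps <= 10 and tool_calls <= 9, f"possible regression: {steps} steps, {tool_calls} tool calls"
print(f"OK: {steps} steps, {tool_calls} tool calls")
```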
- More tools: plug shell commands, browser actions, or retrieval APIs into `TOOL_REGISTRY`.
- Different models: swap `MINIMAX_MODEL` in `.env` to run the exact workflow on GLM/K2/Claude and compare logs.
- Visualization: feed the JSONL trace into your favorite tooling (DAG viewer, metrics dashboards, etc.).
- Contribution ideas: add test harnesses, integrate MCP servers, build a Streamlit or browser UI, or script head-to-head benchmarks.
- Add new scenarios to `SCENARIOS` in `demo_runner.py`.
- Register additional tools in `tools.py` and expose them through `build_tools_spec`.
- Point the tool implementations at different documents if you want to showcase other workflows (API docs, test logs, etc.).
This minimal surface keeps the interleaved loop visible while still exercising multiple tool calls grounded in the project’s own artifacts. Use it for demos or regression tests when you update your toolset.
MiniMax-M2’s interleaved thinking thrives when we can show every decision, every tool call, and every dollar saved. This project keeps that loop visible—ready for open-source contributions, demos, and competitive benchmarks. Have fun building. 🙌
Contributions are welcome. Open an issue or PR if you want to add scenarios, tools, or visualizations. Please follow the existing code style and include tests where relevant.
MIT License - see LICENSE for details.