Code Mode
Instead of giving LLMs dozens of MCP tools they fumble with, give them one tool — execute_code — and let them write TypeScript. The results are dramatic: fewer tokens, fewer round-trips, fewer mistakes.
Originated at Cloudflare by Kenton Varda & Sunil Pai · Blog post (Sep 2025) · Docs
🔴 The problem with native MCP tool calling
Traditional MCP exposes every endpoint as a tool:
- A GitHub MCP server exposes 50+ tools (list issues, create PR, search code…)
- Add Stripe, Slack, and your internal API — now you're at 200+ tools
- Every tool's schema (name, description, parameters) is injected into the LLM's context window on every request
- The LLM was fine-tuned on synthetic tool-call examples — not real-world usage
The core insight:
LLMs have seen millions of lines of real-world TypeScript in their training data. By comparison, they have seen far fewer tool-call examples, most of them contrived. Making an LLM call tools directly is like making Shakespeare write a play in a language he learned last month.
🟠 How Code Mode works
[Diagram: codemode({code}) → sandbox (Worker isolate) → connectors (typed globals) → Servers]
The sandbox has no network access. Every effect goes through typed connectors.
Step 1 — MCP schema → TypeScript API
Cloudflare's Agents SDK reads each MCP server's tool schemas and generates typed TypeScript interfaces. The LLM sees code, not tool definitions.
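A minimal sketch of the idea, not the Agents SDK's actual generator: it turns a simplified tool schema into a TypeScript declaration the model can code against. The `ToolSchema` shape, the function names, and the `listIssues` tool are assumptions made for illustration; real MCP schemas are full JSON Schema.

```typescript
// Hypothetical, simplified shapes: real MCP tool schemas are full JSON Schema.
type JsonProp = { type: "string" | "number" | "boolean"; description?: string };

interface ToolSchema {
  name: string;
  description: string;
  inputSchema: { properties: Record<string, JsonProp>; required?: string[] };
}

// Render one tool as a TypeScript method signature with a doc comment.
function toolToSignature(tool: ToolSchema): string {
  const required = new Set(tool.inputSchema.required ?? []);
  const params = Object.entries(tool.inputSchema.properties)
    .map(([key, prop]) => `${key}${required.has(key) ? "" : "?"}: ${prop.type}`)
    .join("; ");
  return `  /** ${tool.description} */\n  ${tool.name}(input: { ${params} }): Promise<unknown>;`;
}

// Render a whole server as one `declare const` block.
function serverToTypeScript(serverName: string, tools: ToolSchema[]): string {
  const body = tools.map(toolToSignature).join("\n");
  return `declare const ${serverName}: {\n${body}\n};`;
}

// Hypothetical example tool.
const listIssues: ToolSchema = {
  name: "listIssues",
  description: "List issues in a repository",
  inputSchema: {
    properties: { repo: { type: "string" }, limit: { type: "number" } },
    required: ["repo"],
  },
};

console.log(serverToTypeScript("github", [listIssues]));
```

The printed declaration is what would go into the prompt in place of the raw tool definition.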
Step 2 — One tool, not dozens
The agent exposes exactly one tool to the LLM: codemode({code: string}). The LLM writes TypeScript, and the runtime executes it in a sandboxed V8 isolate.
Step 3 — Sandboxed execution
Code runs in a fresh Cloudflare Worker isolate (not a container). Starts in milliseconds, costs almost nothing, and is thrown away after each run. The isolate has zero internet access — it can only reach MCP servers through typed bindings.
Step 4 — Results flow back
The sandboxed code uses console.log() to return results. Only the final output enters the LLM's context — not the intermediate API responses.
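The "only logs come back" contract can be sketched as follows. This is a stand-in, not a sandbox: `new Function` offers no isolation, whereas Cloudflare runs the code in a Worker isolate. The `runAndCollectLogs` helper and the `listIssues` tool are hypothetical.

```typescript
// Stand-in runner: NOT a sandbox. It only illustrates
// "logs go out, intermediate data stays inside".
type Tools = Record<string, (...args: any[]) => Promise<unknown>>;

async function runAndCollectLogs(code: string, tools: Tools): Promise<string> {
  const lines: string[] = [];
  const capturedConsole = {
    log: (...args: unknown[]) => {
      lines.push(args.map((a) => (typeof a === "string" ? a : JSON.stringify(a))).join(" "));
    },
  };
  // The snippet receives `console` and `tools` as parameters. Other globals are
  // still reachable here; a real isolate would remove them.
  const fn = new Function("console", "tools", `return (async () => {${code}\n})();`);
  await fn(capturedConsole, tools);
  return lines.join("\n");
}

// Hypothetical tool returning a large intermediate payload.
const tools: Tools = {
  listIssues: async () =>
    Array.from({ length: 500 }, (_, i) => ({ id: i, open: i % 2 === 0 })),
};

// What the model might submit as the `code` argument (plain JavaScript here).
const modelCode = `
  const issues = await tools.listIssues();
  console.log("open issues:", issues.filter((i) => i.open).length);
`;

runAndCollectLogs(modelCode, tools).then((output) => console.log(output));
// prints "open issues: 250"
```

The 500-item array never leaves the runner; the caller receives a single short line.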
🟢 The token difference is absurd
| Approach | Tools | Token cost | % of 200K window |
|---|---|---|---|
| Raw OpenAPI spec in prompt | — | ~2,000,000 | 977% |
| Native MCP (full schemas) | 2,594 | 1,170,523 | 585% |
| Native MCP (minimal) | 2,594 | 244,047 | 122% |
| Code Mode | 3 | ~1,100 | 0.5% |
Source: Cloudflare MCP Server README. 2,500 Cloudflare API endpoints in ~1k tokens vs 244k for native MCP.
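The percentage column is each token cost divided by the 200K window. A quick check of the three rows that have exact figures (the helper name is ours, not from the README):

```typescript
const CONTEXT_WINDOW = 200_000; // tokens, as in the table header

// Share of the window a given token cost would occupy, as a percentage.
function percentOfWindow(tokens: number, windowSize: number = CONTEXT_WINDOW): number {
  return (tokens / windowSize) * 100;
}

// Token costs copied from the table above.
const rows: Array<[string, number]> = [
  ["Native MCP (full schemas)", 1_170_523],
  ["Native MCP (minimal)", 244_047],
  ["Code Mode", 1_100],
];

for (const [label, tokens] of rows) {
  console.log(`${label}: ${percentOfWindow(tokens).toFixed(2)}% of the window`);
}
```

Anything above 100% cannot fit in a single request at all, so the first two native rows are not just expensive but unusable without trimming.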
Why this matters beyond tokens
Multi-step operations in one sandbox run
With native tool calling, filtering 100 PRs can take up to 100 round-trips through the LLM, one per tool call. With Code Mode, the LLM writes a for loop that runs in one sandbox execution.
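A sketch of the kind of script the model might write. The `github` connector here is a hypothetical in-memory stub with made-up data; a real one would be a typed global backed by an MCP server.

```typescript
interface PullRequest {
  number: number;
  author: string;
  draft: boolean;
}

// Hypothetical connector stub that counts how often it is called.
let connectorCalls = 0;
const github = {
  async listPullRequests(_repo: string): Promise<PullRequest[]> {
    connectorCalls++;
    return Array.from({ length: 100 }, (_, i) => ({
      number: i + 1,
      author: i % 4 === 0 ? "alice" : "bob",
      draft: i % 10 === 0,
    }));
  },
};

// One script, one listing call; the filtering happens in code, not in the LLM.
async function readyPullRequestsBy(author: string): Promise<number[]> {
  const prs = await github.listPullRequests("acme/widgets");
  const ready: number[] = [];
  for (const pr of prs) {
    if (pr.author === author && !pr.draft) ready.push(pr.number);
  }
  return ready;
}

readyPullRequestsBy("alice").then((numbers) => {
  console.log(`${numbers.length} ready PRs, ${connectorCalls} connector call(s)`);
});
```

The model pays for one tool invocation and one short result line, regardless of how many PRs the loop inspects.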
Discovery stays in the sandbox
codemode.search("pull request") and codemode.describe() return results into running code, not the context window. The model pulls only what it needs.
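To make "the model pulls only what it needs" concrete, here is a toy keyword search over an in-memory catalog. The catalog entries and the matching rule are assumptions for illustration; the source does not say how codemode.search ranks results.

```typescript
interface EndpointDoc {
  path: string;
  method: string;
  summary: string;
}

// Hypothetical catalog; a real one would be built from the connected servers' specs.
const catalog: EndpointDoc[] = [
  { path: "/repos/{repo}/pulls", method: "GET", summary: "List pull requests" },
  { path: "/repos/{repo}/pulls/{number}/merge", method: "PUT", summary: "Merge a pull request" },
  { path: "/repos/{repo}/issues", method: "POST", summary: "Create an issue" },
];

// Naive search: every query word must appear in the summary.
function search(query: string): EndpointDoc[] {
  const words = query.toLowerCase().split(/\s+/).filter(Boolean);
  return catalog.filter((doc) => words.every((w) => doc.summary.toLowerCase().includes(w)));
}

// The script narrows the catalog itself and logs only the lines it cares about.
const matches = search("pull request");
console.log(matches.map((m) => `${m.method} ${m.path}`).join("\n"));
```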
Durable state & human approval
Operations like creating issues or merging PRs need an audit trail and human sign-off. The runtime handles pending/approve/reject/rollback — the model's code just pauses at an approval gate and resumes when approved.
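The pause-and-resume behavior can be approximated with a promise that resolves only when the host decides. This is an in-memory sketch under our own names (`ApprovalGate`, `mergeWithApproval`); it is not durable and omits rollback, both of which the real runtime is described as handling.

```typescript
type Decision = "approved" | "rejected";

class ApprovalGate {
  private pending = new Map<number, (decision: Decision) => void>();
  private nextId = 1;

  // Called from the running script: the returned promise settles once a human decides.
  request(action: string): { id: number; decision: Promise<Decision> } {
    const id = this.nextId++;
    const decision = new Promise<Decision>((resolve) => this.pending.set(id, resolve));
    console.log(`pending #${id}: ${action}`);
    return { id, decision };
  }

  // Called from the host application.
  decide(id: number, decision: Decision): void {
    const resolve = this.pending.get(id);
    if (!resolve) throw new Error(`no pending approval #${id}`);
    this.pending.delete(id);
    resolve(decision);
  }
}

// Model-written code: it awaits the gate and continues only after a decision.
async function mergeWithApproval(gate: ApprovalGate, prNumber: number): Promise<string> {
  const { decision } = gate.request(`merge PR #${prNumber}`);
  if ((await decision) === "rejected") return "skipped";
  return "merged"; // a real script would call the connector here
}

const gate = new ApprovalGate();
const outcome = mergeWithApproval(gate, 42);
gate.decide(1, "approved"); // the host approves while the script is paused
outcome.then((result) => console.log(result)); // prints "merged"
```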
No API keys leak
Bindings provide pre-authorized client interfaces. The sandbox never sees raw API keys. All calls go through the agent supervisor which attaches credentials server-side.
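A simplified, synchronous sketch of that split. All names here (`createSupervisor`, `createBinding`, the recording transport) are hypothetical; the point is only that the token lives in the supervisor's closure and the sandbox-facing object carries none.

```typescript
interface SandboxRequest {
  method: string;
  path: string;
}
interface OutboundRequest extends SandboxRequest {
  headers: Record<string, string>;
}

// Supervisor side: holds the secret, attaches it, and returns only the response.
function createSupervisor(apiToken: string, send: (req: OutboundRequest) => string) {
  return (request: SandboxRequest): string =>
    send({ ...request, headers: { Authorization: `Bearer ${apiToken}` } });
}

// Sandbox side: the binding exposes `request` and nothing else.
function createBinding(forward: (request: SandboxRequest) => string) {
  return { request: forward };
}

// Hypothetical transport that records what would go over the wire.
const sent: OutboundRequest[] = [];
const supervisor = createSupervisor("secret-token", (req) => {
  sent.push(req);
  return `200 OK for ${req.method} ${req.path}`;
});

const cloudflare = createBinding(supervisor);
const response = cloudflare.request({ method: "GET", path: "/accounts" });

console.log(response); // what sandboxed code sees
console.log(JSON.stringify(cloudflare)); // "{}": no credentials on the binding
```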
🧬 Anatomy of a Code Mode call
The LLM writes code against typed globals. This example searches Cloudflare's API docs, finds the Workers endpoint, and creates a worker — all in one script:
// Search the spec
const matches = await codemode.search("upload worker script");
const details = await codemode.describe(matches.results[0].path);
// Call the API
const result = await cloudflare.request({
  method: "PUT",
  path: "/accounts/{account_id}/workers/scripts/{script_name}",
  pathParams: { account_id: "abc123", script_name: "hello" },
  body: { /* worker code */ }
});
console.log(result);
The LLM's context only receives the final console.log output — not the raw API responses or intermediate data.
📦 Implementations
@cloudflare/codemode
Cloudflare's SDK. Uses the Dynamic Worker Loader API to run code in V8 isolates. Built into the Cloudflare Agents platform.
npm install @cloudflare/codemode
Cloudflare API MCP Server
The entire Cloudflare API (2,500+ endpoints) as an MCP server using Code Mode. Connect at https://mcp.cloudflare.com/mcp. Three tools: search, execute, docs.
codemode-mcp
A local implementation by jx-codes (87 pts on HN). Runs MCP-to-TypeScript conversion locally with Deno sandboxing. Since superseded by lootbox — a more polished take on the same idea.
Local Code Mode
A gist by chenhunghan describing how to add Code Mode to any MCP server. Uses quickjs-emscripten (JS), RestrictedPython (Python), goja (Go), or boa_engine (Rust) as sandbox runtimes. No external dependencies.
⚡ When to use Code Mode
✅ Good fits
- APIs with many endpoints (Cloudflare, GitHub, Stripe)
- Multi-step workflows (search → read → transform → write)
- LLMs that already know the API's schema from training data
- When context window budget matters
- When you need durable approval gates
⚠️ Not ideal
- Simple APIs with 2-3 tools
- When the LLM doesn't know the API schema
- When you need real-time streaming from the tool
- When running untrusted code is not an option
- Non-JS/TS environments without a sandbox
🚀 Quick start (Cloudflare Agents)
1. Install
npm install @cloudflare/codemode
2. Wrap your tools with codemode
import { codemode } from "@cloudflare/codemode";
const { system, tools } = codemode({
  system: "You are a helpful assistant",
  tools: { /* your MCP servers */ },
});
3. Use in your agent
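// streamText and openai come from the Vercel AI SDK
import { streamText } from "ai";
import { openai } from "@ai-sdk/openai";

const stream = streamText({
  model: openai("gpt-5"),
  system,
  tools, // ← just one tool: codemode({code})
  messages: [{
    role: "user",
    content: "Deploy a new Worker named hello-world"
  }]
});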
🏗️ The pieces
| Piece | What it is | State |
|---|---|---|
| Executor | Runs a block of code in an isolated sandbox. On Workers: DynamicWorkerExecutor. In browser: IframeSandboxExecutor. | Stateless |
| Connectors | Classes bridging external services (MCP, OpenAPI, custom) into the sandbox as typed globals. Own their credentials and connection lifecycle. | Own state |
| Runtime | The handle you hold: runtime.tool() for the model, pending/approve/reject/rollback for your app, and a durable log. | Durable |
| Worker Loader API | New Cloudflare API that loads Worker code on-demand, in milliseconds, with per-isolate bindings. No containers, no prewarming. | Stateless |
🔒 Security model
No network access. The sandbox can't make HTTP requests. fetch() throws. Every external call goes through a connector.
No API key exposure. Bindings provide pre-authenticated interfaces. The AI never sees raw credentials.
Fresh isolate per run. Millisecond startup, zero reuse. No state leaks between executions.
Approval gates. State-changing operations (create issue, merge PR) require human approval. The sandbox pauses mid-execution until a decision arrives.
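The "fetch() throws" behavior can be imitated by shadowing `fetch` for the executed snippet. This is an illustration only: `new Function` is not a security boundary, and the helper names are ours. The real guarantee comes from the isolate having no network access.

```typescript
// Replacement for fetch inside the stand-in sandbox: always refuses.
function blockedFetch(): never {
  throw new Error("Network access is disabled; use a connector instead");
}

// Run a snippet with `fetch` shadowed by the blocked version.
// NOT a security boundary; it only illustrates the observable behavior.
function runWithoutNetwork(code: string): unknown {
  const fn = new Function("fetch", code);
  return fn(blockedFetch);
}

try {
  runWithoutNetwork(`return fetch("https://example.com");`);
} catch (error) {
  console.log((error as Error).message);
  // prints "Network access is disabled; use a connector instead"
}
```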
💡 What this changes
Code Mode challenges a core assumption of the MCP ecosystem: that tools must be exposed directly to the LLM. It argues that the right abstraction for LLM tool use is code, not remote procedure calls. This shifts MCP server design from "make simple tools LLMs can handle" to "expose full APIs that LLMs can reason about in code."
It also changes the economics. A 2,500-endpoint API that would overflow a 200K context window as native MCP tools (122% of it even with minimal schemas) now costs ~0.5% of that window. You can connect more services, to more capable agents, without hitting token limits.