Code Mode — Cloudflare's MCP pattern for LLM tool calling

🔴 The problem with native MCP tool calling

Traditional MCP exposes every endpoint as a tool:

A GitHub MCP server exposes 50+ tools (list issues, create PR, search code…)
Add Stripe, Slack, and your internal API — now you're at 200+ tools
Every tool's schema (name, description, parameters) is injected into the LLM's context window on every request
The LLM was fine-tuned on synthetic tool-call examples — not real-world usage

The core insight:

LLMs have seen millions of lines of real-world TypeScript in their training data, but only a comparatively small set of contrived tool-call examples. Making an LLM call tools directly is like making Shakespeare write a play in a language he learned last month.

🟠 How Code Mode works

LLM → 1 tool: codemode({code}) → Sandbox (Worker isolate) → TypeScript API (typed globals) → MCP Servers

The sandbox has no network access. Every effect goes through typed connectors.

Step 1 — MCP schema → TypeScript API

Cloudflare's Agents SDK reads each MCP server's tool schemas and generates typed TypeScript interfaces. The LLM sees code, not tool definitions.
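To make the conversion concrete, here is a minimal sketch of turning one tool schema into a TypeScript declaration. It is not the Agents SDK's generator: the `McpTool` shape, the `toTsType` and `toDeclaration` helpers, and the `listIssues` tool are all hypothetical, and only a small subset of JSON Schema is handled.

```typescript
// Hypothetical, simplified subset of JSON Schema as used in a tool definition.
type JsonSchema = {
  type: "string" | "number" | "boolean" | "object" | "array";
  properties?: Record<string, JsonSchema>;
  required?: string[];
  items?: JsonSchema;
};

interface McpTool {
  name: string;
  description: string;
  inputSchema: JsonSchema;
}

// Map a schema node to a TypeScript type expression.
function toTsType(schema: JsonSchema): string {
  switch (schema.type) {
    case "string":
    case "number":
    case "boolean":
      return schema.type;
    case "array":
      return schema.items ? `${toTsType(schema.items)}[]` : "unknown[]";
    case "object": {
      const required = new Set(schema.required ?? []);
      const fields = Object.entries(schema.properties ?? {}).map(
        ([key, value]) => `${key}${required.has(key) ? "" : "?"}: ${toTsType(value)}`
      );
      return `{ ${fields.join("; ")} }`;
    }
  }
}

// Emit one declaration per tool, with the description as a doc comment.
function toDeclaration(tool: McpTool): string {
  return [
    `/** ${tool.description} */`,
    `declare function ${tool.name}(input: ${toTsType(tool.inputSchema)}): Promise<unknown>;`,
  ].join("\n");
}

// Hypothetical tool, for illustration only.
const listIssues: McpTool = {
  name: "listIssues",
  description: "List issues in a repository",
  inputSchema: {
    type: "object",
    properties: {
      repo: { type: "string" },
      labels: { type: "array", items: { type: "string" } },
    },
    required: ["repo"],
  },
};

const declaration = toDeclaration(listIssues);
console.log(declaration);
```

The output is a doc comment plus a one-line function signature, which is the kind of text a model has seen far more often than a JSON tool definition.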

Step 2 — One tool, not dozens

The agent exposes exactly one tool to the LLM: codemode({code: string}). The LLM writes TypeScript, and the runtime executes it in a sandboxed V8 isolate.

Step 3 — Sandboxed execution

Code runs in a fresh Cloudflare Worker isolate (not a container). The isolate starts in milliseconds, costs almost nothing, and is thrown away after each run. It has zero internet access: it can reach MCP servers only through typed bindings.

Step 4 — Results flow back

The sandboxed code uses console.log() to return results. Only the final output enters the LLM's context — not the intermediate API responses.
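Steps 3 and 4 can be pictured with a toy executor. This is only a sketch of the data flow, assuming plain JavaScript snippets and an in-process `new Function` call, which is not a security boundary and not how the real runtime works (that uses a separate V8 isolate). The `runSnippet` helper and the `cloudflare.listZones` binding are hypothetical.

```typescript
// Toy stand-in for the sandbox. NOT a security boundary: it only shows that
// the snippet sees injected bindings and that logged output is the result.
type Bindings = Record<string, unknown>;

function runSnippet(code: string, bindings: Bindings): string {
  const lines: string[] = [];

  // Captured console: only what the snippet logs is returned to the caller.
  const sandboxConsole = {
    log: (...args: unknown[]) => {
      lines.push(
        args.map((a) => (typeof a === "string" ? a : JSON.stringify(a))).join(" ")
      );
    },
  };

  // Shadow fetch so that direct network calls fail inside the snippet.
  const blockedFetch = () => {
    throw new Error("network access is disabled in the sandbox");
  };

  const names = ["console", "fetch", ...Object.keys(bindings)];
  const values = [sandboxConsole, blockedFetch, ...Object.values(bindings)];

  // Function parameters shadow globals of the same name inside the snippet.
  const fn = new Function(...names, code);
  fn(...values);
  return lines.join("\n");
}

// Hypothetical binding standing in for an MCP-backed connector.
const cloudflare = {
  listZones: () => ["example.com", "example.org"],
};

const output = runSnippet(
  `
  const zones = cloudflare.listZones();      // intermediate data stays here
  console.log("zones:", zones.length);       // only this line is returned
  `,
  { cloudflare }
);
console.log(output); // prints "zones: 2"
```

The zone list itself never leaves `runSnippet`; the caller receives only the logged summary, which mirrors how intermediate API responses stay out of the model's context.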

🟢 The token difference is absurd

Approach	Tools	Token cost	% of 200K window
Raw OpenAPI spec in prompt	—	~2,000,000	977%
Native MCP (full schemas)	2,594	1,170,523	585%
Native MCP (minimal)	2,594	244,047	122%
Code Mode	3	~1,100	0.5%

Source: Cloudflare MCP Server README. Code Mode covers roughly 2,500 Cloudflare API endpoints in about 1k tokens, versus 244k tokens for native MCP with minimal schemas.
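The last column of the table is simple arithmetic: token cost divided by a 200K-token window. A small sketch, using the table's own numbers (the `percentOfWindow` helper name is just for illustration):

```typescript
const CONTEXT_WINDOW = 200000; // tokens, as in the table's last column

// Share of the context window consumed by a given token count, in percent.
function percentOfWindow(tokens: number, windowSize: number = CONTEXT_WINDOW): number {
  return (tokens / windowSize) * 100;
}

const approaches = [
  { label: "Native MCP (full schemas)", tokens: 1170523 },
  { label: "Native MCP (minimal)", tokens: 244047 },
  { label: "Code Mode", tokens: 1100 },
];

for (const { label, tokens } of approaches) {
  console.log(`${label}: ${percentOfWindow(tokens).toFixed(2)}% of the window`);
}

// How many times smaller Code Mode is than the minimal native MCP schemas.
const minimalToCodeMode = 244047 / 1100;
console.log(`reduction factor: about ${Math.round(minimalToCodeMode)}x`);
```

Anything above 100% cannot fit in a single request at all, so both native MCP rows would need truncation or tool filtering before the model could be called.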

Why this matters beyond tokens

Multi-step operations in one sandbox run

With native tool calling, inspecting 100 PRs one at a time can take 100 round-trips through the LLM. With Code Mode, the LLM writes a for loop that runs in one sandbox execution.
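A sketch of what such a loop might look like. The `github` connector, its two methods, and the generated data are hypothetical stand-ins, and the calls are synchronous for brevity where a real connector would be async. The counter shows that the connector is called many times while the model is involved only once.

```typescript
interface PullRequest {
  number: number;
  title: string;
  additions: number;
}

// Hypothetical in-memory data standing in for a real repository.
const pullRequests: PullRequest[] = Array.from({ length: 100 }, (_, i) => ({
  number: i + 1,
  title: `PR ${i + 1}`,
  additions: (i * 37) % 500,
}));

let connectorCalls = 0;

// Hypothetical connector exposed to the sandbox as a typed global.
const github = {
  listPullRequestNumbers(): number[] {
    connectorCalls++;
    return pullRequests.map((pr) => pr.number);
  },
  getPullRequest(prNumber: number): PullRequest {
    connectorCalls++;
    const pr = pullRequests.find((p) => p.number === prNumber);
    if (!pr) throw new Error(`no pull request ${prNumber}`);
    return pr;
  },
};

// The kind of script a model might submit in a single codemode call.
function largePullRequests(threshold: number): number[] {
  const large: number[] = [];
  for (const prNumber of github.listPullRequestNumbers()) {
    const pr = github.getPullRequest(prNumber); // response stays in the sandbox
    if (pr.additions > threshold) large.push(pr.number);
  }
  return large;
}

const large = largePullRequests(450);
console.log(`large PRs: ${large.join(", ")}`);
console.log(`connector calls: ${connectorCalls}, model round-trips: 1`);
```

With native tool calling, each of those connector calls would be a separate tool invocation whose full response passes through the context window.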

Discovery stays in the sandbox

codemode.search("pull request") and codemode.describe() return results into running code, not the context window. The model pulls only what it needs.
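As an illustration of discovery happening inside running code, the sketch below searches a tiny endpoint index and logs a single line. The index contents and the `searchIndex` helper are hypothetical; the real `codemode.search` is not reproduced here, and its matching logic may differ.

```typescript
interface Endpoint {
  method: string;
  path: string;
  summary: string;
}

// Hypothetical slice of an API index; a real one has thousands of entries.
const apiIndex: Endpoint[] = [
  { method: "GET", path: "/repos/{owner}/{repo}/pulls", summary: "List pull requests" },
  { method: "POST", path: "/repos/{owner}/{repo}/pulls", summary: "Create a pull request" },
  { method: "GET", path: "/repos/{owner}/{repo}/issues", summary: "List issues" },
  { method: "PUT", path: "/repos/{owner}/{repo}/pulls/{number}/merge", summary: "Merge a pull request" },
];

// Naive keyword search: every query word must appear in the summary.
function searchIndex(query: string): Endpoint[] {
  const words = query.toLowerCase().split(/\s+/).filter(Boolean);
  return apiIndex.filter((endpoint) =>
    words.every((word) => endpoint.summary.toLowerCase().includes(word))
  );
}

// What a model-written script might do: search, narrow down, log one line.
const matches = searchIndex("pull request");
const mergeEndpoint = matches.find((endpoint) => endpoint.method === "PUT");
console.log(mergeEndpoint ? `${mergeEndpoint.method} ${mergeEndpoint.path}` : "no match");
```

Three endpoints match the query, but only the one line the script chooses to log would reach the model.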

Durable state & human approval

Operations like creating issues or merging PRs need an audit trail and human sign-off. The runtime handles pending/approve/reject/rollback — the model's code just pauses at an approval gate and resumes when approved.
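The approval flow can be sketched as a small queue with an audit log. This is an assumption-laden illustration, not the runtime's API: the `ApprovalQueue` class and its method names only echo the pending/approve/reject verbs above, rollback is left out, and the effect is a deferred callback rather than sandboxed code paused mid-execution.

```typescript
type Status = "pending" | "approved" | "rejected";

interface PendingAction {
  id: number;
  description: string;
  status: Status;
  run: () => string; // the side effect, deferred until approval
}

// Hypothetical approval queue with an append-only audit trail.
class ApprovalQueue {
  private actions: PendingAction[] = [];
  readonly log: string[] = [];

  // Called when sandboxed code reaches an operation that needs sign-off.
  request(description: string, run: () => string): number {
    const id = this.actions.length + 1;
    this.actions.push({ id, description, status: "pending", run });
    this.log.push(`pending #${id}: ${description}`);
    return id;
  }

  pending(): PendingAction[] {
    return this.actions.filter((action) => action.status === "pending");
  }

  approve(id: number): string {
    const action = this.findPending(id);
    action.status = "approved";
    const result = action.run(); // the effect happens only now
    this.log.push(`approved #${id}: ${result}`);
    return result;
  }

  reject(id: number): void {
    const action = this.findPending(id);
    action.status = "rejected";
    this.log.push(`rejected #${id}`);
  }

  private findPending(id: number): PendingAction {
    const action = this.actions.find((a) => a.id === id && a.status === "pending");
    if (!action) throw new Error(`no pending action #${id}`);
    return action;
  }
}

const queue = new ApprovalQueue();
const mergeId = queue.request("merge PR 42", () => "merged PR 42");
const closeId = queue.request("close issue 7", () => "closed issue 7");
queue.approve(mergeId);
queue.reject(closeId);
console.log(queue.log.join("\n"));
```

Because the side effect lives in a callback that runs only inside `approve`, a rejected action never executes, and every decision leaves a log entry.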

No API keys leak

Bindings provide pre-authorized client interfaces. The sandbox never sees raw API keys. All calls go through the agent supervisor which attaches credentials server-side.
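A minimal sketch of that idea, assuming a closure in place of the real isolate boundary. The `createBinding` helper, the `Transport` type, and the token value are hypothetical; the point is only that the credential is attached on the supervisor side and is not a readable property of the object the sandbox receives.

```typescript
interface OutgoingRequest {
  method: string;
  path: string;
  headers: Record<string, string>;
}

// Whatever actually sends the request upstream (injected, so it can be faked).
type Transport = (request: OutgoingRequest) => string;

// Runs on the supervisor side. The token lives in the closure and is added to
// each outgoing request; the returned binding exposes only `request`.
function createBinding(token: string, transport: Transport) {
  return {
    request(method: string, path: string): string {
      return transport({
        method,
        path,
        headers: { Authorization: `Bearer ${token}` },
      });
    },
  };
}

// Fake transport that records what the supervisor would send upstream.
const sent: OutgoingRequest[] = [];
const binding = createBinding("secret-token-123", (request) => {
  sent.push(request);
  return `200 ${request.method} ${request.path}`;
});

// Sandbox-side view: the code can call the API but has no field to read.
const apiResponse = binding.request("GET", "/zones");
console.log(apiResponse); // prints "200 GET /zones"
console.log(JSON.stringify(binding).includes("secret")); // prints "false"
```

In the real system the separation is enforced by the isolate boundary rather than by a JavaScript closure, so this shows the shape of the design, not its strength.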

🧬 Anatomy of a Code Mode call

The LLM writes code against typed globals. This example searches Cloudflare's API docs, finds the Workers endpoint, and creates a worker — all in one script:

// Search the spec
const matches = await codemode.search("upload worker script");
const details = await codemode.describe(matches.results[0].path);

// Call the API
const result = await cloudflare.request({
  method: "PUT",
  path: "/accounts/{account_id}/workers/scripts/{script_name}",
  pathParams: { account_id: "abc123", script_name: "hello" },
  body: { /* worker code */ }
});

console.log(result);

The LLM's context only receives the final console.log output — not the raw API responses or intermediate data.

📦 Implementations

Official

@cloudflare/codemode

Cloudflare's SDK. Uses the Dynamic Worker Loader API to run code in V8 isolates. Built into the Cloudflare Agents platform.

npm install @cloudflare/codemode

Official MCP Server

Cloudflare API MCP Server

The entire Cloudflare API (2,500+ endpoints) as an MCP server using Code Mode. Connect at https://mcp.cloudflare.com/mcp. Three tools: search, execute, docs.

Community

codemode-mcp

A local implementation by jx-codes (87 pts on HN). Runs MCP-to-TypeScript conversion locally with Deno sandboxing. Since superseded by lootbox — a more polished take on the same idea.

Pattern

Local Code Mode

A gist by chenhunghan describing how to add Code Mode to any MCP server. Uses quickjs-emscripten (JS), RestrictedPython (Python), goja (Go), or boa_engine (Rust) as sandbox runtimes. No external dependencies.

⚡ When to use Code Mode

✅ Good fits

APIs with many endpoints (Cloudflare, GitHub, Stripe)
Multi-step workflows (search → read → transform → write)
LLMs that already know the API's schema from training data
When context window budget matters
When you need durable approval gates

⚠️ Not ideal

Simple APIs with 2-3 tools
When the LLM doesn't know the API schema
When you need real-time streaming from the tool
When running untrusted code is not an option
Non-JS/TS environments without a sandbox

🚀 Quick start (Cloudflare Agents)

1. Install

npm install @cloudflare/codemode

2. Wrap your tools with codemode

import { codemode } from "@cloudflare/codemode";

const { system, tools } = codemode({
  system: "You are a helpful assistant",
  tools: { /* your MCP servers */ },
});

3. Use in your agent

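// Assumes the Vercel AI SDK ("ai") and its OpenAI provider ("@ai-sdk/openai").
import { streamText } from "ai";
import { openai } from "@ai-sdk/openai";

const stream = streamText({
  model: openai("gpt-5"),
  system, // system prompt returned by codemode() in step 2
  tools,  // ← just one tool: codemode({code})
  messages: [{
    role: "user",
    content: "Deploy a new Worker named hello-world"
  }]
});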

🏗️ The pieces

Piece	What it is	State
Executor	Runs a block of code in an isolated sandbox. On Workers: `DynamicWorkerExecutor`. In browser: `IframeSandboxExecutor`.	Stateless
Connectors	Classes bridging external services (MCP, OpenAPI, custom) into the sandbox as typed globals. Own their credentials and connection lifecycle.	Own state
Runtime	The handle you hold: `runtime.tool()` for the model, `pending/approve/reject/rollback` for your app, and a durable log.	Durable
Worker Loader API	New Cloudflare API that loads Worker code on-demand, in milliseconds, with per-isolate bindings. No containers, no prewarming.	Stateless

🔒 Security model

✓ No network access. The sandbox can't make HTTP requests. fetch() throws. Every external call goes through a connector.

✓ No API key exposure. Bindings provide pre-authenticated interfaces. The AI never sees raw credentials.

✓ Fresh isolate per run. Millisecond startup, zero reuse. No state leaks between executions.

✓ Approval gates. Destructive operations (create issue, merge PR) require human approval. The sandbox pauses mid-execution.

💡 What this changes

Code Mode challenges a core assumption of the MCP ecosystem: that tools must be exposed directly to the LLM. It argues that the right abstraction for LLM tool use is code, not remote procedure calls. This shifts MCP server design from "make simple tools LLMs can handle" to "expose full APIs that LLMs can reason about in code."

It also changes the economics. A 2,500-endpoint API that would overflow a 200K context window as native MCP tools (122% even with minimal schemas) now costs about 0.5% of that window. You can connect more services, to more capable agents, without hitting token limits.