Cloudflare · MCP · TypeScript · 65-99% token savings

Code Mode

Instead of giving LLMs dozens of MCP tools they fumble with, give them one tool, execute_code, and let them write TypeScript. The results are dramatic: fewer tokens, fewer round-trips, fewer mistakes.

Originated at Cloudflare by Kenton Varda & Sunil Pai · Blog post (Sep 2025) · Docs

🔴 The problem with native MCP tool calling

Traditional MCP exposes every endpoint as a tool:

  • A GitHub MCP server exposes 50+ tools (list issues, create PR, search code…)
  • Add Stripe, Slack, and your internal API — now you're at 200+ tools
  • Every tool's schema (name, description, parameters) is injected into the LLM's context window on every request
  • The LLM was fine-tuned on synthetic tool-call examples — not real-world usage

The core insight:

LLMs have seen millions of lines of real-world TypeScript in their training data. They've seen a few hundred contrived tool-call examples. Making an LLM call tools directly is like making Shakespeare write a play in a language he learned last month.

🟠 How Code Mode works

LLM → 1 tool: codemode({code}) → Sandbox (Worker isolate) → TypeScript API (typed globals) → MCP Servers

The sandbox has no network access. Every effect goes through typed connectors.

Step 1 — MCP schema → TypeScript API

Cloudflare's Agents SDK reads each MCP server's tool schemas and generates typed TypeScript interfaces. The LLM sees code, not tool definitions.
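To make Step 1 concrete, here is a minimal sketch of turning a tool schema into a TypeScript declaration. It is not the Agents SDK's actual generator: the `ToolSchema` shape and the `renderTool` / `renderServer` helpers are assumptions made for illustration, and only flat schemas with primitive fields are handled.

```typescript
// ASSUMPTION: a simplified tool definition (flat JSON Schema, primitives only).
type JsonType = "string" | "number" | "integer" | "boolean";

interface ToolSchema {
  name: string;
  description: string;
  inputSchema: {
    type: "object";
    properties: Record<string, { type: JsonType }>;
    required?: string[];
  };
}

// Map a JSON Schema primitive onto the TypeScript type the LLM will read.
function tsType(t: JsonType): string {
  return t === "integer" ? "number" : t;
}

// Render one tool as a documented method signature.
function renderTool(tool: ToolSchema): string {
  const required = new Set(tool.inputSchema.required ?? []);
  const fields = Object.entries(tool.inputSchema.properties)
    .map(([key, prop]) => `${key}${required.has(key) ? "" : "?"}: ${tsType(prop.type)}`)
    .join("; ");
  return `  /** ${tool.description} */\n  ${tool.name}(input: { ${fields} }): Promise<unknown>;`;
}

// Render a whole server as one declared global.
function renderServer(globalName: string, tools: ToolSchema[]): string {
  return `declare const ${globalName}: {\n${tools.map(renderTool).join("\n")}\n};`;
}

// Hypothetical tool, used only to exercise the sketch.
const listIssues: ToolSchema = {
  name: "listIssues",
  description: "List issues in a repository.",
  inputSchema: {
    type: "object",
    properties: { repo: { type: "string" }, limit: { type: "integer" } },
    required: ["repo"],
  },
};

const generated = renderServer("github", [listIssues]);
console.log(generated);
```

The printed declaration is what the model would read in place of a JSON tool definition: a `github` global with a `listIssues` method whose `limit` field is optional.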

Step 2 — One tool, not dozens

The agent exposes exactly one tool to the LLM: codemode({code: string}). The LLM writes TypeScript, and the runtime executes it in a sandboxed V8 isolate.

Step 3 — Sandboxed execution

Code runs in a fresh Cloudflare Worker isolate (not a container). The isolate starts in milliseconds, costs almost nothing, and is thrown away after each run. It has zero internet access and can only reach MCP servers through typed bindings.

Step 4 — Results flow back

The sandboxed code uses console.log() to return results. Only the final output enters the LLM's context — not the intermediate API responses.
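Steps 2 through 4 can be sketched as one function: accept a code string, run it with only the bindings it was given, and return what it logged. This is a local stand-in under loud assumptions, not the real executor. `runInFakeSandbox`, `demoBindings`, and the `weather` binding are hypothetical, the `AsyncFunction` constructor runs code in the same realm (so it is not a security boundary the way a Worker isolate is), and it takes plain JavaScript because no TypeScript transpile step is included.

```typescript
// Bindings are the only capabilities handed to the generated code.
type Bindings = Record<string, unknown>;

// The constructor behind `async function`, used to compile a code string.
const AsyncFunction = Object.getPrototypeOf(async function () {}).constructor as new (
  ...args: string[]
) => (...args: unknown[]) => Promise<unknown>;

// ASSUMPTION: stand-in for the isolate. NOT a real sandbox.
async function runInFakeSandbox(code: string, bindings: Bindings): Promise<string> {
  const lines: string[] = [];

  // Captures console.log output; this is all that goes back to the model.
  const fakeConsole = {
    log: (...parts: unknown[]) => {
      lines.push(parts.map((p) => (typeof p === "string" ? p : JSON.stringify(p))).join(" "));
    },
  };

  // Shadows the global fetch so direct network calls fail.
  const blockedFetch = () => {
    throw new Error("network access is disabled in the sandbox");
  };

  const names = ["console", "fetch", ...Object.keys(bindings)];
  const values = [fakeConsole, blockedFetch, ...Object.values(bindings)];
  const fn = new AsyncFunction(...names, code);
  await fn(...values);
  return lines.join("\n");
}

// Hypothetical binding whose raw response is large.
const demoBindings: Bindings = {
  weather: {
    get: async (city: string) => ({ city, tempC: 21, raw: "x".repeat(10000) }),
  },
};

// What the model might submit through the single tool.
const modelCode = `
  const report = await weather.get("Lisbon");
  console.log(report.city + ": " + report.tempC + "C");
`;

runInFakeSandbox(modelCode, demoBindings).then((output) => console.log(output));
```

The 10,000-character `raw` field never leaves the function; only the one logged line would be returned to the model.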

🟢 The token difference is absurd

| Approach | Tools | Token cost | % of 200K window |
| --- | --- | --- | --- |
| Raw OpenAPI spec in prompt | - | ~2,000,000 | 977% |
| Native MCP (full schemas) | 2,594 | 1,170,523 | 585% |
| Native MCP (minimal) | 2,594 | 244,047 | 122% |
| Code Mode | 3 | ~1,100 | 0.5% |

Source: Cloudflare MCP Server README. 2,500 Cloudflare API endpoints in ~1k tokens vs 244k for native MCP.
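The percentages for the three MCP rows can be checked with a few lines of arithmetic. The token counts are the ones quoted above; the window size of exactly 200,000 tokens and the helper name `percentOfWindow` are assumptions of this sketch, and the approximate raw-spec row is left out.

```typescript
// ASSUMPTION: "200K" is taken as exactly 200,000 tokens.
const contextWindow = 200000;

// Share of the context window consumed by a given token count, in percent.
function percentOfWindow(tokens: number, windowSize: number = contextWindow): number {
  return (tokens / windowSize) * 100;
}

const nativeFull = percentOfWindow(1170523);
const nativeMinimal = percentOfWindow(244047);
const codeMode = percentOfWindow(1100);

// Relative saving of Code Mode against the cheapest native option.
const savingsVsMinimal = (1 - 1100 / 244047) * 100;

console.log(Math.round(nativeFull));      // 585
console.log(Math.round(nativeMinimal));   // 122
console.log(codeMode.toFixed(2));         // 0.55
console.log(savingsVsMinimal.toFixed(1)); // 99.5
```

Anything above 100% cannot fit in the window at all, so both native configurations are ruled out before the model has read a single user message.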

Why this matters beyond tokens

Multi-step operations in one sandbox run

With native tool calling, filtering 100 PRs takes 100 round-trips through the LLM. With Code Mode, the LLM writes a for loop that runs in one sandbox execution.
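As an illustration, the loop the model might write looks roughly like this. The `createFakeGithub` binding, its method names, and the `additions` threshold are all hypothetical; a real binding would forward each call to an MCP server.

```typescript
interface PullRequest {
  number: number;
  title: string;
  additions: number;
}

// ASSUMPTION: an in-memory fake of a typed `github` global that counts its calls.
function createFakeGithub() {
  let calls = 0;
  return {
    callCount: () => calls,
    async listPullRequests(_repo: string): Promise<number[]> {
      calls += 1;
      return Array.from({ length: 100 }, (_, i) => i + 1);
    },
    async getPullRequest(_repo: string, prNumber: number): Promise<PullRequest> {
      calls += 1;
      return { number: prNumber, title: `PR #${prNumber}`, additions: prNumber * 10 };
    },
  };
}

type FakeGithub = ReturnType<typeof createFakeGithub>;

// The kind of script the model submits: one loop, one sandbox run.
async function findLargePullRequests(
  github: FakeGithub,
  repo: string,
  minAdditions: number,
): Promise<number[]> {
  const large: number[] = [];
  for (const prNumber of await github.listPullRequests(repo)) {
    const pr = await github.getPullRequest(repo, prNumber);
    if (pr.additions >= minAdditions) large.push(pr.number);
  }
  return large;
}

const fakeGithub = createFakeGithub();
findLargePullRequests(fakeGithub, "acme/widgets", 950).then((large) => {
  console.log(JSON.stringify(large));  // [95,96,97,98,99,100]
  console.log(fakeGithub.callCount()); // 101
});
```

All 101 binding calls happen inside the script, so the model pays for one tool call and reads six numbers instead of 100 pull request payloads.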

Discovery stays in the sandbox

codemode.search("pull request") and codemode.describe() return results into running code, not the context window. The model pulls only what it needs.
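A rough sketch of in-sandbox discovery, assuming a small in-memory catalog. The `catalog` entries, the `discovery` object, and the keyword-matching rule are invented for illustration; the real `codemode.search` and `codemode.describe` may rank and return results differently.

```typescript
interface Endpoint {
  path: string;
  method: string;
  summary: string;
  params: string[];
}

// ASSUMPTION: a tiny hypothetical catalog standing in for a full API spec.
const catalog: Endpoint[] = [
  { path: "/repos/{repo}/pulls", method: "GET", summary: "List pull requests", params: ["repo", "state"] },
  { path: "/repos/{repo}/pulls/{number}/merge", method: "PUT", summary: "Merge a pull request", params: ["repo", "number"] },
  { path: "/repos/{repo}/issues", method: "POST", summary: "Create an issue", params: ["repo", "title"] },
];

const discovery = {
  // Returns short summaries only; every query term must appear in the summary.
  async search(query: string): Promise<{ results: { path: string; summary: string }[] }> {
    const terms = query.toLowerCase().split(/\s+/).filter(Boolean);
    const results = catalog
      .filter((e) => terms.every((t) => e.summary.toLowerCase().includes(t)))
      .map(({ path, summary }) => ({ path, summary }));
    return { results };
  },
  // Returns the full entry for one path, or undefined if it is unknown.
  async describe(path: string): Promise<Endpoint | undefined> {
    return catalog.find((e) => e.path === path);
  },
};

// Model-written code: narrow down inside the sandbox, log one line.
async function pickMergeEndpoint(): Promise<string> {
  const matches = await discovery.search("merge pull request");
  const first = matches.results[0];
  if (!first) return "no match";
  const details = await discovery.describe(first.path);
  return details ? `${details.method} ${details.path}` : "no match";
}

pickMergeEndpoint().then((line) => console.log(line)); // PUT /repos/{repo}/pulls/{number}/merge
```

The search results and the full endpoint description are consumed by the running script; only the single chosen line would be logged back to the model.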

Durable state & human approval

Operations like creating issues or merging PRs need an audit trail and human sign-off. The runtime handles pending/approve/reject/rollback — the model's code just pauses at an approval gate and resumes when approved.
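The pause-and-resume behaviour can be sketched with a promise that stays unresolved until someone decides. This is an in-memory illustration only: `ApprovalGate`, `mergePullRequest`, and the log format are hypothetical names, nothing here is durable across restarts, and rollback is not modelled.

```typescript
type Decision = "approved" | "rejected";

interface PendingAction {
  id: number;
  description: string;
  resolve: (decision: Decision) => void;
}

// ASSUMPTION: in-memory stand-in for the runtime's pending/approve/reject handling.
class ApprovalGate {
  private nextId = 1;
  private pending = new Map<number, PendingAction>();
  readonly log: string[] = [];

  // Called from inside a binding; the returned promise is the pause.
  request(description: string): Promise<Decision> {
    return new Promise((resolve) => {
      const id = this.nextId++;
      this.pending.set(id, { id, description, resolve });
      this.log.push(`pending #${id}: ${description}`);
    });
  }

  // What the host application would show to a human reviewer.
  listPending(): { id: number; description: string }[] {
    return [...this.pending.values()].map(({ id, description }) => ({ id, description }));
  }

  // Called by the host application once the human has decided.
  decide(id: number, decision: Decision): void {
    const action = this.pending.get(id);
    if (!action) throw new Error(`no pending action #${id}`);
    this.pending.delete(id);
    this.log.push(`${decision} #${id}`);
    action.resolve(decision);
  }
}

const gate = new ApprovalGate();

// Binding seen by model code: it simply awaits, unaware of the human in the loop.
async function mergePullRequest(prNumber: number): Promise<string> {
  const decision = await gate.request(`merge PR #${prNumber}`);
  if (decision === "rejected") throw new Error(`merge of PR #${prNumber} was rejected`);
  return `merged PR #${prNumber}`;
}

const run = mergePullRequest(42); // starts, then pauses at the gate
const [firstPending] = gate.listPending();
if (firstPending) gate.decide(firstPending.id, "approved");
run.then((result) => console.log(result)); // merged PR #42
```

The model's script contains no approval logic at all; the audit trail (`gate.log` here) and the decision live entirely on the host side.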

No API keys leak

Bindings provide pre-authorized client interfaces. The sandbox never sees raw API keys. All calls go through the agent supervisor which attaches credentials server-side.
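A minimal sketch of that split, assuming a supervisor that holds the token in a closure and a binding that exposes only a `request` function. `createSupervisor`, `makeBinding`, and the token value are hypothetical, and no real HTTP request is made.

```typescript
interface OutboundRequest {
  method: string;
  path: string;
  headers: Record<string, string>;
}

// ASSUMPTION: supervisor-side code; the token lives only in this closure.
function createSupervisor(apiToken: string) {
  const sent: OutboundRequest[] = [];
  return {
    sent,
    // Attaches credentials outside the sandbox, then (here) just records the call.
    forward(method: string, path: string): { status: number } {
      sent.push({ method, path, headers: { Authorization: `Bearer ${apiToken}` } });
      return { status: 200 };
    },
  };
}

// The sandbox receives only this narrow, frozen object.
function makeBinding(supervisor: ReturnType<typeof createSupervisor>) {
  return Object.freeze({
    request: (method: string, path: string) => supervisor.forward(method, path),
  });
}

const supervisor = createSupervisor("secret-token-123");
const cloudflare = makeBinding(supervisor);

const response = cloudflare.request("GET", "/accounts");
console.log(response.status);                                         // 200
console.log(Object.keys(cloudflare));                                 // [ 'request' ]
console.log(JSON.stringify(cloudflare).includes("secret-token-123")); // false
```

Model-written code can call `cloudflare.request(...)` as often as it likes, but it has no property or return value from which to read the token.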

🧬 Anatomy of a Code Mode call

The LLM writes code against typed globals. This example searches Cloudflare's API docs, finds the Workers endpoint, and creates a worker — all in one script:

// Search the spec
const matches = await codemode.search("upload worker script");
const details = await codemode.describe(matches.results[0].path);

// Call the API
const result = await cloudflare.request({
  method: "PUT",
  path: "/accounts/{account_id}/workers/scripts/{script_name}",
  pathParams: { account_id: "abc123", script_name: "hello" },
  body: { /* worker code */ }
});

console.log(result);

The LLM's context only receives the final console.log output — not the raw API responses or intermediate data.

📦 Implementations

Official

@cloudflare/codemode

Cloudflare's SDK. Uses the Dynamic Worker Loader API to run code in V8 isolates. Built into the Cloudflare Agents platform.

npm install @cloudflare/codemode

Official MCP Server

Cloudflare API MCP Server

The entire Cloudflare API (2,500+ endpoints) as an MCP server using Code Mode. Connect at https://mcp.cloudflare.com/mcp. Three tools: search, execute, docs.

Community

codemode-mcp

A local implementation by jx-codes (87 pts on HN). Runs MCP-to-TypeScript conversion locally with Deno sandboxing. Since superseded by lootbox — a more polished take on the same idea.

Pattern

Local Code Mode

A gist by chenhunghan describing how to add Code Mode to any MCP server. Uses quickjs-emscripten (JS), RestrictedPython (Python), goja (Go), or boa_engine (Rust) as sandbox runtimes. No external dependencies.

⚡ When to use Code Mode

✅ Good fits

  • APIs with many endpoints (Cloudflare, GitHub, Stripe)
  • Multi-step workflows (search → read → transform → write)
  • LLMs that already know the API's schema from training data
  • When context window budget matters
  • When you need durable approval gates

⚠️ Not ideal

  • Simple APIs with 2-3 tools
  • When the LLM doesn't know the API schema
  • When you need real-time streaming from the tool
  • When running untrusted code is not an option
  • Non-JS/TS environments without a sandbox

🚀 Quick start (Cloudflare Agents)

1. Install

npm install @cloudflare/codemode

2. Wrap your tools with codemode

import { codemode } from "@cloudflare/codemode";

const { system, tools } = codemode({
  system: "You are a helpful assistant",
  tools: { /* your MCP servers */ },
});

3. Use in your agent

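// streamText and the openai provider come from the Vercel AI SDK
import { streamText } from "ai";
import { openai } from "@ai-sdk/openai";

const stream = streamText({
  model: openai("gpt-5"),
  system,
  tools,  // ← just one tool: codemode({code})
  messages: [{
    role: "user",
    content: "Deploy a new Worker named hello-world"
  }]
});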

🏗️ The pieces

| Piece | What it is |
| --- | --- |
| Executor | Runs a block of code in an isolated sandbox. On Workers: DynamicWorkerExecutor. In browser: IframeSandboxExecutor. |
| Connectors | Classes bridging external services (MCP, OpenAPI, custom) into the sandbox as typed globals. Own their credentials and connection lifecycle. |
| Runtime | The handle you hold: runtime.tool() for the model, pending/approve/reject/rollback for your app, and a durable log. |
| Worker Loader API | New Cloudflare API that loads Worker code on-demand, in milliseconds, with per-isolate bindings. No containers, no prewarming. |

🔒 Security model

No network access. The sandbox can't make HTTP requests. fetch() throws. Every external call goes through a connector.

No API key exposure. Bindings provide pre-authenticated interfaces. The AI never sees raw credentials.

Fresh isolate per run. Millisecond startup, zero reuse. No state leaks between executions.

Approval gates. State-changing operations (create issue, merge PR) require human approval. The sandbox pauses mid-execution until a decision is made.

💡 What this changes

Code Mode challenges a core assumption of the MCP ecosystem: that tools must be exposed directly to the LLM. It argues that the right abstraction for LLM tool use is code, not remote procedure calls. This shifts MCP server design from "make simple tools LLMs can handle" to "expose full APIs that LLMs can reason about in code."

It also changes the economics. A 2,500-endpoint API that would overflow a 200K context window as native MCP tools (122% of it even with minimal schemas) now costs ~0.5% of that window. You can connect more services, to more capable agents, without hitting token limits.