Akshay analyzes Claude Code’s 6-layer architecture: the model is just one node in the loop

ChainNewsAbmedia

2026-05-11 14:25:46

AI engineer Akshay Pachaar shared a complete Claude Code architecture diagram on X on May 10, breaking the entire system into 6 layers and emphasizing that “the model is only one node in the loop.” Pachaar’s post cites his long-form essay from April 6, 《The Anatomy of an Agent Harness》. The key point is that Claude Code feels “magical” not because of the model itself, but because of the careful design of the harness engineering.

6-layer architecture: the model is just one node

Pachaar’s organized Claude Code 6 layers:

Input Layer (input layer): responsible for session management, permission control, and using YAML to configure trust levels. Any instruction entering the model first goes through this layer.

Knowledge Layer (knowledge layer): includes a skill registry, context compressor (3-layer compression, 92% threshold trigger), task graph, and cross-session memory storage. This is where the harness “intelligence” lives, independent of model weights.

Execution Layer (execution layer): dispatches tool calls through a typed registry, with each tool having a handler—bash, read, write, grep, glob, revert. The streaming runtime supports parallel execution, and prompt cache reuse keeps stable prefixes, reducing costs to 10%.

Integration Layer (integration layer): the MCP runtime connects to external servers (filesystem, git, and custom tools). Tools register inward, while memory writes outward to agent_memory.md.

Multi-Agent Layer (multi-agent layer): includes subagent spawner, teammate mailboxes communicating via redis pub/sub, a finite-state machine protocol (IDLE→REQUEST→WAIT→RESPOND), an autonomous board with an atomic lock, and worktree isolation (each task runs on its own independent git branch).

Observability Layer (observability layer): wraps an event message bus and lifecycle hooks across all layers, with a background executor running non-blockingly as a daemon thread.

At the center is the “master agent loop”: perceive → act → observe. Anthropic itself labels this loop as a “dumb loop”—all intelligence is in model inference, while the harness only handles orchestration.

Key designs: context compressor and worktree isolation

A few design details worth paying attention to:

Context compressor 3-layer compression, 92% threshold: when the context approaches 92% capacity, it triggers summarization and compression, keeps architecture decisions and unresolved bugs, and discards duplicate tool outputs. This echoes Anthropic’s published “context engineering guidance”: find the smallest high-signal token set and maximize the probability of achieving the goal.

Worktree isolation: each subagent works on an independent git worktree and independent branch, and performs conflict detection when merging. This design makes it possible for multiple agents to modify the same codebase in parallel without stepping on each other. Among Claude Code’s three sub-agent execution modes—“Fork / Teammate / Worktree”—Worktree is the strongest isolation level.

Prompt cache, 10% cost: by caching stable prefixes (system prompt, tool definitions, CLAUDE.md), repeated calls with the same prefix only pay 10% of the standard token cost. This is the key to keeping long-session tasks cost-controlled.

Why this breakdown resonated in the community

Pachaar’s post received 522 likes and 115 reposts. In the comments, feedback appeared like “I thought it was just a CLI tool,” and “thought Claude Code equals model + terminal access, didn’t know the multi-agent layer had so much running.” This reflects that many developers still understand Claude Code as “a Claude API wrapped in a CLI,” underestimating the complexity of harness engineering.

Pachaar cites a line from LangChain’s Vivek Trivedy as the core argument: “If you’re not the model, you’re the harness.” LangChain’s tests on TerminalBench 2.0 prove it—same model weights, only changing the surrounding harness, and the ranking jumped from outside the top 30 to 5th.

For abmedia readers, this breakdown offers a concrete reference point: when you see differences among agent products like Claude Code, Codex, and Gemini Code Assist, most of those differences are not in the model itself, but in the harness design—context management strategies, tool scope, verification loops, and multi-agent collaboration patterns. When a model version upgrades, the choices in harness engineering determine how good the product experience becomes.

This article, “Akshay’s breakdown of Claude Code’s 6-layer architecture: the model is just one node in the loop,” first appeared on ABMedia.

Disclaimer: The information on this page may come from third parties and does not represent the views or opinions of Gate. The content displayed on this page is for reference only and does not constitute any financial, investment, or legal advice. Gate does not guarantee the accuracy or completeness of the information and shall not be liable for any losses arising from the use of this information. Virtual asset investments carry high risks and are subject to significant price volatility. You may lose all of your invested principal. Please fully understand the relevant risks and make prudent decisions based on your own financial situation and risk tolerance. For details, please refer to Disclaimer.