Inside the Claude Agent SDK: From stdin/stdout Communication to Production on AWS AgentCore
Understanding subprocess orchestration, MCP tool integration, permission callbacks, and deployment strategies for AI agents on AWS
When you want to build an application with Claude that does more than just chat (one that can read files, query databases, or access your company's internal systems), you face a problem. The Anthropic API gives you access to Claude's intelligence, but it doesn't handle the messy details of letting Claude actually do things in your systems.
This is where the Claude Agent SDK comes in. But to understand what makes it distinctive, we need to look at how it actually works under the hood.
What Problem Does the SDK Solve?
Before diving into implementation, let’s understand the infrastructure challenge. When you ask Claude to “analyze my overdue invoices,” several things must happen:
1. Understand that "overdue invoices" means unpaid bills past their due date
2. Figure out where your invoice data lives (a spreadsheet, a database, an API)
3. Request access to that data
4. Read the data
5. Process it to find which invoices are overdue
6. Calculate any relevant metrics
7. Explain the findings to you in plain language
Each step requires infrastructure. Someone has to write code that connects to your spreadsheet, validates that Claude should be allowed to read it, handles errors if the spreadsheet is unavailable, tracks how much the API call costs, and logs everything for security purposes.
You could build all of this yourself using the Anthropic API directly. Or you could use the Claude Agent SDK, which provides this infrastructure ready-made.
The SDK Architecture: Python + Subprocess
The Agent SDK is Claude Code as a library. The official documentation states: “Claude Agent SDK is Claude Code, but as a library that you can embed into your applications.” This means when you install the SDK, you get both Python code and a bundled Claude Code CLI executable.
When you call the SDK’s query() function, here’s what actually happens:
The SDK spawns the Claude Code CLI as a subprocess. Your Python program and the CLI run as separate processes. A process is like an independent workspace where a program does its job, with its own memory and execution thread. When the SDK spawns the CLI, it creates this separate process running alongside your Python program.
The two processes communicate through stdin (standard input) and stdout (standard output). Think of stdin as a program’s mailbox for incoming messages, and stdout as its outgoing mailbox. The SDK writes JSON messages to the CLI’s stdin, and reads JSON responses from the CLI’s stdout. This is the same mechanism you use when you pipe commands in your terminal: echo "hello" | grep h sends “hello” through stdout of one program into stdin of another.
This architecture creates a clear separation of responsibilities. The Python SDK handles control logic, permission callbacks, and hooks. The CLI subprocess handles tool execution, API calls to Claude, and external MCP server management (inferred from the control protocol messages the SDK receives). Your Python code never directly calls the Anthropic API; it delegates that to the CLI.
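The pipe mechanism described above is plain Unix plumbing, so it can be sketched without the SDK at all. In the self-contained example below, the child process is a stand-in for the CLI; the message shapes mimic the SDK's JSON-lines format, but nothing here uses the real SDK:

```python
import json
import subprocess
import sys

# A stand-in "CLI": reads one JSON line from stdin, writes a JSON reply to stdout.
child_code = """
import json, sys
msg = json.loads(sys.stdin.readline())
reply = {"type": "assistant", "echo": msg["message"]["content"]}
print(json.dumps(reply))
"""

proc = subprocess.Popen(
    [sys.executable, "-c", child_code],
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
    text=True,
)

# Write a JSON-lines message to the child's stdin, just as the SDK does.
request = {"type": "user", "message": {"role": "user", "content": "hello"}}
proc.stdin.write(json.dumps(request) + "\n")
proc.stdin.flush()

# Read the child's JSON-lines response from its stdout.
response = json.loads(proc.stdout.readline())
proc.stdin.close()
proc.wait()
print(response["echo"])  # -> hello
```

Two processes, two pipes, newline-delimited JSON: that is the entire transport layer in miniature.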
The Control Protocol: How SDK and CLI Communicate
The following sections describe how the SDK behaves based on inspecting its source code (version 0.1.19). These implementation details live in internal modules and may evolve across versions. The SDK uses internal transport abstractions to stream JSON messages over stdio and multiplex control requests via request IDs.
The SDK and CLI currently communicate using a JSON-lines stream. Each line contains a complete JSON object representing a message. You can observe two message categories: regular messages (agent responses, tool outputs, cost tracking) and control messages (permission requests, hook callbacks).
When you call query("What files are in /home?"), the SDK invokes the bundled CLI with flags enabling streaming structured output. For one-shot queries (string mode), an example invocation from the SDK’s implementation:
claude --output-format stream-json --verbose --print -- "What files are in /home?"
The --output-format stream-json flag tells the CLI to emit newline-delimited JSON on stdout. The --verbose flag includes detailed output. The --print flag runs the CLI non-interactively: it prints the conversation output and exits. The -- separator marks the end of flags; everything after it is the initial prompt.
For interactive sessions (streaming mode), the SDK uses --input-format stream-json instead of --print, keeping stdin open for multiple messages.
The SDK then writes this to the CLI’s stdin:
{"type": "user", "message": {"role": "user", "content": "What files are in /home?"}}
The CLI processes the request, and if Claude decides it needs to use the Bash tool to list files, something interesting happens. Instead of just executing the tool, the CLI sends a control request back to the SDK. The following shows the observed wire format (field names and structure may vary):
{
  "type": "control_request",
  "request_id": "req_1_abc123",
  "request": {
    "subtype": "can_use_tool",
    "tool_name": "Bash",
    "input": {"command": "ls /home"}
  }
}
The SDK receives this control request and invokes your Python callback function. Your callback decides whether to allow, deny, or modify the tool call. The SDK then sends a control response:
{
  "type": "control_response",
  "request_id": "req_1_abc123",
  "response": {
    "subtype": "success",
    "response": {"behavior": "allow"}
  }
}
The request_id field enables multiplexing. Multiple control requests can be in flight simultaneously, and the SDK matches responses to requests using these IDs.
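The multiplexing pattern can be sketched in a few lines. This is an illustrative model of how pending requests are matched to responses by ID, not the SDK's actual internal code:

```python
import asyncio
import itertools

class ControlChannel:
    """Sketch of request-ID multiplexing: each outgoing request gets a
    future, and responses are matched back to their request by ID."""

    def __init__(self):
        self._counter = itertools.count(1)
        self._pending = {}  # request_id -> Future awaiting the response

    def next_request(self, payload):
        request_id = f"req_{next(self._counter)}"
        future = asyncio.get_running_loop().create_future()
        self._pending[request_id] = future
        message = {"type": "control_request", "request_id": request_id, "request": payload}
        return message, future

    def on_response(self, message):
        # Match an incoming control_response to whichever request is waiting on it.
        future = self._pending.pop(message["request_id"])
        future.set_result(message["response"])

async def main():
    chan = ControlChannel()
    req_a, fut_a = chan.next_request({"subtype": "can_use_tool", "tool_name": "Bash"})
    req_b, fut_b = chan.next_request({"subtype": "can_use_tool", "tool_name": "Read"})
    # Responses arrive out of order; the IDs keep the pairing correct.
    chan.on_response({"type": "control_response",
                      "request_id": req_b["request_id"],
                      "response": {"subtype": "success"}})
    chan.on_response({"type": "control_response",
                      "request_id": req_a["request_id"],
                      "response": {"subtype": "success"}})
    return await fut_a, await fut_b

results = asyncio.run(main())
print(results)
```

Because each response carries the ID of the request it answers, neither side has to block waiting for the other, which is what allows permission checks and hook callbacks to overlap.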
How Tools Work: The @tool Decorator
Tools are functions that Claude can request to execute. The SDK provides two ways to add tools: external MCP servers (separate processes) and in-process tools (Python functions in your application).
MCP (Model Context Protocol) is a standard way to package tools. Without a standard, every tool developer would create their own format. You’d need different code to connect Claude to Google Sheets than to Salesforce than to your database. MCP solves this by defining a common interface. An MCP server exposes tools through this interface, declaring “here are my tools and what they do.”
For in-process tools, you use the @tool decorator:
from claude_agent_sdk import tool

# The decorator takes a name, a description, and an input schema;
# the tool function is async and receives its arguments as a dict.
@tool("calculate_discount", "Calculate discounted price", {"price": float, "percent": float})
async def calculate_discount(args):
    result = args["price"] * (1 - args["percent"] / 100)
    return {"content": [{"type": "text", "text": f"Discounted price: {result}"}]}
What does this decorator actually do? It doesn’t just mark the function; it registers it in a data structure that the SDK uses to create an MCP server. When you call create_sdk_mcp_server(), the SDK bundles all your @tool decorated functions into an MCP server definition.
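The registration pattern itself is ordinary Python. Here is a minimal stand-in (not the SDK's implementation) showing how a decorator can collect functions and their metadata into a registry that a server definition is later assembled from:

```python
TOOL_REGISTRY = {}  # name -> handler plus metadata

def tool(name, description, input_schema):
    """Illustrative stand-in for the SDK's decorator: record the function
    and its metadata so a server definition can be built from the registry."""
    def decorator(fn):
        TOOL_REGISTRY[name] = {
            "handler": fn,
            "description": description,
            "input_schema": input_schema,
        }
        return fn  # the function itself is returned unchanged
    return decorator

@tool("calculate_discount", "Calculate discounted price", {"price": float, "percent": float})
def calculate_discount(args):
    return args["price"] * (1 - args["percent"] / 100)

# The registry now holds everything needed to declare the tool to the CLI.
print(sorted(TOOL_REGISTRY))                                   # ['calculate_discount']
print(calculate_discount({"price": 100.0, "percent": 20.0}))   # 80.0
```

Note that the decorated function is still directly callable; registration is a side effect, not a transformation.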
The SDK then passes this MCP server configuration to the CLI during initialization. The CLI loads it as an “SDK MCP server” and makes these tools available to Claude. When Claude requests one of these tools, the CLI sends a control request back to the SDK, which directly calls your Python function and returns the result.
This is why in-process tools are faster than external MCP servers. External servers require spawning a subprocess, communicating over stdio, and serializing data. In-process tools are just Python function calls with a control protocol round trip, avoiding subprocess overhead entirely.
Permission Callbacks: Runtime Security
Imagine Claude asks to run the command “delete all files.” You probably don’t want that to happen automatically. This is where permission systems come in.
When you provide a can_use_tool callback, the SDK registers it internally and tells the CLI during initialization that it should request permission before executing tools. A callback is a function you write that the SDK promises to call at the right moment.
The callback signature looks like:
from claude_agent_sdk import (
    PermissionResultAllow,
    PermissionResultDeny,
    ToolPermissionContext,
)

async def can_use_tool(
    tool_name: str,
    tool_input: dict,
    context: ToolPermissionContext,
):
    if tool_name == "Bash" and "/etc" in tool_input.get("command", ""):
        return PermissionResultDeny(message="Cannot access /etc")
    return PermissionResultAllow()
Before the CLI executes a tool, it sends a control request to the SDK asking “should I allow this?” The SDK invokes your callback with the tool name, the input Claude wants to pass, and context about the current conversation. Your callback returns allow, deny, or allow with modified input.
The context parameter provides permission suggestions from the CLI and can be used to track state across callbacks. This lets you implement sophisticated policies like “allow 5 API calls per conversation” or “require human approval for database writes” by maintaining state in your callback logic.
You can also return PermissionResultAllow(updated_input={...}) to modify the tool input. This is powerful for sanitization or adding parameters. For example, you could automatically add rate limiting headers to API calls or restrict file paths to safe directories.
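As a toy illustration of such a policy, the sketch below returns plain dicts shaped like the {"behavior": ...} wire responses shown earlier. The SAFE_ROOT path and the exact field names are assumptions for illustration, not the SDK's API:

```python
import os.path

SAFE_ROOT = "/data"  # hypothetical sandbox directory for this sketch

def can_use_tool(tool_name, tool_input):
    """Toy policy: deny shell access to /etc, and rewrite Read paths
    so they always land inside the sandbox directory."""
    if tool_name == "Bash" and "/etc" in tool_input.get("command", ""):
        return {"behavior": "deny", "message": "Cannot access /etc"}
    if tool_name == "Read":
        safe_path = os.path.join(SAFE_ROOT, os.path.basename(tool_input["file_path"]))
        return {"behavior": "allow", "updatedInput": {**tool_input, "file_path": safe_path}}
    return {"behavior": "allow"}

print(can_use_tool("Bash", {"command": "cat /etc/passwd"})["behavior"])  # deny
print(can_use_tool("Read", {"file_path": "/home/user/report.csv"}))
```

The second call demonstrates input rewriting: Claude asked to read a file outside the sandbox, and the policy silently redirects the read to a safe location instead of refusing outright.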
Hooks: Observability and Interception
Hooks are similar to callbacks but serve different purposes. The SDK supports several hook types:
PreToolUse: Called before a tool executes
PostToolUse: Called after a tool finishes
UserPromptSubmit: Called when the user submits a prompt
Stop: Called when the agent stops
SubagentStop: Called when a subagent stops
PreCompact: Called before context compaction
When you register a hook, the SDK tells the CLI which events you want to observe. The CLI then sends hook callback requests via the control protocol.
Here’s what happens when a PreToolUse hook fires:
1. CLI is about to execute a tool
2. CLI sends a control request with subtype: "hook_callback", containing a callback_id and hook-specific input data
3. SDK invokes the Python hook function registered for that callback
4. Your hook can log, modify context, or even cancel the operation
5. SDK sends a control response with any modifications
6. CLI proceeds with tool execution
Hooks enable audit logging, metrics collection, and dynamic behavior modification without changing your agent’s core logic.
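The dispatch-by-callback_id step can be modeled with a plain dict. The sketch below is illustrative only: the real SDK registers async functions through its options API, and the message fields here follow the observed wire format:

```python
audit_log = []

def pre_tool_use_hook(input_data):
    """PreToolUse hook sketch: record every attempted tool call.
    Returning an empty dict means 'no modification, proceed'."""
    audit_log.append((input_data["tool_name"], input_data["tool_input"]))
    return {}

# Dispatcher sketch: map the callback_id from a hook_callback control
# request to the Python function registered for it.
hook_callbacks = {"cb_1": pre_tool_use_hook}

incoming = {
    "subtype": "hook_callback",
    "callback_id": "cb_1",
    "input": {"tool_name": "Bash", "tool_input": {"command": "ls /home"}},
}
result = hook_callbacks[incoming["callback_id"]](incoming["input"])

print(audit_log)   # every tool attempt is now captured for auditing
print(result)      # {} -> proceed unchanged
```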
The Query Class: Implementation Details
The SDK’s query() function is a thin wrapper around internal classes that orchestrate the subprocess lifecycle. In the current implementation, the Query class (in the non-public _internal/query.py module) handles this orchestration.
When you call query(), the Query class:
1. Resolves the CLI path (bundled executable or custom path)
2. Constructs the command with appropriate flags
3. Spawns the subprocess using anyio.open_process()
4. Sets up stdin/stdout streams with JSON line parsing
5. Sends the initial prompt message
6. Enters a message reading loop that yields each response
7. Handles control requests by invoking callbacks and sending responses
8. Manages process cleanup when the stream ends
At the time of writing, the transport implementation (in _internal/transport/subprocess_cli.py, a non-public API) handles the low-level details:
Process lifecycle (spawn, terminate, kill)
Stream buffering (observed 1MB default buffer for JSON parsing)
Write locking (prevents concurrent writes from different async tasks)
Error handling (process exit codes, stream errors, JSON parsing failures)
Timeout management (configurable timeouts for control requests)
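Stripped of the async transport machinery, the read-loop pattern looks roughly like this. It is a simplified model rather than the SDK's code; the message shapes follow the observed wire format:

```python
import json

def read_messages(lines):
    """Sketch of the SDK's read loop: parse each stdout line as JSON and
    route control requests separately from regular conversation messages."""
    regular, control = [], []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        message = json.loads(line)
        if message.get("type") == "control_request":
            control.append(message)   # handled by invoking your callbacks
        else:
            regular.append(message)   # yielded to the caller
    return regular, control

stdout_lines = [
    '{"type": "assistant", "message": {"content": "Checking /home..."}}',
    '{"type": "control_request", "request_id": "req_1", "request": {"subtype": "can_use_tool"}}',
    '{"type": "result", "total_cost_usd": 0.003}',
]
regular, control = read_messages(stdout_lines)
print(len(regular), len(control))  # 2 1
```

The real loop is async, interleaves reads with callback dispatch, and enforces the buffering and timeout limits listed above, but the routing decision is the same: control messages go to your callbacks, everything else streams to the caller.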
Two Modes: Simple Query vs Interactive Client
The SDK provides two APIs with different characteristics:
The query() function is for one-shot interactions. It spawns the CLI, sends your prompt, streams responses, and terminates the process when done. This is simple but has overhead; each query starts a new process.
The ClaudeSDKClient class is for interactive sessions. You create a client, call connect() to spawn the CLI, and then send multiple queries over the same process. The client maintains state across queries and supports interruptions, model switching, and permission mode changes.
The key difference is in how the CLI is invoked. Query mode passes the prompt as a CLI argument and closes stdin immediately. Client mode uses a streaming input format that keeps stdin open for multiple messages.
Client mode also performs a control protocol handshake. After spawning the CLI, it sends an initialize control request that registers all your hooks and declares your SDK MCP servers. The CLI responds with its capabilities and supported commands.
MCP Integration: External vs In-Process
When you configure external MCP servers, you provide a configuration that tells the CLI how to spawn them:
options = ClaudeAgentOptions(
    mcp_servers={
        "filesystem": {
            "command": "npx",
            "args": ["-y", "@modelcontextprotocol/server-filesystem", "/data"]
        }
    }
)
The CLI spawns these as subprocesses and communicates with them using the standard MCP protocol over stdio. These servers run in separate processes with their own lifecycle.
SDK MCP servers (your @tool decorated functions) work differently. The SDK creates an MCP server definition internally but doesn’t spawn a process. Instead, when the CLI needs one of these tools, it sends a control request to the SDK, which directly invokes your Python function.
This architectural difference has performance implications. External MCP servers have subprocess overhead. SDK MCP servers avoid subprocess spawning entirely by using the control protocol to invoke Python functions directly in your application’s process.
Deploying to Production on AWS
Your Claude Agent SDK application is a Python program that spawns a subprocess. When deploying to production, you need somewhere to run this.
On AWS, you can deploy to Lambda (serverless functions), ECS (Docker containers), or EC2 (virtual machines). The SDK works on all these platforms because it’s just Python code that spawns a subprocess.
To use AWS Bedrock instead of the Anthropic API, you configure Claude Code to route API calls through Bedrock. This is a configuration change in the CLI, not the SDK. Your Python code doesn’t change; the SDK still spawns the same CLI subprocess and uses the same control protocol.
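Per the Claude Code documentation, Bedrock routing is controlled through environment variables. A typical setup looks like the following; the model ID shown is an example value, and the region and model choices depend on your account:

```shell
# Route the CLI's model calls through Amazon Bedrock instead of the Anthropic API
export CLAUDE_CODE_USE_BEDROCK=1
export AWS_REGION=us-east-1

# Optional: pin a specific Bedrock model ID (example value)
export ANTHROPIC_MODEL='us.anthropic.claude-sonnet-4-20250514-v1:0'
```

Standard AWS credential resolution applies (instance roles, environment variables, or profiles), so no Anthropic API key is needed in this mode.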
Amazon Bedrock AgentCore: Managed Infrastructure
Amazon Bedrock AgentCore is AWS’s production platform for running AI agents. It provides managed capabilities that you’d otherwise build yourself: memory persistence, identity integration, observability, and code execution sandboxes.
AgentCore is framework-agnostic. It works with agents built using the Claude Agent SDK, LangChain, or custom frameworks. You run your SDK-based Python application on AgentCore Runtime (via the AgentCore SDK/APIs), and it provides the operational infrastructure.
The memory system addresses a real production constraint. When conversations hit token limits, losing history means starting over. AgentCore provides managed memory resources so agents can persist and retrieve state across sessions. A customer service agent can remember previous interactions. An analytics agent can maintain context across multiple analysis requests.
The gateway capability helps expose APIs and services as agent tools, handling authentication and routing patterns. This makes it straightforward to integrate SDK agents into existing workflows, whether you’re triggering agents from CI/CD pipelines, issue trackers, or internal tools.
AgentCore provides built-in observability with monitoring, logging, and debugging capabilities. You can track agent behavior, measure performance, monitor resource usage, and analyze costs. You can optionally instrument your agent code for richer spans, traces, and custom metrics.
Cost Optimization Strategies
Running Claude Agent SDK applications involves several cost components. Claude charges per token, with input tokens cheaper than output tokens. Prompt caching can reduce costs significantly for stable system prompts and tool catalogs. The SDK’s tool definitions don’t change frequently, making them ideal candidates for caching.
Choose the right model for each task. Use Haiku for simple tool selection and classification. Use Sonnet for most agent tasks. Reserve Opus for complex reasoning that truly needs maximum capability.
Monitor tool execution costs. The SDK makes it easy to add hooks that track how much each conversation costs. Analyze which tools are used most frequently and whether they’re providing value proportional to their token consumption.
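As a sketch of such tracking, the function below tallies total_cost_usd from result messages in the JSON-lines stream. The session_id field and the exact message shape should be treated as version-dependent:

```python
import json

def tally_costs(result_lines):
    """Sum per-conversation costs from result messages in the
    CLI's JSON-lines output stream."""
    total = 0.0
    per_session = {}
    for line in result_lines:
        message = json.loads(line)
        if message.get("type") == "result":
            cost = message.get("total_cost_usd", 0.0)
            total += cost
            per_session[message.get("session_id", "unknown")] = cost
    return total, per_session

lines = [
    '{"type": "result", "session_id": "s1", "total_cost_usd": 0.012}',
    '{"type": "assistant", "message": {"content": "..."}}',
    '{"type": "result", "session_id": "s2", "total_cost_usd": 0.034}',
]
total, per_session = tally_costs(lines)
print(round(total, 3))  # 0.046
```

Feeding per-session numbers like these into your metrics pipeline makes it easy to spot conversations or tools whose cost is out of proportion to their value.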
On AWS, you pay for compute resources (Lambda execution time, ECS/EC2 instances) plus model costs. AgentCore adds infrastructure costs for memory persistence, identity integration, and observability. Provisioned throughput (Bedrock only) trades per-token pricing for predictable monthly costs.
Understanding the SDK’s Design Philosophy
The Claude Agent SDK solves a specific architectural problem: how do you build an agent system where Anthropic maintains the runtime while you control the logic?
The subprocess architecture enables this separation. Anthropic maintains the Claude Code CLI, handling conversation management, tool orchestration, API communication, and error recovery. You maintain your Python application, defining tools, enforcing permissions, and processing data. When Claude gains new capabilities or bugs get fixed, you get the updates automatically by upgrading the SDK package.
The control protocol enables callbacks without tight coupling. Your Python code doesn’t need to understand the agent loop or API protocols. It just responds to control requests: “Should I use this tool?” “Run this hook callback.” “Here’s the result of this SDK tool.” The CLI handles everything else.
Native MCP support means tools are first-class citizens. You can use external MCP servers (separate processes communicating over stdio or HTTP) or in-process tools (Python functions directly in your application). The SDK manages MCP server discovery, connection, and tool catalogs without additional abstraction layers.
Built-in permissions and hooks distinguish the SDK from general-purpose frameworks. The can_use_tool callback and declarative allow/deny rules are core features, not something you bolt on afterward. Hooks provide observability, compliance, and integration capabilities as part of the architecture.
For production deployment on AWS, you have flexibility. During development, the SDK calls the Anthropic API directly. In production, you configure Claude Code to route API calls through AWS Bedrock, getting compliance certifications, regional data residency, and centralized AWS billing. Your Python code doesn’t change. You deploy to standard AWS compute or to AgentCore for agent-specific runtime infrastructure.
The SDK’s subprocess architecture, native MCP support, built-in permissions, maintained runtime, and deployment flexibility solve real problems that you’d otherwise build yourself. Start with the SDK for development, leverage its distinctive features, deploy to AWS infrastructure as you scale, and let Anthropic handle the agent orchestration layer while you focus on what makes your agent unique.



