MCP in Production: Solving Transport, Auth, and Scaling Challenges
Engineering Team
From Prototype to Production: What Changes
Building an MCP server that works on your laptop is straightforward. Running one that handles thousands of concurrent AI agent sessions across a distributed infrastructure is a different engineering challenge entirely. Production MCP deployments must address five concerns that prototypes can ignore: transport scalability, authentication and authorization, session management at scale, audit trails, and multi-server orchestration.
This article is a technical guide for engineering teams moving MCP servers from development to production. We assume you have built at least one MCP server and understand the basics of the protocol. If not, start with our companion article on building your first MCP server.
Transport Scalability: stdio vs SSE vs Streamable HTTP
MCP defines three transport mechanisms. Choosing the right one for production is your first architectural decision.
stdio Transport
The stdio transport communicates via standard input/output streams. The host application spawns the MCP server as a child process and exchanges JSON-RPC messages through stdin/stdout.
Advantages:
- Zero network configuration
- Process-level isolation
- No port conflicts
- Lowest latency (no network stack)
Limitations:
- Server must run on the same machine as the host
- One server process per client session
- Cannot be load-balanced
- No horizontal scaling
Best for: Local development tools, IDE extensions, single-user desktop applications.
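On the wire, stdio messages are newline-delimited JSON-RPC: each message is a single line of JSON written to the child process's stdin or read from its stdout. A minimal framing sketch (the `frame`/`deframe` helpers are illustrative, not SDK APIs):

```typescript
// Newline-delimited JSON-RPC framing as used by the stdio transport.
// Each message is one line of JSON; messages must not contain raw newlines.
interface JsonRpcMessage {
  jsonrpc: "2.0";
  id?: number | string;
  method?: string;
  params?: Record<string, unknown>;
  result?: unknown;
}

// Serialize a message for writing to the server's stdin.
function frame(msg: JsonRpcMessage): string {
  return JSON.stringify(msg) + "\n";
}

// Split a stdout chunk into complete messages, keeping any trailing
// partial line in `buffer` for the next read.
function deframe(
  buffer: string,
  chunk: string
): { messages: JsonRpcMessage[]; buffer: string } {
  const lines = (buffer + chunk).split("\n");
  const rest = lines.pop() ?? ""; // incomplete trailing line, if any
  const messages = lines
    .filter((l) => l.trim().length > 0)
    .map((l) => JSON.parse(l) as JsonRpcMessage);
  return { messages, buffer: rest };
}
```

In practice the SDK's stdio transport handles this framing for you; the sketch only shows why stdio is the lowest-latency option — there is no HTTP envelope at all.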
SSE (Server-Sent Events) Transport
SSE transport uses HTTP for client-to-server messages and Server-Sent Events for server-to-client messages. The server runs as an HTTP service.
Advantages:
- Network-accessible (remote servers)
- Compatible with existing HTTP infrastructure
- Supports multiple concurrent clients
- Works through firewalls and proxies
Limitations:
- Unidirectional streaming (server-to-client only via SSE)
- Session affinity required (stateful connections)
- Some load balancers struggle with long-lived SSE connections
- No built-in reconnection semantics in the protocol
Best for: Small to medium deployments, internal tools, teams with existing HTTP infrastructure.
Streamable HTTP Transport
Streamable HTTP is the newest transport, designed specifically for production deployments. The client sends each JSON-RPC message as a standard HTTP POST; the server can answer with a single JSON response or stream the reply over SSE for long-running operations.
Advantages:
- Fully stateless request/response model
- Works with any HTTP load balancer
- Built-in session management via the `Mcp-Session-Id` header
- Supports both streaming and non-streaming responses
- CDN and proxy compatible
Limitations:
- Requires server-side session storage (Redis, database)
- Slightly higher per-message overhead than stdio
- Newer transport — less ecosystem tooling
Best for: Production cloud deployments, multi-tenant platforms, enterprise environments.
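The session lifecycle behind the `Mcp-Session-Id` header can be sketched as follows: the server mints an ID during `initialize` and rejects later requests that do not carry a known one (the spec has the server answer an unknown session ID with HTTP 404 so the client re-initializes). The in-memory `Map` here is a stand-in for the external store a production deployment would use:

```typescript
import { randomUUID } from "node:crypto";

// In-memory stand-in for an external session store (Redis in production).
const sessions = new Map<string, { createdAt: number }>();

// On a successful `initialize`, mint an ID to return in the
// Mcp-Session-Id response header.
function createSession(): string {
  const id = randomUUID();
  sessions.set(id, { createdAt: Date.now() });
  return id;
}

// On every subsequent request, require a known Mcp-Session-Id header.
// Node lowercases incoming header names, hence the lowercase key.
// Returns null when the caller should respond with HTTP 404.
function validateSession(
  headers: Record<string, string | undefined>
): string | null {
  const id = headers["mcp-session-id"];
  if (!id || !sessions.has(id)) return null;
  return id;
}
```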
Transport Comparison Matrix
| Feature | stdio | SSE | Streamable HTTP |
|---|---|---|---|
| Network access | Local only | Remote | Remote |
| Load balancing | Not possible | Session-sticky | Standard HTTP LB |
| Horizontal scaling | No | Limited | Yes |
| Firewall-friendly | N/A | Yes | Yes |
| Reconnection | N/A | Manual | Built-in |
| Concurrent clients | 1 | Many | Many |
| Latency | Lowest | Low | Low |
| Statelessness | Stateful | Stateful | Stateless possible |
| Production readiness | Dev only | Medium | High |
Authentication: SSO Integration, API Keys, and OAuth
MCP servers in production must authenticate both the AI host application and the end user on whose behalf the AI is acting.
OAuth 2.0 for Remote Servers
The MCP specification includes a built-in OAuth 2.0 flow for remote (HTTP-based) servers. The flow works as follows:
1. The MCP client sends an `initialize` request without credentials.
2. The server responds with HTTP 401 and a `WWW-Authenticate` header pointing to its OAuth authorization endpoint.
3. The host application opens a browser for user authentication.
4. After successful auth, the host receives an access token.
5. Subsequent MCP requests include the token in the `Authorization: Bearer` header.
// Server-side OAuth middleware for an Express-based Streamable HTTP server
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StreamableHTTPServerTransport } from "@modelcontextprotocol/sdk/server/streamableHttp.js";
import express from "express";

const app = express();

app.use("/mcp", async (req, res, next) => {
  const token = req.headers.authorization?.replace("Bearer ", "");
  if (!token) {
    res.status(401).json({
      error: "unauthorized",
      oauth_url: "https://auth.example.com/oauth/authorize",
    });
    return;
  }
  // Validate token against your auth provider
  const user = await validateToken(token);
  if (!user) {
    res.status(403).json({ error: "invalid_token" });
    return;
  }
  // Attach user context for tool authorization
  (req as any).mcpUser = user;
  next();
});
API Key Authentication
For service-to-service MCP communication (where no human user is involved), API keys are simpler:
const API_KEYS = new Map([
  ["sk-prod-abc123", { name: "analytics-service", scopes: ["read"] }],
  ["sk-prod-def456", { name: "admin-service", scopes: ["read", "write"] }],
]);

function authenticateApiKey(key: string) {
  return API_KEYS.get(key) || null;
}
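A hypothetical `authorizeKey` helper layered on that lookup lets write-capable tools reject read-only keys (the key table repeats the one above so the sketch is self-contained):

```typescript
interface ApiKeyInfo {
  name: string;
  scopes: string[];
}

// Same key table as above, typed for the helper.
const API_KEYS = new Map<string, ApiKeyInfo>([
  ["sk-prod-abc123", { name: "analytics-service", scopes: ["read"] }],
  ["sk-prod-def456", { name: "admin-service", scopes: ["read", "write"] }],
]);

// Resolve a key and verify it carries the scope a tool requires.
function authorizeKey(key: string, requiredScope: string): ApiKeyInfo | null {
  const info = API_KEYS.get(key);
  if (!info || !info.scopes.includes(requiredScope)) return null;
  return info;
}
```

A `write`-scoped tool would call `authorizeKey(key, "write")` before executing, so the read-only analytics key is rejected even though it authenticates successfully.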
SSO Integration Pattern
For enterprise deployments, integrate with existing SSO (SAML, OIDC):
User -> AI Host -> MCP Client -> MCP Server -> SSO Provider
                                     │
                                     v
                            Validate OIDC token
                            Extract user roles
                            Apply tool-level RBAC
Each MCP tool can check the authenticated user’s roles before executing:
server.tool(
  "delete_record",
  "Delete a database record by ID",
  { table: z.string(), id: z.string() },
  async ({ table, id }, { authContext }) => {
    if (!authContext.roles.includes("admin")) {
      return {
        content: [{ type: "text", text: "Forbidden: admin role required" }],
        isError: true,
      };
    }
    // Proceed with deletion
  }
);
Scaling: Stateful Sessions vs Load Balancers
MCP sessions are inherently stateful: the `initialize` handshake negotiates capabilities, and the server may maintain context across tool calls within a session. That statefulness is in tension with horizontal scaling.
Session Store Architecture
Extract session state from the server process into an external store:
import Redis from "ioredis";

interface McpSession {
  id: string;
  userId: string;
  capabilities: ServerCapabilities;
  createdAt: Date;
  lastActivityAt: Date;
  metadata: Record<string, unknown>;
}

class RedisSessionStore {
  constructor(private redis: Redis) {}

  async create(session: McpSession): Promise<void> {
    await this.redis.set(
      `mcp:session:${session.id}`,
      JSON.stringify(session),
      "EX",
      3600 // 1 hour TTL
    );
  }

  async get(sessionId: string): Promise<McpSession | null> {
    const data = await this.redis.get(`mcp:session:${sessionId}`);
    return data ? JSON.parse(data) : null;
  }

  async touch(sessionId: string): Promise<void> {
    await this.redis.expire(`mcp:session:${sessionId}`, 3600);
  }
}
Horizontal Scaling with Streamable HTTP
With externalized sessions and Streamable HTTP transport, you can run multiple MCP server instances behind a standard load balancer:
                  ┌──────────────┐
                  │    Load      │
MCP Client ──────>│   Balancer   │
                  │ (nginx/ALB)  │
                  └──────┬───────┘
                         │
            ┌────────────┼────────────┐
            v            v            v
      ┌──────────┐ ┌──────────┐ ┌──────────┐
      │   MCP    │ │   MCP    │ │   MCP    │
      │ Server 1 │ │ Server 2 │ │ Server 3 │
      └────┬─────┘ └────┬─────┘ └────┬─────┘
           │            │            │
           v            v            v
      ┌────────────────────────────────────┐
      │          Redis (sessions)          │
      └────────────────────────────────────┘
No session-sticky routing is required. Any server instance can handle any request by loading the session from Redis.
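The per-request flow on any instance can be sketched as follows. The `MemorySessionStore` is a self-contained stand-in for the Redis store above, and the error string mirrors the HTTP 404 a real handler would return for an unknown session:

```typescript
interface McpSession {
  id: string;
  userId: string;
  lastActivityAt: number;
}

// Minimal store interface matching the Redis-backed version above.
interface SessionStore {
  get(id: string): Promise<McpSession | null>;
  touch(id: string): Promise<void>;
}

// In-memory stand-in so the flow can be exercised without Redis.
class MemorySessionStore implements SessionStore {
  private data = new Map<string, McpSession>();
  async put(s: McpSession) { this.data.set(s.id, s); }
  async get(id: string) { return this.data.get(id) ?? null; }
  async touch(id: string) {
    const s = this.data.get(id);
    if (s) s.lastActivityAt = Date.now();
  }
}

// Every instance runs the same logic: load the session by its
// Mcp-Session-Id, refresh the TTL, then dispatch the JSON-RPC message.
async function handleRequest(
  store: SessionStore,
  sessionId: string
): Promise<McpSession> {
  const session = await store.get(sessionId);
  if (!session) throw new Error("404: unknown session"); // client re-initializes
  await store.touch(sessionId);
  return session;
}
```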
Auto-Scaling Configuration
Example Kubernetes HPA for MCP servers:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: mcp-server
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: mcp-server
  minReplicas: 2
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Pods
      pods:
        metric:
          name: mcp_active_sessions
        target:
          type: AverageValue
          averageValue: "100"
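The `mcp_active_sessions` metric above is custom, so the server must expose it itself. A minimal sketch that renders the gauge in the Prometheus text exposition format (in practice you would use a client library such as prom-client; the class name is illustrative):

```typescript
// Track active sessions and render them in Prometheus text format,
// suitable for serving from a /metrics endpoint that the metrics
// pipeline scrapes and feeds to the HPA.
class SessionGauge {
  private active = new Set<string>();

  open(sessionId: string) { this.active.add(sessionId); }
  close(sessionId: string) { this.active.delete(sessionId); }

  // Prometheus text exposition format: HELP, TYPE, then the sample line.
  render(): string {
    return [
      "# HELP mcp_active_sessions Live MCP sessions on this instance",
      "# TYPE mcp_active_sessions gauge",
      `mcp_active_sessions ${this.active.size}`,
    ].join("\n");
  }
}
```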
Audit Trails and Logging
Every MCP tool invocation in production must be logged for security, compliance, and debugging.
Structured Logging Schema
interface McpAuditLog {
  timestamp: string;
  sessionId: string;
  userId: string;
  toolName: string;
  toolInput: Record<string, unknown>;
  toolOutput: string;
  durationMs: number;
  success: boolean;
  errorMessage?: string;
  ipAddress: string;
  userAgent: string;
}

// Middleware that wraps every tool call
function auditMiddleware(server: McpServer, logger: Logger) {
  const originalTool = server.tool.bind(server);
  server.tool = (name, description, schema, handler) =>
    originalTool(name, description, schema, async (args, context) => {
      const start = Date.now();
      try {
        const result = await handler(args, context);
        logger.info({
          event: "mcp_tool_call",
          tool: name,
          input: args,
          durationMs: Date.now() - start,
          success: !result.isError,
          sessionId: context.sessionId,
          userId: context.authContext?.userId,
        });
        return result;
      } catch (error) {
        logger.error({
          event: "mcp_tool_error",
          tool: name,
          input: args,
          error: (error as Error).message,
          durationMs: Date.now() - start,
          sessionId: context.sessionId,
        });
        throw error;
      }
    });
}
Compliance Considerations
| Requirement | Implementation |
|---|---|
| GDPR data access | Log which user data the AI accessed via MCP tools |
| SOC 2 audit trail | Immutable logs of all tool invocations with timestamps |
| PII protection | Redact sensitive fields in tool inputs before logging |
| Data retention | Set log TTLs matching your compliance requirements |
| Access review | Log authentication events and permission checks |
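The PII-protection row implies a redaction pass before tool inputs reach the log. A hypothetical sketch — the field list is illustrative and should come from your own data classification:

```typescript
// Field names treated as sensitive — illustrative, not exhaustive.
const SENSITIVE_FIELDS = new Set(["password", "ssn", "email", "token", "apiKey"]);

// Recursively replace sensitive values before the input is logged.
function redact(input: unknown): unknown {
  if (Array.isArray(input)) return input.map(redact);
  if (input !== null && typeof input === "object") {
    return Object.fromEntries(
      Object.entries(input as Record<string, unknown>).map(([k, v]) =>
        SENSITIVE_FIELDS.has(k) ? [k, "[REDACTED]"] : [k, redact(v)]
      )
    );
  }
  return input;
}
```

The audit middleware would then log `input: redact(args)` instead of the raw arguments.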
Gateway Patterns for Multi-Server Deployments
Enterprise environments often need dozens of MCP servers — one for each internal system. An MCP Gateway centralizes management:
                  ┌──────────────┐
                  │ MCP Gateway  │
MCP Client ──────>│              │
                  │ - Auth       │
                  │ - Routing    │
                  │ - Rate limit │
                  │ - Audit      │
                  └──────┬───────┘
                         │
       ┌─────────────────┼─────────────────┐
       v                 v                 v
 ┌──────────┐      ┌──────────┐      ┌──────────┐
 │ Jira MCP │      │ Slack MCP│      │  DB MCP  │
 │  Server  │      │  Server  │      │  Server  │
 └──────────┘      └──────────┘      └──────────┘
The gateway provides:
- Unified authentication: One auth flow for all backend servers.
- Tool namespace routing: Tools are prefixed with server names (`jira.create_issue`, `slack.send_message`).
- Centralized rate limiting: Per-user, per-tool, and per-server rate limits.
- Request routing: Route tool calls to the appropriate backend server.
- Circuit breaking: If a backend server is unhealthy, the gateway returns graceful errors instead of hanging.
Implementing a Basic Gateway
class McpGateway {
  private backends = new Map<string, McpClient>();

  async registerBackend(prefix: string, url: string) {
    const client = new McpClient({ name: `gateway-${prefix}`, version: "1.0.0" });
    const transport = new StreamableHTTPClientTransport(new URL(url));
    await client.connect(transport);
    this.backends.set(prefix, client);
  }

  async routeToolCall(toolName: string, args: Record<string, unknown>) {
    // toolName format: "prefix.actualTool"
    const [prefix, ...rest] = toolName.split(".");
    const actualTool = rest.join(".");
    const backend = this.backends.get(prefix);
    if (!backend) {
      throw new Error(`Unknown backend: ${prefix}`);
    }
    return backend.callTool({ name: actualTool, arguments: args });
  }
}
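The gateway sketch above omits the circuit breaking promised in the feature list. A minimal per-backend breaker, with illustrative thresholds (the injectable clock exists only to make the sketch testable):

```typescript
// Per-backend circuit breaker: after `threshold` consecutive failures the
// circuit opens and calls fail fast until `cooldownMs` has elapsed.
class CircuitBreaker {
  private failures = 0;
  private openedAt = 0;

  constructor(
    private threshold = 5,
    private cooldownMs = 30_000,
    private now: () => number = () => Date.now()
  ) {}

  async call<T>(fn: () => Promise<T>): Promise<T> {
    if (
      this.failures >= this.threshold &&
      this.now() - this.openedAt < this.cooldownMs
    ) {
      // Fail fast instead of letting the request hang on a dead backend.
      throw new Error("circuit open: backend unavailable");
    }
    try {
      const result = await fn();
      this.failures = 0; // success closes the circuit
      return result;
    } catch (err) {
      this.failures++;
      if (this.failures >= this.threshold) this.openedAt = this.now();
      throw err;
    }
  }
}
```

The gateway would keep one breaker per backend, wrap each `backend.callTool(...)` in it, and map the open-circuit error to a graceful `isError` tool result for the client.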
FAQ
Q: Should I use stdio or HTTP transport in production?
A: Use Streamable HTTP for any deployment that needs to scale beyond a single machine. Use stdio only for local desktop integrations where the AI host and MCP server run on the same machine.
Q: How many concurrent sessions can one MCP server handle?
A: With Streamable HTTP and externalized sessions, a single Node.js server instance typically handles 500-1,000 concurrent sessions. With Rust or Go servers, 5,000-10,000 is achievable. Scale horizontally beyond that.
Q: How do I handle MCP server downtime?
A: Implement health checks (a `/health` endpoint), use Kubernetes liveness/readiness probes, and configure your gateway with circuit breakers. The MCP client will receive connection errors that the AI host can present as “tool temporarily unavailable.”
Q: Can MCP servers call other MCP servers?
A: Yes. An MCP server can also be an MCP client, creating a chain. This is the foundation of agent-to-agent communication. Use this pattern carefully — deep chains increase latency and failure probability.
Q: How do I version my MCP server API?
A: Use the server version field in the `initialize` response. For breaking changes (removing tools, changing schemas), bump the major version and support both old and new versions during a migration period. The MCP protocol version itself is negotiated during initialization.
Q: What is the maximum message size for MCP?
A: The protocol does not define a maximum, but practical limits depend on the transport. For Streamable HTTP, keep responses under 10 MB. For stdio, system pipe buffers (typically 64 KB) may require chunking for large responses.